Goal:
Upload / post a CSV file w/ UTF-8 characters to an MVC action, read the data and stick it in a database table.
Problem:
Only the plain text characters make it through. UTF-8 "special" characters like á are not coming through correctly, in code and in the database they render as this character => �.
More:
I'm convinced that this isn't a problem with my C# code, although I've included the important parts below.
I thought the problem was that the uploaded file is sent as plain text (the "text/plain" MIME type), but I was able to change that by changing the file extension to .html, and it made no difference.
Summary:
How do you get a form with an enctype attribute set to "multipart/form-data" to correctly interpret UTF-8 characters in a posted file?
Research:
From my research this appears to be a common problem without a common and clear solution.
I've also found more solutions for Java and PHP than for .NET.
The csvFile variable is of type HttpPostedFileBase.
This is the MVC action signature:
[HttpPost]
public ActionResult LoadFromCsv(HttpPostedFileBase csvFile)
Things I've tried:
1)
using (Stream inputStream = csvFile.InputStream)
{
    byte[] bytes = ReadFully(inputStream);
    string bytesConverted = new UTF8Encoding().GetString(bytes);
}
2)
using (Stream inputStream = csvFile.InputStream)
{
    using (StreamReader readStream = new StreamReader(inputStream, Encoding.UTF8, true))
    {
        while (!readStream.EndOfStream)
        {
            string csvLine = readStream.ReadLine();
            // string csvLine = new UTF8Encoding().GetString(new UTF8Encoding().GetBytes(readStream.ReadLine())); // stupid... this can not be the way!
        }
    }
}
3)
<form method="post" enctype="multipart/form-data" accept-charset="UTF-8">
4)
<input type="file" id="csvFile" name="csvFile" accept="UTF-8" />
<input type="file" id="csvFile" name="csvFile" accept="text/html" />
5)
When the file has a .txt extension, the ContentType property of the HttpPostedFileBase is "text/plain"
When I change the file extension from .txt to .csv the ContentType property of the HttpPostedFileBase is "application/vnd.ms-excel"
When I change the file extension to .html, the ContentType property of the HttpPostedFileBase is "text/html" - I thought this was going to be a winner, but it wasn't.
In my soul I have to believe there is an easy solution to this problem. It surprises me that I haven't been able to figure this one out on my own, uploading UTF-8 characters in a file is a common task! Why am I failing here?!?!
Perhaps I have to adjust mime types in IIS for the website?
Perhaps I need different DOCTYPE / html tag / meta tags?
@Gabe -
Here is what my post looks like in Fiddler. This is really interesting because the � is plain as day, right there in the posted value.
POST http://localhost/AwesomeGeography/GeoBytesCities/LoadFromCsv?adsf HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost/AwesomeGeography/GeoBytesCities/LoadFromCsv?adsf
Content-Type: multipart/form-data; boundary=---------------------------199122566726299
Content-Length: 354
-----------------------------199122566726299
Content-Disposition: form-data; name="csvFile"; filename="cities_test.html"
Content-Type: text/html
"CityId","CountryID","RegionID","City","Latitude","Longitude","TimeZone","DmaId","Code"
3344,10,1063,"Luj�n de Cuyo","-33.05","-68.867","-03:00",0,"LDCU"
-----------------------------199122566726299--
I had the same problem. You can use
StreamReader reader = new StreamReader(archivo_origen.InputStream, Encoding.GetEncoding("iso-8859-1"));
and it works. "iso-8859-1" is the encoding for Latin-derived (Western European) languages such as Spanish, German, and French.
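To see why the choice of reader encoding matters, here is a minimal console sketch. It assumes the uploaded file was saved in a single-byte encoding such as ISO-8859-1, so "á" arrives as the single byte 0xE1 (the sample bytes are made up for illustration, based on the city name in the question). Decoding those bytes as UTF-8 produces the replacement character, while decoding them as ISO-8859-1 recovers the text.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // "Luján" as it would be stored in an ISO-8859-1 file:
        // 'á' is the single byte 0xE1 (in UTF-8 it would be 0xC3 0xA1).
        byte[] latin1Bytes = { 0x4C, 0x75, 0x6A, 0xE1, 0x6E };

        // 0xE1 on its own is not a valid UTF-8 sequence, so the decoder
        // substitutes U+FFFD (the "�" character) for it.
        string asUtf8 = Encoding.UTF8.GetString(latin1Bytes);

        // Decoding with the encoding the file was really saved in works.
        string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(latin1Bytes);

        Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0); // prints True
        Console.WriteLine(asLatin1 == "Luj\u00E1n");      // prints True
    }
}
```

Once the bytes have been decoded with the wrong encoding, the original character is gone, so the fix has to happen at the point where the stream is read.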
Based on the information given, I would guess that the problem is with the file encoding itself - not with your code.
I ran a simple test to demonstrate this:
I exported a simple csv file from Excel containing special characters.
Then, I uploaded it through the following form and action method.
Form
<form method="post" action="@Url.Action("UploadFile", "Home")" enctype="multipart/form-data">
    <input type="file" id="file" name="file" />
    <input type="submit" />
</form>
Action method
[HttpPost]
public ActionResult UploadFile(HttpPostedFileBase file)
{
    using (StreamReader reader = new StreamReader(file.InputStream, System.Text.Encoding.UTF8))
    {
        string text = reader.ReadToEnd();
    }
    return RedirectToAction("Index");
}
I had the same problem as you in this case - the special characters were replaced with �.
I opened the file in Notepad and the special characters were displayed correctly there, so it seemed that it couldn't be a file problem, but when I opened the "Save As" dialog, the selected encoding was "ANSI". I switched it to UTF-8 and saved it, ran it through the uploader, and it all worked fine.
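If you can't control how the uploaded files are saved, one option is to try a strict UTF-8 decode first and fall back to a single-byte encoding when the bytes are not valid UTF-8. The sketch below is only an illustration under that assumption: CsvDecoding, DecodeUtf8OrFallback and ReadAllBytes are hypothetical names, not part of ASP.NET MVC, and the fallback encoding is a guess you have to make about where your files come from.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, not part of the framework.
public static class CsvDecoding
{
    // Decodes as UTF-8 when the bytes are valid UTF-8, otherwise
    // decodes them with the supplied fallback encoding.
    public static string DecodeUtf8OrFallback(byte[] bytes, Encoding fallback)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently inserting U+FFFD for invalid sequences.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            string text = strictUtf8.GetString(bytes);
            // GetString keeps a leading byte order mark as U+FEFF; drop it.
            return text.TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(bytes);
        }
    }

    // Copies the whole stream into memory so it can be decoded more than once.
    public static byte[] ReadAllBytes(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}

// Possible use inside the action method (assumes "iso-8859-1" as the fallback):
//   byte[] bytes = CsvDecoding.ReadAllBytes(file.InputStream);
//   string text = CsvDecoding.DecodeUtf8OrFallback(bytes, Encoding.GetEncoding("iso-8859-1"));
```

This buffers the whole upload in memory, which is fine for small CSV files but probably not for very large ones. A pure-ASCII file decodes the same either way, so the fallback only kicks in when the file contains bytes that can't be UTF-8.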