I'm reading CSV files that contain special characters such as the en dash –, left double quotes “, and right double quotes ”, and I can't figure out the proper way to read and write them correctly. I thought the encoding was UTF-8 or Unicode, but the characters are read and written as a square or a ? inside a diamond (I'm opening the files in Notepad++ to confirm). Maybe another specific encoding is needed? Here is the code I've been using so far; I've tried a few variations of it with different encodings:
string[] lines = File.ReadAllLines(filePathTxt.Text, Encoding.UTF8);
...
Stream s = new FileStream(filePath, FileMode.Append);
StreamWriter sw = new StreamWriter(s, Encoding.UTF8, 1000, true);
Input of:
Surveys – Public
Documents:,“A”
comes out as
Surveys � Public
Documents:,�A�
The problem also shows up in the debugger as soon as the file is read into the string array.
Edit: I've also tried Unicode. I'm using Notepad++ on Windows 10. The problem is definitely in the read step, because if I add the following line to manually write a line of data:
sw.WriteLine("Surveys – Public");
That line writes the dash fine, so the characters get messed up on the initial read of the source CSV. I've tried reading with a few different encodings, and Notepad++ just shows the CSV as being ANSI.
Instead of:
StreamWriter sw = new StreamWriter(s, Encoding.UTF8, 1000, true);
use this:
StreamWriter sw = new StreamWriter(s, Encoding.Unicode, 1000, true);
I just tried it and it shows up correctly in Notepad++.
Here's the sample I used for testing it:
using (StreamWriter swClifor = new StreamWriter("test.txt", true, Encoding.Unicode))
{
    string cString = "en dash –, left double quotes “, and right double quotes ”";
    swClifor.WriteLine(cString);
}
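Since the question says the characters are already wrong right after the read, and that Notepad++ reports the source CSV as ANSI, it may also be worth checking the read side. On a Western-locale Windows machine, "ANSI" usually means Windows-1252, where the en dash and curly quotes are single bytes (0x96, 0x93, 0x94) that are not valid UTF-8, so a UTF-8 reader turns them into �. Below is a minimal sketch under that assumption; the class name AnsiCsv, the helper names, and the file names are made up for illustration, and the code page should be adjusted if the file actually uses a different one.

```csharp
using System;
using System.IO;
using System.Text;

static class AnsiCsv
{
    // Assumption: the source file is Windows-1252 ("ANSI" in Notepad++ on a Western locale).
    public static string[] ReadLines(string path)
    {
        // Needed on .NET Core / .NET 5+, where code page 1252 is not registered by default.
        // On .NET Framework this call can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding ansi = Encoding.GetEncoding(1252);

        // Decode the bytes with the encoding the file was actually written in.
        return File.ReadAllLines(path, ansi);
    }

    // Appends the lines to the target file as UTF-8.
    public static void AppendAsUtf8(string path, string[] lines)
    {
        using (var sw = new StreamWriter(path, true, Encoding.UTF8))
        {
            foreach (string line in lines)
            {
                sw.WriteLine(line);
            }
        }
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical file names, for illustration only.
        string[] lines = AnsiCsv.ReadLines("input.csv");
        AnsiCsv.AppendAsUtf8("output.csv", lines);
    }
}
```

Once the strings are decoded correctly in memory, either Encoding.UTF8 or Encoding.Unicode on the writer should preserve them; the choice there only affects the byte layout of the output file.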