
Handling non-English characters in C#

I need to get my understanding of character sets and encodings right. Can someone point me to a good write-up on handling different character sets in C#?

Here's one of the problems I'm facing -

        using (StreamReader reader = new StreamReader("input.txt"))
        using (StreamWriter writer = new StreamWriter("output.txt"))
        {
            while (!reader.EndOfStream)
            {
                writer.WriteLine(reader.ReadLine());
            }
        }

This simple code snippet does not always preserve the encoding -

For example -

Aukéna in the input is turned into Auk�na in the output.

Colonel Panic asked Dec 18 '25 13:12



2 Answers

You have an encoding problem. A file is really just a stream of bytes, and you have to tell your program how to interpret those bytes as characters.

To fix your problem, use the StreamReader and StreamWriter constructors that also take an Encoding, and set it to whatever encoding your text actually uses.
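To see why the choice of encoding matters, here is a minimal sketch. It assumes the input file was saved in a single-byte encoding such as ISO-8859-1 (the question does not say which one), and the `EncodingDemo` class and its method name are made up for illustration. It encodes "Aukéna" to bytes and then decodes those same bytes with whatever encoding you pass in:

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: encode "Aukéna" as ISO-8859-1 bytes (assumed
    // source encoding), then decode those bytes with the given encoding.
    public static string DecodeLatin1BytesAs(Encoding encoding)
    {
        // \u00E9 is 'é'; in ISO-8859-1 it becomes the single byte 0xE9.
        byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes("Auk\u00E9na");
        return encoding.GetString(bytes);
    }
}
```

Calling `EncodingDemo.DecodeLatin1BytesAs(Encoding.UTF8)` should give back the text with U+FFFD (the replacement character) in place of the é, because a lone 0xE9 byte is not valid UTF-8. Passing `Encoding.GetEncoding("iso-8859-1")` instead round-trips the text unchanged. A StreamReader created without an encoding argument decodes as UTF-8 unless it finds a byte order mark, which would explain the symptom in the question if the file is not UTF-8.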

http://msdn.microsoft.com/en-us/library/ms143456.aspx

http://msdn.microsoft.com/en-us/library/3aadshsx.aspx

Esteban Araya answered Dec 21 '25 02:12



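When reading a file, you need to know which encoding it was saved in. Otherwise the bytes can easily be decoded into the wrong characters.

When you know the encoding of a file, you can pass it to both the reader and the writer (code page 1251 below is only an example; substitute the encoding your file actually uses):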

        using (StreamReader reader = new StreamReader("input.txt", Encoding.GetEncoding(1251)))
        using (StreamWriter writer = new StreamWriter("output.txt", false, Encoding.GetEncoding(1251)))
        {
            while (!reader.EndOfStream)
            {
                writer.WriteLine(reader.ReadLine());
            }
        }

Changing the original encoding of a file is a separate question: read it with the encoding it was saved in and write it out with the new one.
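A rough sketch of such a conversion, in the same style as the snippet above. The `EncodingConverter` class and its parameter names are hypothetical, and the caller is assumed to already know the source encoding:

```csharp
using System.IO;
using System.Text;

static class EncodingConverter
{
    // Hypothetical helper: copy a text file line by line, decoding with
    // sourceEncoding and re-encoding with targetEncoding.
    public static void Convert(string inputPath, Encoding sourceEncoding,
                               string outputPath, Encoding targetEncoding)
    {
        using (var reader = new StreamReader(inputPath, sourceEncoding))
        using (var writer = new StreamWriter(outputPath, false, targetEncoding))
        {
            string line;
            // ReadLine returns null at the end of the file.
            while ((line = reader.ReadLine()) != null)
            {
                // WriteLine appends the platform's line terminator, so the
                // original line endings are not necessarily preserved.
                writer.WriteLine(line);
            }
        }
    }
}
```

For example, `EncodingConverter.Convert("input.txt", Encoding.GetEncoding("iso-8859-1"), "output.txt", new UTF8Encoding(false))` would rewrite an ISO-8859-1 file as UTF-8 without a byte order mark. Note that on .NET Core and later, legacy code pages such as 1251 or 1252 may need `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` to be called before `Encoding.GetEncoding` can find them.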

The following article gives a good grounding in what encodings are: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

And here is an MSDN article you could start from: Encoding Class

horgh answered Dec 21 '25 04:12



