
Difference between using Encoding.GetBytes or cast to byte [duplicate]

Tags: c#, encoding, byte

I was wondering whether there is any difference between converting characters to bytes with Encoding.UTF8.GetBytes and converting them manually by casting each character with (byte).

For example, consider the following code:

public static byte[] ConvertStringToByteArray(string str)
{
    int i, n;
    n = str.Length;
    byte[] x = new byte[n];
    for (i = 0; i < n; i++)
    {
        x[i] = (byte)str[i];
    }
    return x;
}

var arrBytes = ConvertStringToByteArray("Hello world");

or

var arrBytes = Encoding.UTF8.GetBytes("Hello world");
Afshin Mehrabani asked Sep 13 '25 06:09



2 Answers

I liked the question, so I ran your code on ANSI-encoded Hebrew text that I read from a text file.

The text was "שועל"

string text = System.IO.File.ReadAllText(@"d:\test.txt");
var arrBytes = ConvertStringToByteArray(text);
var arrBytes1 = Encoding.UTF8.GetBytes(text);

The results, as seen in the debugger's watch window, were two different byte arrays.

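As you can see, there is a difference when the code point of any of your characters falls outside the 0-255 range of a byte. In fact, the two approaches already diverge above 127, because UTF-8 encodes every non-ASCII character as two or more bytes.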
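To make the difference concrete, here is a minimal sketch that runs both conversions on the same Hebrew word. The class and method names (CastVersusUtf8, CastEachChar) are made up for illustration, and the string is built from escape sequences rather than read from a file, so this is not the exact setup used above.

```csharp
using System;
using System.Text;

public static class CastVersusUtf8
{
    // Same idea as ConvertStringToByteArray in the question:
    // keep only the low 8 bits of each UTF-16 code unit.
    public static byte[] CastEachChar(string s)
    {
        var bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            bytes[i] = unchecked((byte)s[i]);
        }
        return bytes;
    }

    public static void Main()
    {
        // The word from the answer, written with escapes so the
        // encoding of the source file does not matter.
        string hebrew = "\u05E9\u05D5\u05E2\u05DC";

        // Cast: one byte per char, high byte of each code unit is lost.
        Console.WriteLine(BitConverter.ToString(CastEachChar(hebrew)));
        // prints E9-D5-E2-DC

        // UTF-8: each of these letters takes two bytes.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(hebrew)));
        // prints D7-A9-D7-95-D7-A2-D7-9C
    }
}
```

The cast output cannot be turned back into the original text, because the upper byte of every character (0x05 here) has been thrown away, while the UTF-8 bytes round-trip through Encoding.UTF8.GetString.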

asafrob answered Sep 15 '25 20:09



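Your ConvertStringToByteArray method is incorrect. You are casting each char to byte. A char's numeric value is a UTF-16 code unit (for most characters, its Unicode code point), which can be larger than a byte can hold, so the cast will often overflow. In the default unchecked context the high bits are silently discarded; in a checked context an OverflowException is thrown.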

Your example works because "Hello world" contains only ASCII characters (code points 0-127), for which the cast and UTF-8 produce the same bytes.
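A small sketch of the overflow behaviour described above, assuming default compiler settings (arithmetic unchecked unless stated otherwise). The helper names (UncheckedCast, CheckedCastThrows, CastMatchesUtf8) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

public static class CastOverflowDemo
{
    // Unchecked cast: silently keeps the low 8 bits.
    public static byte UncheckedCast(char c) => unchecked((byte)c);

    // Checked cast: throws OverflowException when the value exceeds 255.
    public static bool CheckedCastThrows(char c)
    {
        try
        {
            _ = checked((byte)c);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // True only when casting every char yields exactly the UTF-8 bytes.
    public static bool CastMatchesUtf8(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        if (utf8.Length != s.Length) return false;
        for (int i = 0; i < s.Length; i++)
        {
            if (utf8[i] != unchecked((byte)s[i])) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(UncheckedCast('\u05E9'));         // prints 233 (0xE9)
        Console.WriteLine(CheckedCastThrows('\u05E9'));     // prints True
        Console.WriteLine(CheckedCastThrows('A'));          // prints False
        Console.WriteLine(CastMatchesUtf8("Hello world"));  // prints True
        Console.WriteLine(CastMatchesUtf8("\u00E9"));       // prints False
    }
}
```

The last line uses U+00E9, which fits in a byte but is still encoded as two bytes in UTF-8, so the two methods agree only for ASCII input.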

argaz answered Sep 15 '25 21:09
