I was wondering if there's any difference between converting a string to bytes with Encoding.UTF8.GetBytes and manually casting each character to byte.
For example, look at the following code:
public static byte[] ConvertStringToByteArray(string str)
{
    int i, n;
    n = str.Length;
    byte[] x = new byte[n];
    for (i = 0; i < n; i++)
    {
        x[i] = (byte)str[i];
    }
    return x;
}
var arrBytes = ConvertStringToByteArray("Hello world");
or
var arrBytes = Encoding.UTF8.GetBytes("Hello world");
I liked the question, so I executed your code on ANSI text in Hebrew that I read from a text file.
The text was "שועל" (Hebrew for "fox").
string text = System.IO.File.ReadAllText(@"d:\test.txt");
var arrBytes = ConvertStringToByteArray(text);
var arrBytes1 = Encoding.UTF8.GetBytes(text);
The results were different. The two arrays diverge as soon as the code point of any of your characters exceeds the 0-255 range of byte.
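Since the screenshot of the two arrays is not reproduced here, here is a minimal sketch that prints both results. It uses the string "שועל" directly instead of reading d:\test.txt, so it does not depend on the file's encoding:

using System;
using System.Text;

class Program
{
    public static byte[] ConvertStringToByteArray(string str)
    {
        byte[] x = new byte[str.Length];
        for (int i = 0; i < str.Length; i++)
        {
            x[i] = (byte)str[i]; // keeps only the low 8 bits of each char
        }
        return x;
    }

    static void Main()
    {
        string text = "שועל";

        // Casting: one byte per char, high bits lost.
        // Prints: E9-D5-E2-DC
        Console.WriteLine(BitConverter.ToString(ConvertStringToByteArray(text)));

        // UTF-8: two bytes per Hebrew character.
        // Prints: D7-A9-D7-95-D7-A2-D7-9C
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(text)));
    }
}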
Your ConvertStringToByteArray method is incorrect: you are casting each char to byte. A char's numerical value is its Unicode code point, which can be larger than a byte, so the cast silently truncates the value to its low 8 bits (in a checked context it would throw an OverflowException instead). Your example works only because you've used characters with code points within the byte range.
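To make the truncation concrete, here is a minimal sketch (not from the original answer) using the Hebrew letter ש, U+05E9:

using System;

class Demo
{
    static void Main()
    {
        char shin = 'ש'; // U+05E9, numeric value 1513

        // Default (unchecked) context: the cast keeps only the low
        // 8 bits, 1513 % 256 = 233 (0xE9), with no error.
        Console.WriteLine((byte)shin); // 233

        // Checked context: the same cast throws at runtime.
        try
        {
            Console.WriteLine(checked((byte)shin));
        }
        catch (OverflowException)
        {
            Console.WriteLine("OverflowException: 1513 does not fit in a byte");
        }
    }
}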