Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java String to byteArray conversion issue

I am trying to encode/decode a ByteArray to String, but input/output are not matching. Am I doing something wrong?

System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(by));
String s = new String(by, Charsets.UTF_8);
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(s.getBytes(Charsets.UTF_8)));

The output is:

130021000061f8f0001a
130021000061efbfbd

Complete code:

String[] arr = {"13", "00", "21", "00", "00", "61", "F8", "F0", "00", "1A"};        
byte[] by = new byte[arr.length];

for (int i = 0; i < arr.length; i++) {
    by[i] = (byte)(Integer.parseInt(arr[i],16) & 0xff); 
}

System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(by));

String s = new String(by, Charsets.UTF_8);
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(s.getBytes(Charsets.UTF_8)));
like image 670
banjara Avatar asked May 17 '26 06:05

banjara


1 Answers

The problem here is that f8f0001a isn't a valid UTF-8 byte sequence.

First of all, the f8 opening byte denotes a 5 byte sequence and you've only got four. Secondly, f8 can only be followed by a byte of 8x, 9x, ax or bx form.

Therefore it gets replaced with a unicode replacement character (U+FFFD), whose byte sequence in UTF-8 is efbfbd.

And there (rightly) is no guarantee that the conversion of an invalid byte sequence to and from a string will result in the same byte sequence. (Note that even with two, seemingly identical strings, you might get different bytes representing them in Unicode, see Unicode equivalence. )

The moral of the story is: if you want to represent bytes, don't convert them to characters, and if you want to represent text, don't use byte arrays.

like image 100
biziclop Avatar answered May 18 '26 20:05

biziclop



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!