
Is it possible to construct a Java String from invalid code points?

Is it possible to construct a String in Java from invalid code points?

Is there any way that str.getBytes("utf8") on a Java String can return an invalid UTF-8 encoding?

The context is that I want to serialize a String as an array of bytes using UTF-8, and to deserialize those bytes back into the same String.

I want to determine whether my (de)serialization code should first check that the array of bytes is a valid UTF-8 encoding.

Thank you.

morfys asked Mar 25 '26 00:03



1 Answer

You can use the CharsetEncoder and CharsetDecoder classes in java.nio.charset to control precisely how characters and bytes are translated back and forth. In particular, CharsetDecoder.onMalformedInput() and CharsetDecoder.onUnmappableCharacter() let you define how those conditions are handled; CharsetEncoder has the same two methods for the encoding direction. (The String constructor that takes a byte[] and a charset name leaves its behaviour unspecified in these cases.)
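As a rough illustration of that approach, here is a minimal sketch that configures both directions with CodingErrorAction.REPORT, so bad input raises a CharacterCodingException instead of being silently replaced. The class name StrictUtf8 and its four methods are hypothetical names chosen for this example, not part of the JDK. It relies on two points that seem relevant to the question: a Java String can hold unpaired surrogates, which are not valid code points for UTF-8, and str.getBytes(StandardCharsets.UTF_8) substitutes a replacement byte for them rather than failing, so its output should still be valid UTF-8 even though the round trip is lossy.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and methods are illustrative only.
final class StrictUtf8 {

    // Encodes s as UTF-8 and throws, rather than substituting a
    // replacement byte, when s contains an unpaired surrogate.
    static byte[] encode(String s) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Decodes bytes as UTF-8 and throws when they are not well-formed
    // (for example a truncated multi-byte sequence).
    static String decode(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if s can be encoded without any substitution.
    static boolean isEncodable(String s) {
        try {
            encode(s);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decode(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With helpers along these lines, the serializer could reject a String up front when isEncodable returns false, and the deserializer would only need the validity check if the bytes might come from somewhere other than this serializer.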

Matt McHenry answered Mar 26 '26 14:03



