Is it possible to construct a String in Java from invalid code points?
Is there any way that str.getBytes("utf8") in Java can return an invalid UTF-8 encoding?
The context is that I want to serialize a String as a UTF-8 encoded array of bytes and be able to deserialize it back into the same String.
I want to determine whether my (de)serialization code should first check that the array of bytes is a valid UTF-8 encoding.
Thank you.
You can use the CharsetEncoder and CharsetDecoder classes in java.nio.charset to get precise control over how characters and bytes are translated back and forth. In particular, CharsetDecoder.onMalformedInput() and CharsetDecoder.onUnmappableCharacter() let you define how those conditions are handled: each takes a CodingErrorAction (IGNORE, REPLACE or REPORT). (The behaviour of the String constructor that takes a byte[] and a charset name is documented as unspecified in these cases.)
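As a rough sketch of that approach, the helper below configures both the encoder and the decoder with CodingErrorAction.REPORT, so bad input raises an exception instead of being silently replaced. The class name StrictUtf8 and the choice to wrap failures in IllegalArgumentException are assumptions made for illustration, not part of any library. It is also worth knowing that a String can hold an unpaired surrogate such as "\uD800"; as far as I know, getBytes substitutes the charset's replacement bytes ('?' for UTF-8) in that case, so the output is still valid UTF-8 but no longer round-trips to the same String.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: strict UTF-8 conversion that fails loudly on bad input.
final class StrictUtf8 {

    // Encodes a String as UTF-8; throws if the String contains something
    // that cannot be encoded, such as an unpaired surrogate.
    static byte[] encode(String s) {
        try {
            ByteBuffer out = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("String cannot be encoded as UTF-8", e);
        }
    }

    // Decodes UTF-8 bytes; throws if the bytes are not a valid UTF-8 sequence.
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Bytes are not valid UTF-8", e);
        }
    }
}
```

With helpers like these, StrictUtf8.decode(StrictUtf8.encode(s)) either returns a String equal to s or throws, so the deserializer does not need a separate validity check before decoding.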