
How do I convert UTF-8 in hex to its code point?

Tags:

java

utf-8

I have a String e2 80 99, which is the hex representation of a UTF-8 encoded character. The bytes represent

U+2019  ’   e2 80 99    RIGHT SINGLE QUOTATION MARK

I want to convert e2 80 99 to its corresponding Unicode code point, U+2019, or even to the character ’ itself (right single quotation mark).

How do I do it?

shashank asked Oct 25 '25 05:10


1 Answer

Basically, you need to turn the hex digits into bytes, decode those bytes as UTF-8 into a String, and then read the code point at the start of that String. Java Strings are UTF-16 internally, so the code point is either the first char alone or the first and second chars combined when the character is stored as a surrogate pair. This is a proof of concept:

import java.nio.charset.StandardCharsets;

public class Utf8HexToCodePoint {

    public static void main(String[] args) {

        // Parse the space-separated hex digits into raw bytes.
        String utf8Hex = "e2 80 99";
        String[] hexBytes = utf8Hex.split(" ");
        byte[] rawBytes = new byte[hexBytes.length];
        int index = 0;
        for (String hexByte : hexBytes) {
            // parseInt yields 0..255; the cast narrows it to Java's signed byte.
            rawBytes[index++] = (byte) Integer.parseInt(hexByte, 16);
        }

        // Decode the bytes as UTF-8 (Java Strings are UTF-16 internally).
        String decoded = new String(rawBytes, StandardCharsets.UTF_8);

        // codePointAt combines a surrogate pair into one code point when needed.
        int codePoint = decoded.codePointAt(0);
        System.out.println("code point: " + Integer.toHexString(codePoint));
    }
}
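
If the input can hold more than one character, the same idea can be wrapped in a small helper. The following is only a sketch, not part of the original answer: the class name Utf8HexDemo and the methods codePointsOf and describe are made up for illustration, and it assumes Java 8 or later (for String.codePoints()) and input made of whitespace-separated two-digit hex bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
final class Utf8HexDemo {

    // Parses whitespace-separated hex bytes such as "e2 80 99",
    // decodes them as UTF-8, and returns every code point in the result.
    static int[] codePointsOf(String hexBytes) {
        String[] parts = hexBytes.trim().split("\\s+");
        byte[] raw = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // parseInt yields 0..255; the cast narrows it to Java's signed byte.
            raw[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        // This constructor replaces malformed UTF-8 with U+FFFD instead of throwing.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        // codePoints() merges each surrogate pair into a single int.
        return decoded.codePoints().toArray();
    }

    // Formats a code point in the usual U+XXXX notation.
    static String describe(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // Two characters: one from the BMP and one that needs a surrogate pair.
        for (int cp : codePointsOf("e2 80 99 f0 9f 98 80")) {
            // Character.toChars turns the code point back into text.
            System.out.println(describe(cp) + " " + new String(Character.toChars(cp)));
        }
    }
}
```

Character.toChars covers the "or even the character itself" part of the question: it turns a code point back into one or two chars. Empty input or non-hex tokens make Integer.parseInt throw a NumberFormatException, which this sketch leaves uncaught.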
morgano answered Oct 26 '25 18:10


