I use "".charCodeAt(pos) to get the Unicode number for a strange character, and then String.fromCharCode for the reverse.
But I'm having problems with characters whose Unicode number is greater than 65535 (0xFFFF), for example the Blackboard Bold characters. Take Lowercase Blackboard Bold X (𝕩), which has a Unicode number of 120169. If I alert it from JavaScript:
alert(String.fromCharCode(120169));
I get a different character. Going the other way fails too: if I read Uppercase Blackboard Bold X (𝕏), which has a Unicode number of 120143, directly within JavaScript, I get two numbers instead of one:
var s = "𝕏";
alert(s.charCodeAt(0));
alert(s.charCodeAt(1));
Output:
55349
56655
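Both symptoms come from the same 16-bit limit. String.fromCharCode converts each argument to an unsigned 16-bit integer (effectively taking it modulo 65536) before building the string, so a code point above 0xFFFF silently wraps around to an unrelated character. A small check, written for Node or a browser console with console.log in place of alert, shows the wrap-around:

```javascript
// String.fromCharCode keeps only the low 16 bits of each argument,
// so 120169 (0x1D569) wraps to 120169 - 65536 = 54633 (0xD569).
const wrapped = String.fromCharCode(120169);
const expected = String.fromCharCode(120169 % 65536);

console.log(wrapped === expected);                // true
console.log(wrapped.charCodeAt(0));               // 54633
console.log(wrapped.charCodeAt(0).toString(16));  // "d569"
```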
Is there a way to work with these kinds of characters?
Internally, JavaScript stores strings as sequences of 16-bit code units, in an encoding resembling UCS-2 and UTF-16. (I say resembling, since it’s really neither of those two.) Because each unit is only 16 bits, characters outside the Basic Multilingual Plane (BMP), with code points above 65535, are split into two code units, called a surrogate pair. That is why "𝕏" gives 55349 and 56655: charCodeAt returns one code unit at a time. If you store the two code units separately and recombine them later, you should get the original character back without problems.
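The split and the recombination are plain arithmetic defined by UTF-16: subtract 0x10000, put the top 10 bits in a high surrogate (0xD800 + ...) and the bottom 10 bits in a low surrogate (0xDC00 + ...). The sketch below is illustrative; toSurrogatePair and fromSurrogatePair are hypothetical helper names, not built-in functions, and they assume the input really is a code point above 0xFFFF.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into two UTF-16 code units.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

// Hypothetical helper: recombine a high and a low surrogate into the code point.
function fromSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const [hi, lo] = toSurrogatePair(120143);  // Uppercase Blackboard Bold X
console.log(hi, lo);                       // 55349 56655
const x = String.fromCharCode(hi, lo);     // build the string from both units
console.log(x === "𝕏");                    // true
console.log(fromSurrogatePair(x.charCodeAt(0), x.charCodeAt(1)));  // 120143
```

Passing both code units to String.fromCharCode is what makes the original alert work: String.fromCharCode(55349, 56681) produces 𝕩.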
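Recognizing that you have such a character can be rather tricky, though: you have to check whether a code unit is a high surrogate (0xD800–0xDBFF) immediately followed by a low surrogate (0xDC00–0xDFFF).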
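As a rough sketch of that check, the loop below walks a string one code unit at a time and merges each valid high/low pair into a single code point. The function name codePoints is hypothetical, and the choice to pass unpaired surrogates through unchanged is an assumption; other code might prefer to reject them.

```javascript
// Hypothetical helper: collect the code points of a string, joining surrogate pairs.
function codePoints(str) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    if (isHigh && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // High surrogate followed by a low surrogate: one character outside the BMP.
        result.push((unit - 0xD800) * 0x400 + (next - 0xDC00) + 0x10000);
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    // BMP character or unpaired surrogate: keep the code unit as-is.
    result.push(unit);
  }
  return result;
}

console.log("a𝕏b".length);       // 4 (code units, not characters)
console.log(codePoints("a𝕏b"));  // [ 97, 120143, 98 ]
```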
Mathias Bynens has written a blog post about this: JavaScript’s internal character encoding: UCS-2 or UTF-16?. It’s very interesting (though a bit arcane at times), and concludes with several references to code libraries that support the conversion from UCS-2 to UTF-16 and vice versa. You might be able to find what you need in there.
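Newer JavaScript engines (ECMAScript 2015 and later) have code-point-aware versions of these operations built in, which may make a separate library unnecessary if you can rely on them: String.prototype.codePointAt, String.fromCodePoint, and string iteration (for...of, spread, Array.from), which steps by code point rather than by code unit. A minimal sketch:

```javascript
const x = "𝕏";

// codePointAt reads the whole surrogate pair when pointed at the high surrogate.
console.log(x.codePointAt(0));              // 120143
console.log(x.charCodeAt(0));               // 55349 (just the high surrogate)

// fromCodePoint emits both code units for you; fromCharCode does not.
console.log(String.fromCodePoint(120169));  // 𝕩

// Iteration steps by code point, so "a𝕏b" yields three characters, not four.
console.log([..."a𝕏b"].length);             // 3
for (const ch of "a𝕏b") {
  console.log(ch, ch.codePointAt(0));
}
```

Note that codePointAt still indexes by code unit: x.codePointAt(1) returns the low surrogate 56655 on its own.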