
Using charCodeAt() and fromCharCode to obtain Unicode characters (code value > 55349) with JS

Tags:

javascript

I use "".charCodeAt(pos) to get the Unicode number for a strange character, and then String.fromCharCode for the reverse.

But I'm having problems with characters that have a Unicode number greater than 55349. Take the Blackboard Bold characters, for example. Lowercase Blackboard Bold X (𝕩) has the Unicode number 120169, but if I alert that code from JavaScript:

alert(String.fromCharCode(120169));

I get a different character. Something similar happens in the other direction. If I take an Uppercase Blackboard Bold X (𝕏), which has the Unicode number 120143, and read its code directly in JavaScript:

var s = "𝕏";
alert(s.charCodeAt(0));
alert(s.charCodeAt(1));

Output:

55349
56655

Is there a way to work with these kinds of characters?

gialloporpora asked Sep 19 '25 09:09



1 Answer

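Internally, JavaScript stores strings in a 16-bit encoding resembling UCS-2 and UTF-16. (I say resembling, since it’s really neither of those two.) Because each unit is only 16 bits wide, a character outside the BMP, with a code point above 65535, is split into two separate 16-bit values, known as a surrogate pair. That is why charCodeAt returns 55349 and 56655 for 𝕏, and why String.fromCharCode(120169) gives you a different character: it keeps only the low 16 bits of its argument. If you store the two values separately and recombine them later, for example with String.fromCharCode(55349, 56655), you should get the original character back without a problem.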
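To make the split concrete, here is a minimal sketch of the standard UTF-16 surrogate arithmetic. The helper names toSurrogatePair and fromSurrogatePair are made up for this illustration and are not built-in functions. The sketch assumes the input is a valid code point between 0x10000 and 0x10FFFF.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into two 16-bit code units.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;      // 20-bit value to spread over two units
  var high = 0xD800 + (offset >> 10);    // high surrogate carries the top 10 bits
  var low = 0xDC00 + (offset & 0x3FF);   // low surrogate carries the bottom 10 bits
  return [high, low];
}

// Hypothetical helper: recombine a surrogate pair into the original code point.
function fromSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

var pair = toSurrogatePair(120143);                   // [55349, 56655], the values from the question
var codePoint = fromSurrogatePair(55349, 56655);      // 120143
var lowerX = String.fromCharCode.apply(null, toSurrogatePair(120169)); // "𝕩"
```

Passing both code units to String.fromCharCode at once is what produces the single visible character.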

Recognizing that you have such a character can be rather tricky, though.
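One way to recognize such characters, sketched here under the assumption that you only need to walk a string and collect full code points: a code unit in the range 0xD800 to 0xDBFF is a high surrogate, and it forms a pair when it is followed by a low surrogate in the range 0xDC00 to 0xDFFF. The function name codePoints is hypothetical, chosen just for this example.

```javascript
// Hypothetical helper: return the code points of a string,
// combining each surrogate pair into a single number.
function codePoints(str) {
  var result = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one character.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        result.push((unit - 0xD800) * 0x400 + (next - 0xDC00) + 0x10000);
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // BMP characters (and unpaired surrogates) are passed through unchanged.
    result.push(unit);
  }
  return result;
}

var points = codePoints("a\uD835\uDD4Fb"); // "a𝕏b" gives [97, 120143, 98]
```

In engines that support ES2015, the built-in String.prototype.codePointAt and String.fromCodePoint cover the same ground, so a hand-written loop like this is mainly useful in older environments.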

Mathias Bynens has written a blog post about this: JavaScript’s internal character encoding: UCS-2 or UTF-16?. It’s very interesting (though a bit arcane at times), and concludes with several references to code libraries that support the conversion from UCS-2 to UTF-16 and vice versa. You might be able to find what you need in there.

Martijn answered Sep 22 '25 02:09
