I know I can get the codepoint of a character using the ?a syntax.
iex> ?a
97
But what about when the character is a binary, "a"? How can I get the codepoint in this case?
Beware of the decomposed form of Unicode strings. It is always safer to call String.normalize/2 on the input, passing :nfc as the second argument, before further processing.
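As a minimal sketch of what that normalization does, the snippet below builds the decomposed string with explicit \u escapes (an assumption made here so the two codepoints stay visible and survive copy-pasting) and compares its codepoints before and after String.normalize/2.

```elixir
# "a" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
# The \u escapes are used only for illustration.
decomposed = "a\u0301"

String.to_charlist(decomposed)
#⇒ [97, 769]

# NFC composes the pair into the single codepoint U+00E1.
normalized = String.normalize(decomposed, :nfc)

String.to_charlist(normalized)
#⇒ [225]

# After normalization the whole binary is one codepoint, so this match succeeds.
<<cp::utf8>> = normalized
cp
#⇒ 225
```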
One might expect
<<cp::utf8>> = "á"
to work, but it raises, while
<<cp::utf8>> = "á"
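works fine. There is no typo above: the string in the first example is in decomposed form ("a" followed by the combining acute accent, U+0301), while the string in the second example is the single precomposed codepoint U+00E1. They render identically but are different binaries.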
"á" == "á"
#⇒ false
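Because the two strings look the same on screen, it can help to inspect them. The following sketch writes both forms with explicit \u escapes (an assumption for readability; the examples above use literal characters) and shows where they differ and where they agree.

```elixir
decomposed = "a\u0301"  # "a" + U+0301 combining acute accent
composed = "\u00E1"     # precomposed U+00E1

# Different byte lengths: 1 + 2 bytes versus a single 2-byte codepoint.
byte_size(decomposed)
#⇒ 3
byte_size(composed)
#⇒ 2

# Both are a single grapheme, which is why they render identically.
String.length(decomposed)
#⇒ 1

# Binary comparison sees different bytes...
decomposed == composed
#⇒ false

# ...while String.equivalent?/2 compares canonical equivalence.
String.equivalent?(decomposed, composed)
#⇒ true
```

The decomposed string holds two codepoints, and a <<cp::utf8>> pattern has to match the entire binary, which is why the first match raises.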
To match both composed and decomposed input safely, one can explicitly normalize the string to composed form (NFC) up front.
with <<cp::utf8>> <- String.normalize("á", :nfc),
do: cp
#⇒ 225
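If the input may be longer than one character, a <<cp::utf8>> pattern alone will not match. One possible approach, sketched below, is to normalize first and then match only the leading codepoint. The Codepoint module and its first/1 function are hypothetical names introduced for this illustration, not part of Elixir.

```elixir
defmodule Codepoint do
  # Hypothetical helper, not part of the standard library.
  # Returns the first codepoint of the NFC-normalized string.
  @spec first(String.t()) :: {:ok, non_neg_integer()} | :error
  def first(string) when is_binary(string) do
    case String.normalize(string, :nfc) do
      # Take the leading codepoint and ignore the rest of the binary.
      <<cp::utf8, _rest::binary>> -> {:ok, cp}
      # No codepoint could be matched, e.g. for an empty string.
      _ -> :error
    end
  end
end

Codepoint.first("a")
#⇒ {:ok, 97}
Codepoint.first("a\u0301")
#⇒ {:ok, 225}
Codepoint.first("abc")
#⇒ {:ok, 97}
Codepoint.first("")
#⇒ :error
```

Returning a tagged tuple instead of raising is a design choice of this sketch; a bare match would raise a MatchError on input that does not fit the pattern.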
All the examples above are copy-pasteable.
"á"
|> String.normalize(:nfc)
|> String.to_charlist()
|> hd()
#⇒ 225
but without normalization, the decomposed form yields the codepoint of the bare "a":
"á"
|> String.to_charlist()
|> hd()
#⇒ 97