Why is the length function saying that this 8-character string is 9 characters?
>>> length "Níðhöggr"
9
"Níðhöggr" contains 9 Unicode characters:
U+004E N (Lu): LATIN CAPITAL LETTER N 
U+00ED í (Ll): LATIN SMALL LETTER I WITH ACUTE
U+00F0 ð (Ll): LATIN SMALL LETTER ETH 
U+0068 h (Ll): LATIN SMALL LETTER H 
U+006F o (Ll): LATIN SMALL LETTER O 
U+0308 ̈ (Mn): COMBINING DIAERESIS 
U+0067 g (Ll): LATIN SMALL LETTER G 
U+0067 g (Ll): LATIN SMALL LETTER G 
U+0072 r (Ll): LATIN SMALL LETTER R 
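If you want to reproduce a listing like this yourself, here is a minimal sketch using only `base`. The names `question`, `codePoint` and `describe` are made up for illustration, and the string is written with escapes on the assumption that the original has exactly the code points listed above.

```haskell
import Data.Char (generalCategory, ord, toUpper)
import Numeric (showHex)

-- The string from the question, with the combining diaeresis written
-- explicitly as \x308 so it cannot be silently recomposed by an editor.
question :: String
question = "N\xED\xF0ho\x308ggr"

-- Format a code point as U+XXXX, zero-padded to at least four hex digits.
codePoint :: Char -> String
codePoint c = "U+" ++ replicate (4 - length hex) '0' ++ hex
  where
    hex = map toUpper (showHex (ord c) "")

-- Code point plus its Unicode general category (e.g. NonSpacingMark).
describe :: Char -> String
describe c = codePoint c ++ " " ++ show (generalCategory c)

main :: IO ()
main = mapM_ (putStrLn . describe) question
```

Each `Char` in a Haskell `String` is one code point, which is why the combining mark shows up as its own entry and `length` counts it.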
You might want to use "Níðhöggr", which looks the same when printed out, but contains U+00F6 LATIN SMALL LETTER O WITH DIAERESIS instead of the two-character ö combo. In other words, it is in the composed normal form (NFC).
Or you might want "Níðhöggr", which has 10 Unicode characters (the í is split into i and a combining acute accent, U+0301). That would be decomposed normal form (NFD).
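To see the three variants side by side, here is a small sketch, again using only `base`. The names `nfc`, `mixed`, `nfd` and `withoutMarks` are hypothetical, and the strings are hand-written with escapes rather than produced by a real normalization routine. Dropping combining marks is only a rough stand-in for counting what a reader sees; it is not proper grapheme segmentation.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Fully composed: ö is the single code point U+00F6.
nfc :: String
nfc = "N\xED\xF0h\xF6ggr"

-- The string from the question: í is composed, ö is o + U+0308.
mixed :: String
mixed = "N\xED\xF0ho\x308ggr"

-- Fully decomposed: í is i + U+0301 and ö is o + U+0308.
nfd :: String
nfd = "Ni\x301\xF0ho\x308ggr"

-- Drop combining marks before counting (an approximation only).
withoutMarks :: String -> String
withoutMarks = filter (not . isMark)
  where
    isMark c =
      generalCategory c
        `elem` [NonSpacingMark, SpacingCombiningMark, EnclosingMark]

main :: IO ()
main = do
  print (map length [nfc, mixed, nfd])                  -- [8,9,10]
  print (map (length . withoutMarks) [nfc, mixed, nfd]) -- [8,8,8]
```

All three strings look the same when printed, but `length` reports a different number for each because it counts code points.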
Google "Unicode normalization" for interesting and/or hairy details. Use this function to normalize Unicode data in Haskell (thanks, Adam Rosenfield!).
Because your ö isn't the single character ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS); it's U+006F LATIN SMALL LETTER O plus U+0308 COMBINING DIAERESIS.