In Python 2.7 at least, unicodedata.name() doesn't recognise certain characters.
>>> from unicodedata import name
>>> name(u'\n')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: no such name
>>> name(u'a')
'LATIN SMALL LETTER A'
Certainly Unicode contains the character \n, and it has a name, specifically "LINE FEED".
NB. unicodedata.lookup('LINE FEED') and unicodedata.lookup(u'LINE FEED') both give a KeyError: undefined character name.
The unicodedata.name() lookup relies on column 2 of the UnicodeData.txt database in the standard (Python 2.7 uses Unicode 5.2.0).
If that name starts with < it is ignored. All control codes, including newlines, are in that category; the first column has no name other than <control>:
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
Column 10 is the old, Unicode 1.0 name, and should not be used, according to the standard. In other words, \n has no name, other than the generic <control>, which the Python database ignores (as it is not unique).
Python 3.3 added support for NameAliases.txt, which lets you look up names by alias; so lookup('LINE FEED'), lookup('new line') or lookup('eol'), etc, all reference \n. However, the unicodedata.name() method does not support aliases, nor could it (which would it pick?):
- Added support for Unicode name aliases and named sequences. Both
unicodedata.lookup()and'\N{...}'now resolve name aliases, andunicodedata.lookup()resolves named sequences too.
TL;DR: LINE FEED is not the official name for \n, it is but an alias for it. Python 3.3 and up let you look up characters by alias.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With