According to the Wikipedia page for Code Page 437, the byte values \x01 through \x1f should decode to graphic characters, e.g. b'\x01' equates to ☺ ('\u263A'). But that's not what decode produces:
>>> b'\x01'.decode('cp437')
'\x01'
That was Python 3.6 but 2.7 does the same, for all 31 byte values.
While there were graphics associated with the byte range \x01 through \x1f, those graphics were only used in some contexts. In other contexts, those code points would be interpreted as control characters, as in ASCII. Quoting an IBM page on CP437:
Code points X'01' through X'1F' and X'7F' may be controls or graphics depending on context. For displays the hexadecimal code in a memory-mapped video display buffer is a graphic. For printers the graphics context is established by a preceding control sequence in the data stream. There are two such control sequences: ESC X'5C' and ESC X'5E' named Print All Characters and Print Single Character respectively. In other situations the code points in question are used as controls.
Python's CP437 decoding is based on the Unicode mappings on Unicode.org, which use the control character interpretation.
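A quick way to confirm this, as a minimal sketch: every byte from \x01 through \x1f, plus \x7f, comes back from the codec as the control character with the same code point. The variable names are only for illustration.

```python
# Bytes whose meaning in CP437 depends on context (controls or graphics).
ambiguous = bytes(range(0x01, 0x20)) + b"\x7f"

# Python's cp437 codec follows the control-character interpretation,
# so each byte decodes to the code point with the same numeric value.
decoded = ambiguous.decode("cp437")
same_code_points = [ord(ch) for ch in decoded] == list(ambiguous)
print(same_code_points)
```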
The Unicode FAQ implies that "The correct Unicode mappings for the special graphic characters (01-1F, 7F) of CP437 and other DOS-type code pages" should be available at https://www.unicode.org/Public/MAPPINGS, but digging down there only turns up the mappings with the control characters, and a page linking to several IBM websites. Digging through IBM's sites turns up ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00437.txt, which gives graphical mappings for \x01-\x1f in terms of IBM's GCGID system, but not in terms of Unicode.
I don't know if there actually is an official mapping, from either IBM or Unicode, that gives canonical Unicode mappings for \x01-\x1f in terms of the graphical interpretation of CP437.
I managed to find this file in there:
https://unicode.org/Public/MAPPINGS/VENDORS/MISC/IBMGRAPH.TXT
It includes a mapping between Unicode characters and the IBM CP437 byte values 0x01-0x1f under the graphical interpretation, as well as IBM CP864 (Arabic).
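If the graphical interpretation is what you need, one option is to decode with cp437 as usual and then translate the control code points afterwards. The following is a sketch, not an official mapping: the glyph string is the commonly cited table (the one shown on the Wikipedia page), and the names CP437_GRAPHICS and decode_cp437_graphic are made up for this example. It is worth checking the table against IBMGRAPH.TXT before relying on it.

```python
# Glyphs for CP437 bytes 0x01-0x1F in byte order.
# Assumption: taken from the commonly cited table, not from an official file.
CP437_GRAPHICS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"

# Translation table for str.translate: control code point -> graphic character.
_GRAPHIC_TABLE = {i + 1: ch for i, ch in enumerate(CP437_GRAPHICS)}
_GRAPHIC_TABLE[0x7F] = "\u2302"  # HOUSE, the usual glyph for 0x7F


def decode_cp437_graphic(data: bytes) -> str:
    """Decode CP437 bytes, rendering 0x01-0x1F and 0x7F as graphics."""
    return data.decode("cp437").translate(_GRAPHIC_TABLE)


print(decode_cp437_graphic(b"\x01 \x03 \x0e"))
```

Note that this also turns tabs, line feeds and carriage returns into ○, ◙ and ♪, so it only makes sense for data that really should be read as a screen-buffer dump.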