
Is CP437 decoding broken for control characters?

According to the Wikipedia page for Code Page 437, the byte values \x01 through \x1f should decode to graphic characters; e.g. b'\x01' should correspond to ☺ ('\u263A'). But that's not what decode produces:

>>> b'\x01'.decode('cp437')
'\x01'

That was Python 3.6, but 2.7 does the same for all 31 byte values.

Mark Ransom asked Oct 18 '25 09:10


2 Answers

While there were graphics associated with the byte range \x01 through \x1f, those graphics were only used in some contexts. In other contexts, those code points would be interpreted as control characters, as in ASCII. Quoting an IBM page on CP437:

Code points X'01' through X'1F' and X'7F' may be controls or graphics depending on context. For displays the hexadecimal code in a memory-mapped video display buffer is a graphic. For printers the graphics context is established by a preceding control sequence in the data stream. There are two such control sequences: ESC X'5C' and ESC X'5E' named Print All Characters and Print Single Character respectively. In other situations the code points in question are used as controls.

Python's CP437 decoding is based on the Unicode mappings on Unicode.org, which use the control character interpretation.
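A quick way to confirm the control character interpretation is to decode the whole range and inspect the results. This is only a sketch; the variable names are mine, and it checks nothing beyond the 31 byte values from the question:

```python
import unicodedata

# All 31 byte values from the question: 0x01 through 0x1F.
low_bytes = bytes(range(0x01, 0x20))
decoded = low_bytes.decode("cp437")

# Each byte should come back as the code point with the same number...
maps_to_itself = all(ord(ch) == b for ch, b in zip(decoded, low_bytes))

# ...and each of those code points is in Unicode category "Cc" (control).
all_controls = all(unicodedata.category(ch) == "Cc" for ch in decoded)

print(maps_to_itself, all_controls)
```

If both values are true, the codec is following the control mapping rather than the graphic one.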

The Unicode FAQ implies that "The correct Unicode mappings for the special graphic characters (01-1F, 7F) of CP437 and other DOS-type code pages" should be available at https://www.unicode.org/Public/MAPPINGS, but digging down there only turns up the mappings with the control characters, and a page linking to several IBM websites. Digging through IBM's sites turns up ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00437.txt, which gives graphical mappings for \x01-\x1f in terms of IBM's GCGID system, but not in terms of Unicode.

I don't know whether either IBM or Unicode actually publishes an official, canonical mapping from \x01-\x1f to Unicode under the graphical interpretation of CP437.

user2357112 supports Monica answered Oct 20 '25 23:10


I managed to find this file in there:
https://unicode.org/Public/MAPPINGS/VENDORS/MISC/IBMGRAPH.TXT

It maps Unicode graphic characters to the corresponding IBM CP437 byte values (including 0x01-0x1f), as well as to IBM CP864 (Arabic).
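If you want the graphic interpretation in Python, one option is to decode with the built-in codec and then translate the control characters afterwards. The sketch below is not an official mapping: the glyph table is my own transcription of the commonly cited CP437 chart (the one on the Wikipedia page), so check it against IBMGRAPH.TXT before relying on it. The names CP437_GRAPHICS, GRAPHIC_TABLE and decode_cp437_graphic are made up for this example.

```python
# Glyphs commonly listed for CP437 bytes 0x01-0x1F, in byte order.
# Assumption: transcribed from the usual CP437 chart, not from an official file.
CP437_GRAPHICS = (
    "\u263A\u263B\u2665\u2666\u2663\u2660\u2022\u25D8"  # 0x01-0x08
    "\u25CB\u25D9\u2642\u2640\u266A\u266B\u263C\u25BA"  # 0x09-0x10
    "\u25C4\u2195\u203C\u00B6\u00A7\u25AC\u21A8\u2191"  # 0x11-0x18
    "\u2193\u2192\u2190\u221F\u2194\u25B2\u25BC"        # 0x19-0x1F
)

# Translation table for str.translate: control code point -> graphic character.
GRAPHIC_TABLE = {i: ch for i, ch in enumerate(CP437_GRAPHICS, start=0x01)}
GRAPHIC_TABLE[0x7F] = "\u2302"  # 0x7F is usually shown as the "house" glyph


def decode_cp437_graphic(data: bytes) -> str:
    """Decode CP437 bytes, showing 0x01-0x1F and 0x7F as graphics."""
    return data.decode("cp437").translate(GRAPHIC_TABLE)


print(decode_cp437_graphic(b"\x01 \x03\x04\x05\x06"))
```

Note that this converts every control byte, including tab, line feed and carriage return, so it only makes sense for data that really is a screen-buffer style dump rather than ordinary text.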

Tynach answered Oct 21 '25 00:10


