Are these obsolete? They seem like the worst idea ever -- embed something in the contents of your file that no one can see, but impacts the file's functionality. I don't understand why I would want one.
The byte order mark (BOM) is a piece of information used to signify that a text file employs Unicode encoding, while also communicating the text stream's endianness. The BOM is not interpreted as a logical part of the text stream itself, but is rather an invisible indicator at its head.
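To make the "invisible indicator at its head" concrete, here is a minimal sketch of how a reader might sniff a BOM from the first bytes of a file. The function name `detect_bom` and the returned encoding labels are illustrative choices, not part of any standard; the BOM byte constants come from Python's `codecs` module.

```python
import codecs

# UTF-32 marks are checked before UTF-16 ones because the UTF-32 LE mark
# (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

Note that a UTF-16 LE file whose first character is U+0000 would be misread as UTF-32 LE by this check; that ambiguity is inherent in the byte patterns, not specific to this sketch.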
The UTF-8 file signature (commonly also called a "BOM") identifies the encoding format rather than the byte order of the document. UTF-8 is a linear sequence of bytes, not a sequence of 2-byte or 4-byte units where the byte order is important.
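As a small illustration of the UTF-8 signature in practice, Python's standard `utf-8-sig` codec strips the three signature bytes on decode and writes them on encode, while the plain `utf-8` codec leaves U+FEFF in the text. The variable names below are just for the example.

```python
raw = b"\xef\xbb\xbfhello"  # UTF-8 signature (EF BB BF) followed by "hello"

# Plain UTF-8 keeps the signature as a leading U+FEFF character.
with_mark = raw.decode("utf-8")

# The "utf-8-sig" codec drops a leading signature if one is present.
without_mark = raw.decode("utf-8-sig")

# Encoding with "utf-8-sig" prepends the signature.
written = "hello".encode("utf-8-sig")
```

This is why a file saved "with BOM" can show a stray invisible character at the start when read by a tool that decodes it as plain UTF-8.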
Unicode can be stored using several different encodings, which translate the character codes into sequences of bytes. The Unicode standard defines three (UTF-8, UTF-16, and UTF-32), and several other encodings exist. UTF-8 and UTF-16 are variable-length encodings; UTF-32 uses a fixed four bytes per code point.
The "BOM" is a holdover from the early days of Unicode when it was assumed that using Unicode would mean using 16-bit characters. It is completely pointless in an encoding like UTF-8 which has only one byte order. The choice of U+FEFF is also suboptimal for UTF-32, because it cannot distinguish between all possible middle-endian byte orders (to do so would require a BOM encoded with 4 different bytes).
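The point about UTF-32 can be checked with a short sketch: U+FEFF encodes to four bytes of which two are identical (both zero), so the 24 possible orderings of a 4-byte unit collapse to fewer distinguishable patterns. The names `bom_be` and `distinct_orders` are illustrative only.

```python
from itertools import permutations

# U+FEFF in big-endian UTF-32 is the byte sequence 00 00 FE FF.
bom_be = "\ufeff".encode("utf-32-be")

# Enumerate every ordering of those four bytes and keep the distinct ones.
# Because the two zero bytes are interchangeable, swapping them yields the
# same pattern, so some byte orders cannot be told apart from the mark alone.
distinct_orders = {bytes(p) for p in permutations(bom_be)}
```

Four distinct byte values would give 24 distinguishable patterns; with the repeated zero byte, only 12 remain.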
The only reason you'd use one is when sending UTF-16 or UTF-32 data between platforms with different byte orders, but (1) most people use UTF-8 anyway, and (2) the MIME charset parameter provides a better mechanism.