How to avoid inadvertent encoding of UTF-8 files as ASCII/ANSI?

Question

While editing a file encoded as UTF-8 without a (spurious) BOM, the content may end up containing no Unicode characters outside the ASCII or ANSI ranges. The next time the file is opened, some text editors (Notepad++, for example) will interpret it as ASCII/ANSI encoded and open it as such. Unaware of the change, the user continues editing and adds non-ANSI Unicode characters, which are rendered useless because the file is now saved as ANSI. An editor may offer a menu option (as Notepad++ does) to open ANSI files as UTF-8 without BOM, but this leads to the reverse problem: ANSI files are inadvertently overwritten with a Unicode encoding.

Vlad Atanasiu · Accepted Answer

One workaround is to add a character outside the ANSI range to a comment in the file. Depending on the editor's decoding algorithm, this may force the editor (Notepad++, for example) to recognize the file as UTF-8 without BOM, because the file then always contains at least one multi-byte UTF-8 sequence.
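Why such a character helps can be illustrated with a minimal Python sketch of a BOM-less detection heuristic. This is only an assumption about how an editor might guess the encoding, not Notepad++'s actual algorithm; the function name `guess_encoding` and its return labels are hypothetical.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic, NOT Notepad++'s real detection code."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-bom"
    if all(byte < 0x80 for byte in data):
        # Pure ASCII is byte-identical in UTF-8 and in ANSI code pages,
        # so the content gives no evidence either way.
        return "ambiguous"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes >= 0x80 that are not valid UTF-8: assume an ANSI code page.
        return "ansi"
    return "utf-8"


print(guess_encoding("<!-- plain -->".encode("utf-8")))
print(guess_encoding("<!-- \u05d0 -->".encode("utf-8")))
print(guess_encoding("caf\u00e9".encode("cp1252")))
```

Under this assumed heuristic, the ASCII-only input is the ambiguous case where an editor has to fall back to a default, while a single U+05D0 in a comment makes the file detectably UTF-8.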

In an HTML document, for example, you could follow the charset definition in the header with such a Unicode comment, here containing U+05D0 HEBREW LETTER ALEF: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <!-- א -->
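For files that are generated or checked by a script, the same workaround could be applied automatically. The following Python sketch is an illustration under stated assumptions: the helper name `add_marker`, the exact comment text, and the regular expression are hypothetical choices, not part of the original answer.

```python
import re

# Comment holding U+05D0 HEBREW LETTER ALEF (assumed marker text).
MARKER = "<!-- \u05d0 -->"

# Matches a <meta ...charset=utf-8...> tag, case-insensitively.
META_CHARSET = re.compile(r"<meta[^>]*charset=[\"']?utf-8[^>]*>", re.IGNORECASE)


def add_marker(html: str) -> str:
    """Insert MARKER after the charset <meta> tag if the text is ASCII-only."""
    if not html.isascii():
        return html  # already contains a non-ASCII character
    match = META_CHARSET.search(html)
    if match is None:
        return html  # no UTF-8 charset declaration found; leave untouched
    return html[:match.end()] + " " + MARKER + html[match.end():]


page = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
marked = add_marker(page)
```

The result should be written back with an explicit encoding, for example `open(path, "w", encoding="utf-8")`, so that the marker is stored as a UTF-8 byte sequence.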

Tags:

notepad++

utf-8

byte-order-mark