$ echo $LANG
en_US.UTF-8
$ echo 你好 | iconv -f UTF8 -t UTF32BE | tee hello.txt
O`Y}
$ vim -N -u NONE --cmd 'set tenc=utf32 enc=utf32 fencs=utf32be' hello.txt
你好
~
~
~
:set tenc enc fenc
termencoding=ucs-4
encoding=ucs-4
fileencoding=ucs-4
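The O`Y} line is the raw UTF-32BE bytes as a UTF-8 terminal shows them. Every character occupies four big-endian bytes, the zero bytes print as nothing, and the remaining bytes happen to be printable ASCII. A small Python check (illustrative only, not part of the original session):

```python
# UTF-32BE stores every code point in four big-endian bytes.
data = "你好".encode("utf-32-be")
print(data.hex(" "))  # 00 00 4f 60 00 00 59 7d

# Drop the NUL bytes the terminal renders as nothing; what is left is ASCII.
visible = bytes(b for b in data if b != 0).decode("ascii")
print(visible)  # O`Y}
```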
The terminal cannot display UTF-32 characters.
Yet even after changing several of Vim's encoding options to UTF-32, Vim still displays the UTF-32 file without any problems.
Why?
Interesting. You can run your command inside script(1) to verify that Vim is actually writing UTF-8 to your terminal.
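A minimal sketch of that check in Python, assuming you have recorded the session with script(1) so that the capture file holds the raw bytes Vim sent to the terminal. The helper name detect_encoding and the capture file name vim.log are hypothetical, and the sample bytes below stand in for a real capture:

```python
# Hypothetical helper: report which encoding of the sample text appears
# in a byte capture, such as the file written by script(1).
def detect_encoding(capture: bytes, text: str = "你好") -> str:
    if text.encode("utf-8") in capture:
        return "utf-8"
    if text.encode("utf-32-be") in capture:
        return "utf-32be"
    return "not found"


# With a real capture: detect_encoding(open("vim.log", "rb").read())
sample = b"\x1b[H" + "你好".encode("utf-8") + b"\r\n~"
print(detect_encoding(sample))  # utf-8
```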
The help entries for 'charconvert' and 'encoding' give oblique hints about Vim's internal operation, but I did not find a corresponding hint that the same behavior applies to termencoding. Respectively:
Vim internally uses UTF-8 instead of UCS-2 or UCS-4.
and
When "unicode", "ucs-2" or "ucs-4" is used, Vim internally uses utf-8.
So, we will use the source (version 7.3.548, specifically) to find out what is happening.
The value for the termencoding/tenc option is stored in the global variable p_tenc.
did_set_string_option() seems to handle the setting of string-valued options.
When handling termencoding, it calls convert_setup() to set up output_conv (for converting encoding to termencoding).
The comment for convert_setup gives the first hint as to what is happening:
Note: cannot be used for conversion from/to ucs-2 and ucs-4 (will use utf-8 instead).
convert_setup calls convert_setup_ext() with TRUE for both of the {from,to}_unicode_is_utf8 parameters.
convert_setup_ext() then proceeds as follows:
If {from,to}_unicode_is_utf8 are true (they are), it sets the local variables {from,to}_is_utf8 based on whether the specified encodings have the ENC_UNICODE property (ucs-4 does, as do all of Vim’s utf-… and ucs-… encodings).
When it opens the iconv conversion, Vim substitutes utf-8 if {from,to}_is_utf8 are true (in this case, they are).
Ultimately, the values of encoding and termencoding are handled in the same way here: utf-32 is mapped to ucs-4, which has ENC_UNICODE, so Vim substitutes UTF-8 for the requested encoding. Maybe the commit logs contain hints about why termencoding is treated this way; I will leave that archeology to someone else, though.
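The decision traced above can be modeled in a few lines of Python. This is a simplified sketch, not Vim's code: the alias table is a hypothetical subset containing only the names the question shows being reported back as ucs-4, the prefix test stands in for the ENC_UNICODE property, and the behavior when a *_unicode_is_utf8 flag is false (only genuine utf-8 counts as utf-8) is an assumption rather than something shown above.

```python
# Simplified model of convert_setup()/convert_setup_ext() in Vim 7.3.
# Not Vim code: ALIASES and has_enc_unicode() are illustrative stand-ins.

# Hypothetical alias subset; the question shows utf32 and utf32be becoming ucs-4.
ALIASES = {"utf32": "ucs-4", "utf32be": "ucs-4", "utf8": "utf-8"}


def canonize(name):
    """Lower-case the name, turn '_' into '-', then resolve known aliases."""
    name = name.lower().replace("_", "-")
    return ALIASES.get(name, name)


def has_enc_unicode(name):
    """Stand-in for ENC_UNICODE: true for every utf-... and ucs-... name."""
    return name.startswith(("utf-", "ucs-"))


def convert_setup_ext(frm, from_unicode_is_utf8, to, to_unicode_is_utf8):
    """Return the (from, to) encodings the conversion would effectively use."""
    frm, to = canonize(frm), canonize(to)
    # Assumption: with the flag false, only real utf-8 counts as utf-8.
    from_is_utf8 = has_enc_unicode(frm) if from_unicode_is_utf8 else frm == "utf-8"
    to_is_utf8 = has_enc_unicode(to) if to_unicode_is_utf8 else to == "utf-8"
    return ("utf-8" if from_is_utf8 else frm, "utf-8" if to_is_utf8 else to)


def convert_setup(frm, to):
    """convert_setup() always passes TRUE for both *_unicode_is_utf8 flags."""
    return convert_setup_ext(frm, True, to, True)


# output_conv converts 'encoding' to 'termencoding'.
print(convert_setup("utf32", "utf32"))   # ('utf-8', 'utf-8')
print(convert_setup("latin1", "utf32"))  # ('latin1', 'utf-8')
```

Under this model, calling convert_setup_ext() with both flags False leaves the pair as ('ucs-4', 'ucs-4'), which is the difference the TRUE arguments make.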
The code path for handling fileencoding is different. It only forces UTF-8 for the “internal side” of the conversion (and only if a “Unicode” encoding is in effect).
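Put differently, only the file side of that conversion sees real UCS-4 bytes. A rough Python sketch of the round trip for hello.txt, using Python's codecs in place of Vim's conversion code and assuming big-endian UCS-4 as in the question (echo adds the trailing newline):

```python
# Read: the file side is UCS-4 (big-endian), the internal side is UTF-8.
file_bytes = "你好\n".encode("utf-32-be")  # contents of hello.txt
internal = file_bytes.decode("utf-32-be").encode("utf-8")
print(internal.hex(" "))  # e4 bd a0 e5 a5 bd 0a

# Write: convert the internal UTF-8 back to the file encoding.
written = internal.decode("utf-8").encode("utf-32-be")
assert written == file_bytes
```

Under this model, the UTF-8 bytes in the middle are what Vim would then pass on to the terminal.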