
What is the default Unicode character encoding used in Windows?

What is the default Unicode character encoding used in Windows, specifically in Windows programming (Win32 and WinRT)? When I programmed against the WinAPI, "char" mapped to 1 byte of character storage and "wchar_t" mapped to 2 bytes. If UTF-16 encodes every character beyond 65535 in 4 bytes, how does Windows map those characters to the "wchar_t" data type? I know my question is not entirely clear, but I hope you understand my concern. Thank you very much!

wembikon asked Sep 11 '25 21:09


1 Answer

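Windows uses UTF-16LE for all things Unicode. The exceptions are MultiByteToWideChar() and WideCharToMultiByte(), which convert between UTF-16 and other encodings, including UTF-7, UTF-8, and the other charsets installed in the OS. On Windows, wchar_t is 16 bits wide, so each wchar_t holds one UTF-16 code unit rather than one whole character. UTF-16 uses surrogate pairs (two 16-bit values working together) to encode Unicode codepoints above U+FFFF, so such a character occupies two consecutive wchar_t elements. For example, Unicode codepoint U+1D11E is encoded as 0xD834 0xDD1E (bytes 0x34 0xD8 0x1E 0xDD) in UTF-16LE.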
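
To make the surrogate-pair arithmetic concrete, here is a minimal sketch in portable C. It is not how Windows implements anything internally; it only follows the UTF-16 encoding rule. The function names utf16_encode and utf16le_bytes are hypothetical, and uint16_t stands in for the 16-bit wchar_t of Windows so the code also compiles on platforms where wchar_t is wider.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode codepoint as UTF-16 code units.
   uint16_t stands in for the 16-bit wchar_t used on Windows.
   Returns the number of code units written to out (1 or 2). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);              /* must be inside the Unicode range */
    assert(cp < 0xD800 || cp > 0xDFFF);  /* surrogate values are not characters */

    if (cp < 0x10000) {
        /* Basic Multilingual Plane: a single code unit is enough. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Above U+FFFF: subtract 0x10000, then split the remaining 20 bits. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}

/* Hypothetical helper: write code units as little-endian bytes (UTF-16LE).
   bytes must have room for 2 * n entries. Returns the byte count. */
size_t utf16le_bytes(const uint16_t *units, size_t n, uint8_t *bytes)
{
    for (size_t i = 0; i < n; i++) {
        bytes[2 * i]     = (uint8_t)(units[i] & 0xFF);  /* low byte first */
        bytes[2 * i + 1] = (uint8_t)(units[i] >> 8);    /* then high byte */
    }
    return 2 * n;
}
```

Calling utf16_encode(0x1D11E, units) fills units with 0xD834 and 0xDD1E, and utf16le_bytes then yields the bytes 0x34 0xD8 0x1E 0xDD, matching the example above.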

Remy Lebeau answered Sep 13 '25 11:09
