
Called ReadFile on a text file, got weird (Japanese?) characters

Tags: c++, c, io

I use the following code to read the entire contents of a file through a valid handle hFile, using the size I obtained from GetFileSize(hFile, NULL).

_TCHAR* text = (_TCHAR*)malloc(sizeOfFile * sizeof(_TCHAR));
DWORD numRead = 0;
BOOL didntFail = ReadFile(hFile, text, sizeOfFile, &numRead, NULL);

After the call, text contains strange characters that look Japanese or similar, not the contents of the file.

What did I do wrong?

Edit: I understand it is an encoding problem, but then how do I convert text to LPCWSTR so I can use functions like WriteConsoleOutputCharacter?

asked Dec 14 '25 13:12 by The GiG


1 Answer

Modern IDEs (Visual Studio, for example) default to Unicode builds, meaning _TCHAR is actually wchar_t. ReadFile() works with raw bytes, so if you use it to fill a _TCHAR array directly, each pair of 8-bit characters from the file gets interpreted as a single UTF-16 code unit. These usually show up as CJK (Chinese/Japanese/Korean) glyphs.
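To make the effect concrete, here is a small portable sketch (it does not call any Windows API, and the helper name pairBytesAsUtf16LE is made up for illustration). It pairs up narrow bytes the way a little-endian wchar_t buffer would see them after ReadFile() has filled it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: views consecutive byte pairs as little-endian
// 16-bit code units, mimicking what a wchar_t buffer holds when it is
// filled with the raw bytes of an 8-bit text file.
std::vector<std::uint16_t> pairBytesAsUtf16LE(const std::string& bytes) {
    std::vector<std::uint16_t> units;
    // A trailing odd byte has no partner and is ignored in this sketch.
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}
```

For the input "Hell", the bytes 0x48 0x65 0x6C 0x6C become the code units 0x6548 and 0x6C6C, both of which fall in the CJK Unified Ideographs range (U+4E00 to U+9FFF).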

You have three options:

  • convert your program to non-Unicode
  • use a file containing Unicode text (in UTF-16 encoding), or
  • read from the file into a char array and then use MultiByteToWideChar() to convert the text to Unicode.
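The third option above could look roughly like the following sketch. The helper name bytesToWide and its codePage parameter are assumptions for illustration: you have to know (or guess) the file's encoding, for example CP_ACP or CP_UTF8. The #else branch exists only so the sketch compiles outside Windows and handles nothing beyond 7-bit ASCII.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts bytes read with ReadFile() into UTF-16.
// codePage is an assumption about the file's encoding (e.g. CP_ACP, CP_UTF8).
std::wstring bytesToWide(const std::string& bytes, unsigned codePage) {
    if (bytes.empty()) return std::wstring();
#ifdef _WIN32
    const int byteCount = static_cast<int>(bytes.size());
    // First call: ask how many wchar_t units the converted text needs.
    const int wideCount =
        MultiByteToWideChar(codePage, 0, bytes.data(), byteCount, nullptr, 0);
    if (wideCount <= 0) return std::wstring();  // conversion failed
    std::wstring wide(static_cast<std::size_t>(wideCount), L'\0');
    // Second call: convert into the correctly sized buffer.
    MultiByteToWideChar(codePage, 0, bytes.data(), byteCount, &wide[0], wideCount);
    return wide;
#else
    // Non-Windows fallback for this sketch only: widens 7-bit ASCII.
    (void)codePage;
    return std::wstring(bytes.begin(), bytes.end());
#endif
}
```

In the question's code, this would mean reading into a char buffer of sizeOfFile bytes, building a std::string from the first numRead bytes, and passing it to the helper. The c_str() of the returned std::wstring can then be used wherever an LPCWSTR is expected, such as WriteConsoleOutputCharacterW.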

If you mix Unicode and non-Unicode text, be careful to calculate the correct buffer sizes (number of bytes vs. number of characters). ReadFile(), for example, always counts bytes.

Note that you can still use narrow chars with Windows in your Unicode program if you call the ANSI version of the Windows function (e.g. WriteConsoleOutputCharacterA).

answered Dec 17 '25 01:12 by efotinis


