
Is it possible to set a text file to UTF-16?

My code for writing text works for ANSI characters, but when I try to write Japanese characters they do not appear. Do I need to use UTF-16 encoding? If so, how would I do it in code?

std::wstring filename;
std::wstring text;
filename = L"path";  // wide literal: a narrow "path" cannot be assigned to std::wstring
std::wofstream myfile;
myfile.open(filename, std::ios::app);
std::getline(std::wcin, text);
myfile << text << std::endl;
std::wcin.get();
myfile.close();
asked Oct 24 '25 10:10 by Guilherme Galdino


2 Answers

From the comments it seems your console correctly understands Unicode, and the issue is only with file output.

Here's how to write a text file in UTF-16LE. Just tested in MSVC 2019 and it works.

#include <string>
#include <fstream>
#include <iostream>
#include <codecvt>
#include <locale>

int main() {
    std::wstring text = L"test тест 試験.";
    std::wofstream myfile("test.txt", std::ios::binary);
    std::locale loc(std::locale::classic(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>);
    myfile.imbue(loc);
    myfile << wchar_t(0xFEFF) /* UCS2-LE BOM */;
    myfile << text << "\n";
    myfile.close();
}

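You must use std::ios::binary mode for output under Windows. Otherwise text-mode translation runs on the already-encoded bytes and expands the 0x0A byte of \n to \r\n, so the newline is emitted as 3 bytes (0D 0A 00) instead of 2 (0A 00) and every code unit after it is misaligned.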

You don't have to write the BOM at the beginning, but having one greatly simplifies opening the file using the correct encoding in text editors.

Unfortunately, std::codecvt_utf16 has been deprecated since C++17 with no standard replacement (yes, Unicode support in C++ is that bad).

answered Oct 25 '25 23:10 by rustyx


Expanding on my answer to your last question, here's a solution that writes the file using the Microsoft C runtime library. I saved the source as UTF-8 and compiled with Microsoft "cl /EHsc /W4 /utf-8 test.cpp".

#include <fcntl.h>
#include <io.h>
#include <cstdio>
#include <string>
#include <iostream>

// From fcntl.h:
//  #define _O_U16TEXT     0x20000 // file mode is UTF16 no BOM (translated)
//  #define _O_WTEXT       0x10000 // file mode is UTF16 (translated)

using namespace std;

int main()
{
    // Declare console I/O that works with Unicode.
    _setmode(_fileno(stdout),  _O_WTEXT);  // or _O_U16TEXT, either works
    _setmode(_fileno(stdin), _O_WTEXT);

    // Send a string to the console to verify stdout works with wide strings.
    wstring s = L"こんにちは, 世界!\nHello, World!";
    wcout << s << endl;

    // Read an input string.  I used an IME to enter Chinese.
    // Verify the stdin works...
    wstring test;
    getline(wcin, test);

    // Write it back out to stdout...
    wcout << test << endl;

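    // Write it to a file as UTF-16.
    FILE *dest = fopen("out.txt", "w, ccs=UTF-16LE");
    if (dest == nullptr)
        return 1;  // could not open the output file
    fwprintf(dest, L"%ls", test.c_str());  // %ls is the portable wide-string specifier
    fclose(dest);
    return 0;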
}

Output (console):

C:\>test
こんにちは, 世界!
Hello, World!
你好,马克!
你好,马克!

C:\>type out.txt
你好,马克!

Hex dump of the file content showing UTF-16LE with BOM encoding:

ff fe 60 4f 7d 59 0c ff 6c 9a 4b 51 01 ff
answered Oct 25 '25 23:10 by Mark Tolonen