I'm using the following code to format numbers according to a given locale. In French, groups of digits are separated by a non-breaking space, but the string I get back seems to be invalid.
    std::stringstream ss;
    ss.imbue(std::locale("fr_FR.UTF-8"));
    ss << 1234;
    auto result = ss.str();
Here, result is: {49, -62, 50, 51, 52}. The non-breaking space is represented by the single char -62. It seems to me that this is invalid UTF-8, right?
I expected result to be: {49, -62, -96, 50, 51, 52}. That would be valid, with the non-breaking space represented by two chars: -62, -96.
Am I missing something? Thanks for your help.
The problem is that the narrow-character locale facets cannot represent a multi-byte digit separator: std::numpunct::thousands_sep returns a single code unit (a char in this case). As a result, the digit separator NO-BREAK SPACE (U+00A0), which UTF-8 encodes as the two bytes 0xC2 (-62) 0xA0 (-96), gets truncated to its first code unit, 0xC2 (-62). On its own, that byte is an incomplete and therefore invalid UTF-8 sequence.
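One possible workaround, sketched below under some assumptions, is to group the digits with a one-byte placeholder and then replace that placeholder with the full two-byte separator. The names PlaceholderGrouping, kPlaceholder and format_with_nbsp are made up for this illustration, and the sketch hard-codes groups of three digits instead of reading them from the fr_FR.UTF-8 locale, so it does not need that locale to be installed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Assumption: this control character (ASCII unit separator) never
// appears in formatted integers, so it is safe as a placeholder.
constexpr char kPlaceholder = '\x1F';

// Hypothetical facet: groups digits by three and separates the groups
// with the one-byte placeholder, which fits in a single char.
struct PlaceholderGrouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return kPlaceholder; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: formats value, then swaps each placeholder for
// the UTF-8 encoding of NO-BREAK SPACE (0xC2 0xA0).
std::string format_with_nbsp(long value) {
    std::ostringstream ss;
    // The locale takes ownership of the facet and deletes it later.
    ss.imbue(std::locale(std::locale::classic(), new PlaceholderGrouping));
    ss << value;

    const std::string nbsp = "\xC2\xA0";
    std::string out;
    for (char c : ss.str()) {
        if (c == kPlaceholder) {
            out += nbsp;
        } else {
            out += c;
        }
    }
    return out;
}
```

With this sketch, format_with_nbsp(1234) yields the bytes {49, -62, -96, 50, 51, 52} on a platform where char is signed, which is the sequence the question expected.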