I have tried searching stackoverflow to find an answer to this but the questions and answers I've found are around 10 years old and I can't seem to find consensus on the subject due to changes and possible progress.
There are several libraries that I know of outside of the stl that are supposed to handle unicode-
There are a few features of the stl (wstring,codecvt_utf8) that were included but people seem to be ambivalent about using because they deal with UTF-16 which this site: (utf-8 everywhere) says shouldn't be used and many people online seem agree with the premise.
The only thing I'm looking for is the ability to do 4 things with a unicode strings-
From what I can tell icu handles this and more. What I would like to know is if there is a standard way of handling this on Linux, Windows, and MacOS.
Thank you for your time.
It can represent all 1,114,112 Unicode characters. Most C code that deals with strings on a byte-by-byte basis still works, since UTF-8 is fully compatible with 7-bit ASCII. Characters usually require fewer than four bytes. String sort order is preserved.
Unicode Character “C” (U+0043)
Unicode is a globally used standard for character encoding. It is specifically used to assign some code to every character in every linguistic worldwide. There are many other encoding standards. Unfortunately, not a single encoding standard can be applied to all worldwide languages.
I will try to throw some ideas here:
most C++ programs/programmers just assume that a text is an almost opaque sequence of bytes. UTF-8 is probably guilty for that, and there is no surprise that many comments resume to: don't worry with Unicode, just process UTF-8 encoded strings
files only contains bytes. At a moment, if you try to internally process true Unicode code points, you will have to serialize that to bytes -> here again UTF-8 wins the point
as soon as you go out of the Basic Multilingual Plane (16 bits code points), things become more and more complex. The emoji is specifically awful to process: an emoji can be followed by a variation selector (U+FE0E VARIATION SELECTOR-15 (VS15) for text or U+FE0F VARIATION SELECTOR-16 (VS16) for emoji-style) to alter its display style, more or less the old i bs ^ that was used in 1970 ascii when one wanted to print î. That's not all, the characters  U+1F3FB to U+1F3FF are use to provide a skin color for 102 human emoji spread across six blocks: Dingbats, Emoticons, Miscellaneous Symbols, Miscellaneous Symbols and Pictographs, Supplemental Symbols and Pictographs, and Transport and Map Symbols.
That simply means that up to 3 consecutive unicode code points can represent one single glyph... So the idea that one character is one char32_t is still an approximation
My conclusion is that Unicode is a complex thing, and really requires a dedicated library like ICU. You can try to use simple tools like the converters of the standard library when you only deal with the BMP, but full support is far beyond that.
BTW: even other languages like Python that pretend to have a native unicode support (which is IMHO far better than current C++ one) ofter fails on some part:
So support for Unicode is poor for more than 10 years, and I do not really hope that things will go much better in the next 10 years...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With