
Can you warn/error when mixing char8_t and char32_t in expressions?

I have a code base which makes extensive use of char8_t and char32_t to represent UTF-8 code units and Unicode code points respectively. A common mistake/bug in this code base is to compare char8_t to char32_t literals, or call functions taking char32_t using a char8_t argument.

While the char8_t -> char32_t conversion loses no information, it is conceptually wrong, because a UTF-8 code unit is not a code point:

bool contains_oe(std::u8string_view str) {
    for (char8_t c : str)
        if (c == U'ö') // comparison always fails
            return true;
    return false;
}

Assuming that str is correctly UTF-8 encoded, this function always returns false: ö (U+00F6) is encoded in UTF-8 as the two code units 0xC3 0xB6, and 0xF6 can never appear as a code unit in valid UTF-8.

A bug like this could have been easily prevented if I could somehow detect comparisons of char8_t and char32_t automatically.

Is there a way to do that using GCC compiler flags, Clang compiler flags, clang-tidy, or some other automatic tool?

Jan Schultke asked Jan 30 '26 19:01



1 Answer

As of GCC 15, Clang and Clang-Tidy 20, it seems there are no warnings or checks that would be helpful in this case. However, you could write your own Clang-Tidy check.


Alternatively, you could design your interfaces to avoid implicit conversions by using either an enum class or a class. For instance:

class CodeUnit
{
public:
    explicit CodeUnit(char8_t codeUnit = {}) : codeUnit(codeUnit) {}
    explicit CodeUnit(char32_t codeUnit) = delete;
    bool operator==(const CodeUnit&) const = default;

    // ...

private:
    char8_t codeUnit;
};

You can then keep functions that need access to the raw code unit (e.g. conversion to UTF-32) inside the CodeUnit class. If you then try to write

bool contains_oe(std::span<const CodeUnit> str) {
    for (CodeUnit c : str)
        if (c == U'ö') // error: no implicit conversion from char32_t to CodeUnit
            return true;
    return false;
}

you get a compile-time error instead of a comparison that silently never matches.

Of course, that doesn't solve the root issue, but it will at least limit where it can occur.

LHLaurini answered Feb 01 '26 11:02



