If I create a string literal with the u8 prefix, does the machine code know that the corresponding value should be encoded in UTF-8?
So that no matter where I run the program, the computer knows how to encode it every time? Or does the machine code say nothing about the encoding?
Because if I encode one string as a plain char literal and another in UTF-8 (e.g. with u8), what is the difference, and how does the computer know the encoding if the machine code doesn't say anything about it?
u8"..." strings are always encoded in UTF-8, as specified in [lex.string]/1.
The encoding of "..." strings depends on the compiler (and on the source file encoding), but it shouldn't be hard to configure your IDE to save files in UTF-8, and your compiler to pass UTF-8 through plain string literals unchanged.
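For example (a sketch; the exact flags depend on your toolchain), GCC and Clang accept -finput-charset and -fexec-charset, and MSVC has /utf-8, which sets both the source and execution character sets to UTF-8:

    g++ -finput-charset=UTF-8 -fexec-charset=UTF-8 main.cpp
    cl /utf-8 main.cpp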
In any case, the encoding is handled entirely at compile-time. In the compiled code the strings are just sequences of bytes; there is no conversion between encodings at runtime, unless you explicitly call some function that does that.
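A minimal sketch to make this concrete (assuming a C++17 compiler, where u8 literals still have type const char[]; in C++20 the element type becomes char8_t). Dumping the raw bytes shows that the u8 literal is UTF-8 no matter what, while the plain literal's bytes depend on the execution character set:

    #include <cstdio>

    int main() {
        const char utf8[]  = u8"é";  // guaranteed UTF-8: 0xC3 0xA9
        const char plain[] = "é";    // depends on the execution character set
                                     // (0xE9 under Latin-1, 0xC3 0xA9 under UTF-8)

        for (unsigned char c : utf8)   // dumps the bytes, including the trailing NUL
            std::printf("%02X ", c);
        std::printf("\n");
        for (unsigned char c : plain)
            std::printf("%02X ", c);
        std::printf("\n");
    }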
If I create a string literal with the u8 prefix, does the machine code know that the corresponding value should be encoded in UTF-8?
Machine code knows nothing about encodings. The compiler encodes the literal as UTF-8 and emits the correct sequence of bytes into the binary.
So that no matter where I run the program, the computer knows how to encode it every time? Or does the machine code say nothing about the encoding?
That sequence of bytes is emitted at runtime, and the output device that receives it translates it correctly if it knows how to. For example, a console configured for UTF-8 will display the correct characters; otherwise garbage is shown.
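As a sketch of that (assuming pre-C++20, where a u8 literal decays to const char* and can be passed to stdio):

    #include <cstdio>

    int main() {
        std::fputs(u8"héllo wörld\n", stdout);  // writes the UTF-8 bytes verbatim
        // A UTF-8 console shows "héllo wörld"; a Latin-1 console shows
        // mojibake such as "hÃ©llo wÃ¶rld", because no conversion happens at runtime.
    }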