Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to represent this utf-8 encoded string in Rust?

Tags:

utf-8

rust

utf

On this RFC: https://www.rfc-editor.org/rfc/rfc7616#page-19 at page 19, there's this example of a text encoded in UTF-8:

  J  U+00E4 s  U+00F8 n      D  o  e
  4A C3A4   73 C3B8   6E 20 44  6F 65

How do I represent it in a Rust String?

I tried https://mothereff.in/utf-8 and doing J\00E4s\00F8nDoe but it didn't work.

like image 260
Gatonito Avatar asked Sep 14 '25 16:09

Gatonito


1 Answers

"Jäsøn Doe" should work fine. Rust source files are always UTF-8 encoded and a string literal may contain any Unicode scalar value (that is, any code point except surrogates, which must not be encoded in UTF-8).

If your editor does not support UTF-8 encoding, but supports ASCII, you can use Unicode code point escapes, which are documented in the Rust reference:

A 24-bit code point escape starts with U+0075 (u) and is followed by up to six hex digits surrounded by braces U+007B ({) and U+007D (}). It denotes the Unicode code point equal to the provided hex value.

suggesting the correct syntax should be "J\u{E4}s\u{F8}n Doe".

like image 194
trent Avatar answered Sep 17 '25 08:09

trent