I'm looking to check if a large number is a valid Unicode character. I looked into the Char.IsSymbol(char) function, but it requires a char as input. What I need is the equivalent of Char.IsSymbol(int), for example: Char.IsSymbol(340813);
char is a 16-bit type in C# that represents a single UTF-16 code unit, so the largest value it can store is 65535 (0xFFFF). That is why Char.IsSymbol(340813) doesn't work.
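As a rough illustration of that 16-bit limit, the sketch below counts how many char values a code point occupies once it is converted to a string. The names SurrogateDemo, CodeUnitCount and RoundTrips are made up for this example, and it only uses Char.ConvertFromUtf32, Char.IsSurrogatePair and Char.ConvertToUtf32.

```csharp
using System;

public static class SurrogateDemo
{
    // Number of UTF-16 code units (C# chars) needed to store one code point.
    // Assumes codePoint is valid; ConvertFromUtf32 throws otherwise.
    public static int CodeUnitCount(int codePoint)
    {
        return Char.ConvertFromUtf32(codePoint).Length;
    }

    // Converts a code point to a string and back again.
    public static bool RoundTrips(int codePoint)
    {
        string text = Char.ConvertFromUtf32(codePoint);
        return Char.ConvertToUtf32(text, 0) == codePoint;
    }
}
```

With this sketch, CodeUnitCount(0x41) is 1, while CodeUnitCount(340813) is 2: any code point above 0xFFFF is stored as a surrogate pair, so Char.IsSurrogatePair(Char.ConvertFromUtf32(340813), 0) is true.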
To check whether a code point is a symbol, convert the code point to a string and call the Char.IsSymbol(String, Int32) overload. To get the string, use Char.ConvertFromUtf32(Int32), which "Converts the specified Unicode code point into a UTF-16 encoded string."
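int codepoint = 340813;
// Code points above 0xFFFF become a two-char surrogate pair.
string character = Char.ConvertFromUtf32(codepoint);
// Index 0 is where the character starts inside the string.
return Char.IsSymbol(character, 0);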
Checking whether a code point is valid is even easier, because the maximum Unicode code point is 0x10FFFF. For the reason, read "Why Unicode is restricted to 0x10FFFF?". That means you just need a simple if (codepoint <= 0x10FFFF), plus a codepoint >= 0 check when the value is a signed int. You may also need to exclude the surrogate range 0xD800–0xDFFF, because those values are not valid as single characters. That results in:
bool isValidUnicodeCharacter = codepoint >= 0 && codepoint <= 0x10FFFF &&
    (codepoint < 0xD800 || codepoint > 0xDFFF);
You may want to check whether the code point is valid before passing it to Char.ConvertFromUtf32(), which throws an ArgumentOutOfRangeException for values outside the valid range. That avoids the cost of exceptions if your input contains a lot of invalid code points.
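Putting the two checks together, one possible shape for the Char.IsSymbol(int) the question asks for is sketched below. CodePointHelper, IsValidCodePoint and IsSymbol(int) are hypothetical names, not part of .NET, and returning false for invalid input is an assumption about the behaviour you want.

```csharp
using System;

public static class CodePointHelper
{
    // True for 0..0x10FFFF, excluding the surrogate range 0xD800..0xDFFF.
    public static bool IsValidCodePoint(int codePoint)
    {
        return codePoint >= 0 && codePoint <= 0x10FFFF
            && (codePoint < 0xD800 || codePoint > 0xDFFF);
    }

    // Hypothetical int-based counterpart of Char.IsSymbol(char).
    // Returns false for invalid code points instead of throwing.
    public static bool IsSymbol(int codePoint)
    {
        if (!IsValidCodePoint(codePoint))
        {
            return false;
        }

        // Safe now: ConvertFromUtf32 only throws for invalid code points.
        string text = Char.ConvertFromUtf32(codePoint);
        return Char.IsSymbol(text, 0);
    }
}
```

For example, CodePointHelper.IsSymbol(0x2B) is true because '+' is a math symbol, CodePointHelper.IsSymbol(0x41) is false for the letter 'A', and CodePointHelper.IsSymbol(0x110000) is false without any exception being thrown.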