
Check if a large number is a valid Unicode character

Tags:

c#

char

unicode

I'm looking to check if a large number is a valid Unicode character. I looked into the Char.IsSymbol(char) function, but it requires a char as input. What I need is the equivalent of Char.IsSymbol(int). For example: Char.IsSymbol(340813);

user2729463 asked Sep 05 '25 03:09


1 Answer

char is a 16-bit type in C# that represents a single UTF-16 code unit, so the largest value it can store is 65535 (0xFFFF). That is why Char.IsSymbol(340813) cannot work: 340813 does not fit in a char.
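To make the 16-bit limit concrete, the sketch below shows how a code point above 0xFFFF ends up as two char values (a surrogate pair) once it is converted to a string. The class name SurrogateDemo and the helper UnitsFor are made up for illustration; only Char.ConvertFromUtf32 and Char.IsSurrogatePair come from the framework.

```csharp
using System;

public static class SurrogateDemo
{
    // Hypothetical helper: number of UTF-16 code units (char values)
    // needed to store the given code point.
    public static int UnitsFor(int codepoint)
    {
        return Char.ConvertFromUtf32(codepoint).Length;
    }

    public static void Main()
    {
        // 'A' (U+0041) fits in a single char.
        Console.WriteLine(UnitsFor(0x41));             // 1

        // 340813 (U+5334D) is above 0xFFFF, so it needs two chars.
        string s = Char.ConvertFromUtf32(340813);
        Console.WriteLine(s.Length);                   // 2
        Console.WriteLine(Char.IsSurrogatePair(s, 0)); // True
    }
}
```

This is also why the string-based overloads take an index: a single character may span two positions in the string.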

To check whether a code point is a symbol, convert the code point to a string and call the Char.IsSymbol(String, Int32) overload. To get the string, use Char.ConvertFromUtf32(Int32), which "Converts the specified Unicode code point into a UTF-16 encoded string."

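int codepoint = 340813;
// Yields a one- or two-char string (two for code points above 0xFFFF).
string character = Char.ConvertFromUtf32(codepoint);
// Index 0 is the start of the character inside the string.
return Char.IsSymbol(character, 0);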

Checking whether a code point is valid is even easier, because the largest Unicode code point is 0x10FFFF. For the reason, read "Why Unicode is restricted to 0x10FFFF?"

That means you just need a simple if (codepoint >= 0 && codepoint <= 0x10FFFF), although you may also need to exclude the surrogate range 0xD800–0xDFFF, because those values are not valid as single characters. That results in

bool isValidUnicodeCharacter = codepoint >= 0 && codepoint <= 0x10FFFF &&
                               (codepoint < 0xD800 || codepoint > 0xDFFF);
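The range check can be wrapped in a small helper and exercised at its boundaries. This is only a sketch: the names CodePointCheck and IsValidCodePoint are illustrative, and the lower bound is there to guard against negative int values.

```csharp
using System;

public static class CodePointCheck
{
    // True for 0..0x10FFFF, excluding the surrogate range 0xD800..0xDFFF.
    public static bool IsValidCodePoint(int codepoint)
    {
        return codepoint >= 0 && codepoint <= 0x10FFFF
            && (codepoint < 0xD800 || codepoint > 0xDFFF);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidCodePoint(340813));   // True
        Console.WriteLine(IsValidCodePoint(0x10FFFF)); // True  (upper limit)
        Console.WriteLine(IsValidCodePoint(0x110000)); // False (one past the limit)
        Console.WriteLine(IsValidCodePoint(0xD800));   // False (surrogate)
        Console.WriteLine(IsValidCodePoint(-1));       // False (negative)
    }
}
```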

You may want to check whether the code point is valid before passing it to Char.ConvertFromUtf32(), which throws an ArgumentOutOfRangeException for invalid values. Checking first avoids the cost of those exceptions if your input contains a lot of invalid code points.
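Putting the two steps together, a helper might validate first and only then call Char.ConvertFromUtf32, so that invalid input yields false instead of an exception. The names SymbolCheck and IsSymbolCodePoint are hypothetical, and treating invalid code points as "not a symbol" is a design assumption rather than something the question specifies.

```csharp
using System;

public static class SymbolCheck
{
    // Hypothetical equivalent of "Char.IsSymbol(int)".
    public static bool IsSymbolCodePoint(int codepoint)
    {
        // Reject values that Char.ConvertFromUtf32 would throw on.
        bool valid = codepoint >= 0 && codepoint <= 0x10FFFF
                  && (codepoint < 0xD800 || codepoint > 0xDFFF);
        if (!valid)
        {
            return false; // assumption: invalid input counts as "not a symbol"
        }

        string s = Char.ConvertFromUtf32(codepoint);
        return Char.IsSymbol(s, 0);
    }

    public static void Main()
    {
        Console.WriteLine(IsSymbolCodePoint('+'));      // True  ('+' is a math symbol)
        Console.WriteLine(IsSymbolCodePoint('A'));      // False ('A' is a letter)
        Console.WriteLine(IsSymbolCodePoint(0x110000)); // False (out of range)
        Console.WriteLine(IsSymbolCodePoint(0xD800));   // False (lone surrogate)
    }
}
```

If callers need to tell "invalid" apart from "valid but not a symbol", returning a nullable bool or using a Try-style method with an out parameter would be a reasonable variation.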

phuclv answered Sep 08 '25 01:09