
C# regex to match emoji

Tags: c#, regex, emoji

I would like a regex to match emoji characters in C#. If it matters, they are the characters from the Windows 8 touch keyboard, e.g. 😝 🍟 🌃

Jippers asked Oct 23 '25 14:10


2 Answers

\p{So}|\p{Cs}\p{Cs}(\p{Cf}\p{Cs}\p{Cs})* matches all the emoji I've tried, and only those.

StringInfo was useful for building the pattern, and in some cases it might be usable directly instead of a regex.
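To illustrate the StringInfo remark, here is a minimal sketch of how one might inspect a string while building such a pattern. The class and method names (TextElementDemo, TextElements, Categories) are made up for this example. Note that exactly where text-element boundaries fall depends on the .NET version, so only a simple input is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TextElementDemo
{
    // Splits a string into text elements as defined by StringInfo.
    // A surrogate pair such as U+1F61D comes back as one element.
    public static List<string> TextElements(string s)
    {
        var result = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            result.Add(e.GetTextElement());
        }
        return result;
    }

    // Returns the Unicode category of every UTF-16 code unit,
    // which is what \p{So}, \p{Cs} and \p{Cf} test in a .NET regex.
    public static List<UnicodeCategory> Categories(string s)
    {
        var result = new List<UnicodeCategory>();
        foreach (char c in s)
        {
            result.Add(char.GetUnicodeCategory(c));
        }
        return result;
    }

    static void Main()
    {
        // "a", U+1F61D (written as a surrogate pair), "b"
        string text = "a\uD83D\uDE1Db";
        Console.WriteLine(TextElements(text).Count);
        foreach (UnicodeCategory cat in Categories(text))
        {
            Console.WriteLine(cat);
        }
    }
}
```

Printing the categories this way shows why the pattern needs \p{Cs}\p{Cs}: each half of a surrogate pair is reported as Surrogate on its own.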

The pattern uses unicode categories, as shown in @MohaMad's answer. Again, with comments:

@"(?x)           # Enable free-spacing mode (could have used RegexOptions.IgnorePatternWhitespace instead)
\p{So}           # Match OtherSymbol, like ⏸ and ✅
|\p{Cs}\p{Cs}    # OR a surrogate pair
 \uD83C\p{Cs}    # followed by a color modifier, like 👍🏿
                 # (The modifiers U+1F3FB to U+1F3FF all start with the high surrogate \uD83C. Hacky, but it works.)
|\p{Cs}\p{Cs}    # OR a surrogate pair, like 🔀 and 🧊
 (\p{Cf}         # followed by a Format character, such as the zero-width joiner,
 \p{Cs}\p{Cs})   # and another surrogate pair, like 👩‍💻 and 👨‍💻,
*                # zero or more times (I've only seen none or once.)"
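As a usage sketch, the pattern above can be compiled once and used to count or strip matches. The class and method names (EmojiRegexDemo, CountEmoji, StripEmoji) are hypothetical, and the inputs are written as \u escapes so the example does not depend on the source file's encoding.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmojiRegexDemo
{
    // Same three alternatives as the commented pattern above.
    const string Pattern = @"(?x)
        \p{So}
        |\p{Cs}\p{Cs}\uD83C\p{Cs}
        |\p{Cs}\p{Cs}(\p{Cf}\p{Cs}\p{Cs})*";

    static readonly Regex Emoji = new Regex(Pattern);

    public static int CountEmoji(string text)
    {
        return Emoji.Matches(text).Count;
    }

    public static string StripEmoji(string text)
    {
        return Emoji.Replace(text, "");
    }

    static void Main()
    {
        // U+1F35F (fries) as a surrogate pair
        Console.WriteLine(CountEmoji("fries \uD83C\uDF5F!"));           // 1
        // U+1F44D + U+1F3FF (thumbs up with skin-tone modifier)
        Console.WriteLine(CountEmoji("\uD83D\uDC4D\uD83C\uDFFF"));      // 1
        // U+1F469 + U+200D + U+1F4BB (joined with a zero-width joiner)
        Console.WriteLine(CountEmoji("\uD83D\uDC69\u200D\uD83D\uDCBB")); // 1
        Console.WriteLine(StripEmoji("fries \uD83C\uDF5F!"));           // fries !
    }
}
```

One caveat when relying on this: \p{So} also covers symbols that are not emoji, such as © (U+00A9), and \p{Cs}\p{Cs} matches any character outside the Basic Multilingual Plane, so the pattern may need tightening depending on the input.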
Grastveit answered Oct 25 '25 05:10


There seems to be an Emoji-to-Unicode standard:

https://en.wikipedia.org/wiki/Emoji#In_Unicode

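So you can probably match each of the Unicode ranges, for example the range from U+1F300 to U+1F5FF. Note that \u takes exactly four hex digits and .NET regexes match UTF-16 code units, so [\u1F30-\u1F5F] would match U+1F30 to U+1F5F (Greek Extended) instead. Code points above U+FFFF have to be written as surrogate pairs, e.g. \uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF].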
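Since .NET regexes see UTF-16 code units, a range above U+FFFF needs surrogate pairs. Here is a minimal sketch for the U+1F300 to U+1F5FF range mentioned above. The class and method names (EmojiRangeDemo, InRange) are hypothetical, and other emoji ranges would need their own alternatives.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmojiRangeDemo
{
    // U+1F300..U+1F3FF encode as \uD83C followed by \uDF00..\uDFFF,
    // U+1F400..U+1F5FF encode as \uD83D followed by \uDC00..\uDDFF.
    static readonly Regex Range =
        new Regex(@"\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF]");

    // True if the given code point falls in U+1F300..U+1F5FF.
    public static bool InRange(int codePoint)
    {
        // ConvertFromUtf32 yields the UTF-16 string for the code point.
        return Range.IsMatch(char.ConvertFromUtf32(codePoint));
    }

    static void Main()
    {
        Console.WriteLine(InRange(0x1F300)); // True
        Console.WriteLine(InRange(0x1F5FF)); // True
        Console.WriteLine(InRange(0x1F600)); // False
        Console.WriteLine(InRange(0x1F30));  // False
    }
}
```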

Ilya Kogan answered Oct 25 '25 04:10


