Convert "Fancy" unicode ABC to standard ABC

Question

I run regex checks on certain inputs on my site, but the regex wrongly returns false when users type "fancy" Unicode letter sets such as:

Ⓜⓐⓣⓒⓗ 🅜🅐🅣🅒🅗 Ｍａｔｃｈ 𝐌𝐚𝐭𝐜𝐡 𝕸𝖆𝖙𝖈𝖍 𝑴𝒂𝒕𝒄𝒉 𝓜𝓪𝓽𝓬𝓱 𝕄𝕒𝕥𝕔𝕙 𝙼𝚊𝚝𝚌𝚑 𝖬𝖺𝗍𝖼𝗁 𝗠𝗮𝘁𝗰𝗵 𝙈𝙖𝙩𝙘𝙝 𝘔𝘢𝘵𝘤𝘩 ⒨⒜⒯⒞⒣ 🇲🇦🇹🇨🇭 🄼🄰🅃🄲🄷 🅼🅰🆃🅲🅷

These are not different fonts; they are different characters! None of them is matched by /Match/.
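A quick way to confirm this in PHP is to print the code point of the first letter of each variant. This sketch assumes the mbstring extension is enabled (mb_substr() and mb_ord(), PHP 7.2+); first_code_point() is a hypothetical helper name.

```php
<?php
// Shows that each "fancy" letter is its own code point, not a styled ASCII letter.
// Assumes mbstring (mb_substr, mb_ord; PHP 7.2+). first_code_point() is hypothetical.

function first_code_point(string $s): string
{
    $first = mb_substr($s, 0, 1, 'UTF-8');           // first character, not first byte
    return sprintf('U+%04X', mb_ord($first, 'UTF-8')); // e.g. "U+004D" for plain M
}

foreach (['Match', 'Ｍａｔｃｈ', '𝐌𝐚𝐭𝐜𝐡', 'Ⓜⓐⓣⓒⓗ'] as $s) {
    printf(
        "%s  %s  matches /Match/: %s\n",
        $s,
        first_code_point($s),
        preg_match('/Match/u', $s) === 1 ? 'yes' : 'no'
    );
}
```

Plain M is U+004D, while Ｍ is U+FF2D, 𝐌 is U+1D40C and Ⓜ is U+24C2, so an ASCII pattern has nothing to match in the fancy variants.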

How can I convert the user input to standard ABC characters before running it through my regex checks? (I'm using PHP, if that makes a difference.)

Amadan · Accepted Answer

NFKD Unicode normalisation should take care of most of those. In PHP it is provided by the Normalizer class, which is only available when the intl extension is enabled; I don't have it in my environment, so I can't test it. If your PHP lacks intl as well and you don't want to install it, this does something similar, at least for some of the characters:

$converted = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text);
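A minimal sketch of how the two suggestions could be combined: try NFKD through Normalizer when intl is loaded, otherwise fall back to the iconv() call. Assumptions: PHP 7+, and fold_to_plain() is a hypothetical helper name. The iconv branch returns ISO-8859-1 rather than UTF-8, and how much it transliterates depends on the iconv implementation PHP was built against (and, with glibc, on the LC_CTYPE locale), so check its output on your own server.

```php
<?php
// Hypothetical helper: NFKD via intl's Normalizer if available, else iconv //TRANSLIT.

function fold_to_plain(string $text): string
{
    if (class_exists('Normalizer')) {
        // FORM_KD = NFKD (compatibility decomposition): 'Ｍ' -> 'M', '𝐌' -> 'M', 'Ⓜ' -> 'M'.
        $normalized = Normalizer::normalize($text, Normalizer::FORM_KD);
        if (is_string($normalized)) {
            return $normalized;
        }
        // normalize() fails on invalid UTF-8; fall through to iconv.
    }

    // Fallback: result is ISO-8859-1 encoded. With glibc you may need
    // setlocale(LC_CTYPE, 'en_US.UTF-8') beforehand for //TRANSLIT to do much.
    $converted = @iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $text);
    if ($converted === false) {
        throw new InvalidArgumentException('iconv() could not convert the input.');
    }
    return $converted;
}

$input = 'Ｍａｔｃｈ 𝐌𝐚𝐭𝐜𝐡 Ⓜⓐⓣⓒⓗ';
var_dump(preg_match('/Match/', fold_to_plain($input)));
```

Note that NFKD also splits accented letters into a base letter plus a combining mark (é becomes e followed by U+0301); if the regex should still see é as one character, Normalizer::FORM_KC composes them back. Neither form changes 🅜, 🅼 or 🇲, and ⒨ becomes "(m)".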

Finally, you can build your own mapping, for example with strtr (which you'll then know works, since you wrote it yourself).
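A minimal sketch of such a mapping with strtr(), covering only sets from the question that NFKD does not decompose: negative circled (🅜), negative squared (🅼) and regional-indicator (🇲) letters, which are all capitals. It assumes mbstring for mb_chr() (PHP 7.2+); build_fancy_map() and fold_with_map() are hypothetical names, and the block list is only an example to extend.

```php
<?php
// Hand-written map applied with strtr(). Assumes mbstring for mb_chr() (PHP 7.2+).
// build_fancy_map() and fold_with_map() are hypothetical helper names.

function build_fancy_map(): array
{
    // First code point of each contiguous A..Z run (example selection only).
    $blocks = [
        0x1F150, // NEGATIVE CIRCLED LATIN CAPITAL LETTER A..Z
        0x1F170, // NEGATIVE SQUARED LATIN CAPITAL LETTER A..Z
        0x1F1E6, // REGIONAL INDICATOR SYMBOL LETTER A..Z
    ];
    $map = [];
    foreach ($blocks as $start) {
        for ($i = 0; $i < 26; $i++) {
            // Map the i-th letter of the run to the ASCII capital at the same offset.
            $map[mb_chr($start + $i, 'UTF-8')] = chr(ord('A') + $i);
        }
    }
    return $map;
}

function fold_with_map(string $text): string
{
    static $map = null;
    if ($map === null) {
        $map = build_fancy_map(); // build once per request
    }
    // With an array argument, strtr() tries longer keys first and never
    // re-scans text it has already replaced.
    return strtr($text, $map);
}

$folded = fold_with_map('🅜🅐🅣🅒🅗 🇲🇦🇹🇨🇭 🅼🅰🆃🅲🅷');
var_dump($folded, preg_match('/match/i', $folded));
```

Because these sets only contain capitals, the result is "MATCH", so match case-insensitively (/match/i) or lower-case the text before running the regex.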



Tags: regex, php, unicode, special-characters, preg-match

Fomo
