
Convert "Fancy" unicode ABC to standard ABC

I run Regex checks on certain inputs on my site, but the Regex wrongfully returns false when users use "Fancy" Unicode sets such as:

Ⓜⓐⓣⓒⓗ 🅜🅐🅣🅒🅗 Match 𝐌𝐚𝐭𝐜𝐡 𝕸𝖆𝖙𝖈𝖍 𝑴𝒂𝒕𝒄𝒉 𝓜𝓪𝓽𝓬𝓱 𝕄𝕒𝕥𝕔𝕙 𝙼𝚊𝚝𝚌𝚑 𝖬𝖺𝗍𝖼𝗁 𝗠𝗮𝘁𝗰𝗵 𝙈𝙖𝙩𝙘𝙝 𝘔𝘢𝘵𝘤𝘩 ⒨⒜⒯⒞⒣ 🇲🇦🇹🇨🇭 🄼🄰🅃🄲🄷 🅼🅰🆃🅲🅷

These are not different fonts; they are different characters. None of them is matched by /Match/ (Proof)

How can I convert the user input to standard ABC characters before running through my Regex checks? (I'm using PHP, if that makes a difference)

Fomo asked Oct 18 '25 13:10


1 Answer

NFKD Unicode normalisation should take care of most of those. However, in PHP it seems to be available only when the intl extension is enabled, and I don't have it in my environment, so I can't test it. If your PHP lacks intl too and you don't want to install it, this does something a bit similar, at least for some of the characters:

iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text)
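For the NFKD route mentioned at the start of this answer, here is a minimal sketch, assuming the intl extension is loaded (it provides the Normalizer class). The helper name normalize_fancy is made up for illustration. Sets that have no compatibility decomposition (for example the negative circled 🅜🅐🅣🅒🅗 or the regional indicator letters) come back unchanged, and parenthesised letters such as ⒨ expand to "(m)", so this alone will not cover every case.

```php
<?php
// Hypothetical helper: fold compatibility characters to their plain forms.
// Requires the intl extension for the Normalizer class.
function normalize_fancy(string $text): string
{
    // NFKD replaces characters such as circled or mathematical bold letters
    // with their compatibility equivalents (plain letters).
    $normalized = Normalizer::normalize($text, Normalizer::FORM_KD);

    // normalize() returns false on failure (e.g. invalid UTF-8);
    // fall back to the untouched input in that case.
    return $normalized === false ? $text : $normalized;
}

// Usage: normalise first, then run the usual regex check.
$input = normalize_fancy('Ⓜⓐⓣⓒⓗ');
$matches = preg_match('/Match/', $input) === 1;
```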

Finally, you can build your own mapping, for example with strtr. You will then know exactly which characters it covers, since you wrote it yourself.
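A hand-written mapping could look roughly like the sketch below. The table is deliberately tiny and only covers the letters of two sets from the question that NFKD leaves alone; the names $fancyMap and fold_fancy are made up for illustration, and a real table would need an entry for every character you want to accept.

```php
<?php
// Hypothetical lookup table: fancy character => plain ASCII replacement.
// Only the letters M, A, T, C, H of two sets are listed here.
$fancyMap = [
    // negative circled letters
    '🅜' => 'M', '🅐' => 'A', '🅣' => 'T', '🅒' => 'C', '🅗' => 'H',
    // negative squared letters
    '🅼' => 'M', '🅰' => 'A', '🆃' => 'T', '🅲' => 'C', '🅷' => 'H',
];

// Hypothetical helper: replace every key found in $text with its value.
// strtr() with an array works on whole substrings, so multi-byte UTF-8
// keys are safe to use.
function fold_fancy(string $text, array $map): string
{
    return strtr($text, $map);
}

// Usage: these sets only have upper-case forms, so match case-insensitively.
$folded = fold_fancy('🅜🅐🅣🅒🅗', $fancyMap);
$matches = preg_match('/Match/i', $folded) === 1;
```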

Amadan answered Oct 21 '25 02:10