I run regex checks on certain inputs on my site, but the regex wrongly fails to match when users type "fancy" Unicode characters such as:
Ⓜⓐⓣⓒⓗ 🅜🅐🅣🅒🅗 Match 𝐌𝐚𝐭𝐜𝐡 𝕸𝖆𝖙𝖈𝖍 𝑴𝒂𝒕𝒄𝒉 𝓜𝓪𝓽𝓬𝓱 𝕄𝕒𝕥𝕔𝕙 𝙼𝚊𝚝𝚌𝚑 𝖬𝖺𝗍𝖼𝗁 𝗠𝗮𝘁𝗰𝗵 𝙈𝙖𝙩𝙘𝙝 𝘔𝘢𝘵𝘤𝘩 ⒨⒜⒯⒞⒣ 🇲🇦🇹🇨🇭 🄼🄰🅃🄲🄷 🅼🅰🆃🅲🅷
These are not different fonts, they are different characters! None of these are matched by /Match/
(Proof)
How can I convert the user input to standard ABC characters before running through my Regex checks? (I'm using PHP, if that makes a difference)
NFKD Unicode normalisation should take care of most of those. However, it seems to work only if the intl extension is enabled, and I don't have it in my environment, so I can't test it. If your PHP doesn't have intl either and you don't want to install it, this does something a bit similar, at least for some of the characters (the exact output depends on the system's iconv implementation):
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text)
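For environments where intl is available, the NFKD route mentioned first could look roughly like the sketch below. `normalize_fancy` is a hypothetical helper name, not something from the question, and the fallback of returning the input unchanged when intl is missing is my own assumption about what you would want.

```php
<?php
// Minimal sketch: fold compatibility characters (circled, bold, double-struck, ...)
// to their plain equivalents with NFKD. Assumes the intl extension is loaded;
// normalize_fancy() is a hypothetical helper name.
function normalize_fancy(string $text): string
{
    if (!class_exists('Normalizer')) {
        // intl is not available: return the input untouched (assumed fallback).
        return $text;
    }
    $normalized = Normalizer::normalize($text, Normalizer::FORM_KD);
    // normalize() returns false on failure, e.g. for invalid UTF-8 input.
    return $normalized === false ? $text : $normalized;
}

// Run the existing regex check on the normalised text instead of the raw input.
$matches = preg_match('/Match/', normalize_fancy('𝐌𝐚𝐭𝐜𝐡')) === 1;
```

NFKD only helps for characters that have a compatibility decomposition. The mathematical and circled letters do, but as far as I know the negative circled, negative squared and regional-indicator letters (🅜, 🅼, 🇲) do not, and parenthesised letters such as ⒨ come out as "(m)" rather than "m".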
Finally, you can make your own mapping, for example using strtr (which you will then know works, since you'd have written it yourself).
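A hand-made mapping along those lines might look like the sketch below. It covers only three letter ranges that normalisation leaves alone (negative circled, negative squared and regional-indicator letters). The helper names `supplementary_chr` and `build_fancy_map` are made up for illustration, and the choice of ranges is my assumption, so extend the map to whatever your users actually type.

```php
<?php
// Encode a code point in U+10000..U+10FFFF as four UTF-8 bytes.
// Hand-rolled so the sketch needs neither mbstring nor intl (hypothetical helper).
function supplementary_chr(int $cp): string
{
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Build a "fancy letter" => "plain letter" table for strtr (hypothetical helper).
function build_fancy_map(): array
{
    $map = [];
    // First code point (the letter A) of each 26-letter range:
    // negative circled, negative squared, regional indicator symbols.
    $bases = [0x1F150, 0x1F170, 0x1F1E6];
    foreach ($bases as $base) {
        for ($i = 0; $i < 26; $i++) {
            $map[supplementary_chr($base + $i)] = chr(ord('A') + $i);
        }
    }
    return $map;
}

$map = build_fancy_map();
// "\u{1F15C}\u{1F150}\u{1F163}\u{1F152}\u{1F157}" is 🅜🅐🅣🅒🅗
$plain = strtr("\u{1F15C}\u{1F150}\u{1F163}\u{1F152}\u{1F157}", $map); // "MATCH"
```

These ranges only have capital letters, so the result is upper case. A case-sensitive pattern like /Match/ would still miss it, and you would need the `i` flag or a mapping that fits your own rules.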