What is equivalent of [:alpha:]
if I am making a unicode regex that need it.
For example for [:word:]
it is [\w]
will be great if I get some help.
For Unicode compliance, you need to use
regex = re.compile(r"[^\W\d_]", re.UNICODE)
Unicode character properties (like \p{L}
) are not supported by the current Python regex engine.
Explanation:
\w
matches (if the Unicode flag is set) any letter, digit or underscore.
[^\W]
matches the same thing, but with the negated character class, we can now subtract characters we don't want included:
[^\W\d_]
matches whatever \w
matches, but without digits (\d
) or underscore (_
).
>>> import re
>>> regex = re.compile(r"[^\W\d_]", re.UNICODE)
>>> regex.findall("aä12_")
['a', 'ä']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With