Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

shorthand for [:alpha:] in python regex

What is equivalent of [:alpha:] if I am making a unicode regex that need it.

For example for [:word:] it is [\w]

will be great if I get some help.

like image 904
hjelpmig Avatar asked Oct 16 '25 12:10

hjelpmig


1 Answers

For Unicode compliance, you need to use

regex = re.compile(r"[^\W\d_]", re.UNICODE)

Unicode character properties (like \p{L}) are not supported by the current Python regex engine.

Explanation:

\w matches (if the Unicode flag is set) any letter, digit or underscore.

[^\W] matches the same thing, but with the negated character class, we can now subtract characters we don't want included:

[^\W\d_] matches whatever \w matches, but without digits (\d) or underscore (_).

>>> import re
>>> regex = re.compile(r"[^\W\d_]", re.UNICODE)
>>> regex.findall("aä12_")
['a', 'ä']
like image 97
Tim Pietzcker Avatar answered Oct 18 '25 07:10

Tim Pietzcker



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!