
How to match umlauts with regular expressions? [duplicate]

In German text, umlauts (ä, ü, ö) and eszett (ß) are regular letters, but they don't seem to be covered by the \w special character:

In [1]: re.match('(\w+)', 'Straße').groups()
Out[1]: ('Stra',)

Passing the re.UNICODE flag to re.match doesn't change anything.

Is there any better way to match a full word other than with [a-zA-ZäüöÄÜÖß]+?

asked Sep 05 '25 17:09 by elpres


1 Answer

Since you are using Python 2, you need to use unicode strings (the u prefix) together with the re.UNICODE flag:

print re.match(ur'(\w+)', u'Straße', re.UNICODE).groups()[0]
Straße
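
For comparison, here is a minimal sketch of the same match on Python 3. It is an addition, not part of the Python 2 code above. It assumes Python 3, where str is already Unicode and \w is Unicode-aware by default, so neither the u prefix nor re.UNICODE is needed. The helper names first_word, first_letters and first_word_bytes are hypothetical and only there for illustration.

```python
import re


def first_word(text):
    """Return the leading run of word characters in text, or None."""
    # With a str pattern on Python 3, \w already covers ä, ö, ü and ß.
    match = re.match(r'(\w+)', text)
    return match.group(1) if match else None


def first_letters(text):
    """Return the leading run of letters only (no digits, no underscore)."""
    # [^\W\d_] means: a word character that is neither a digit nor "_".
    match = re.match(r'([^\W\d_]+)', text)
    return match.group(1) if match else None


def first_word_bytes(data):
    """Same match on raw bytes, where \\w is limited to ASCII."""
    match = re.match(rb'(\w+)', data)
    return match.group(1) if match else None


print(first_word('Straße'))                        # Straße
print(first_letters('Größe_42'))                   # Größe
print(first_word_bytes('Straße'.encode('utf-8')))  # b'Stra'
```

The bytes variant reproduces the truncated result from the question: once the text is UTF-8 encoded bytes rather than a Unicode string, ß is no longer a single character that \w can recognise.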
answered Sep 07 '25 17:09 by Keozon