In German text, umlauts (ä, ü, ö) and eszett (ß) are regular letters, but they don't seem to be covered by the \w character class:
In [1]: re.match('(\w+)', 'Straße').groups()
Out[1]: ('Stra',)
Passing the re.UNICODE flag to re.match doesn't change anything.
Is there a better way to match a full word than spelling out [a-zA-ZäüöÄÜÖß]+?
Since you are using Python 2, 'Straße' is a byte string, so \w only sees the individual bytes. You need to pass a unicode string and keep the re.UNICODE flag:
print re.match(ur'(\w+)',u'Straße',re.UNICODE).groups()[0]
Straße
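For comparison, here is a minimal sketch of the same match assuming Python 3 (not the version used in the question). There, str is already Unicode and \w matches Unicode word characters by default, so no prefix or flag is needed; re.ASCII restores the narrow behaviour seen in the question. The variable names are only illustrative.

```python
import re

# Python 3: str patterns are Unicode-aware by default, so \w covers ß, ä, ö, ü.
unicode_match = re.match(r'(\w+)', 'Straße').group(1)

# re.ASCII limits \w to [a-zA-Z0-9_], which stops the match at 'ß'.
ascii_match = re.match(r'(\w+)', 'Straße', re.ASCII).group(1)

# The same pattern splits a whole sentence into words.
words = re.findall(r'\w+', 'Die Straße ist schön')

print(unicode_match)  # Straße
print(ascii_match)    # Stra
print(words)          # ['Die', 'Straße', 'ist', 'schön']
```

Note that \w also matches digits and the underscore, so it is slightly broader than the explicit [a-zA-ZäüöÄÜÖß]+ class from the question.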