How to match umlauts with regular expressions? [duplicate]

Question

In German text, umlauts (ä, ü, ö) and eszett (ß) are regular letters, but they don't seem to be covered by the \w special character:

In [1]: re.match('(\w+)', 'Straße').groups()
Out[1]: ('Stra',)

Passing the re.UNICODE flag to re.match doesn't change anything.

Is there any better way to match a full word other than with [a-zA-ZäüöÄÜÖß]+?

Keozon · Accepted Answer

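Since you are using Python 2, 'Straße' is a byte string, so the ß is seen as encoded bytes rather than as a letter, and re.UNICODE has no effect on it. Make both the pattern and the subject unicode strings and keep the re.UNICODE flag: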

print re.match(ur'(\w+)', u'Straße', re.UNICODE).groups()[0]
Straße
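
For comparison, here is a minimal sketch of the same match on Python 3, which the question does not use, so treat it as an aside. There, str patterns are Unicode-aware by default, so neither the u prefix nor re.UNICODE is needed. The sample string 'Größe_42 Straße' and the variable names are made up for illustration. The last pattern, [^\W\d_]+, is one common way to match letters only, since \w also accepts digits and the underscore.

```python
import re

# Python 3: str patterns are Unicode-aware by default,
# so \w already covers ß, ä, ö and ü.
word = re.match(r'(\w+)', 'Straße').group(1)
print(word)  # Straße

# re.ASCII switches \w back to [a-zA-Z0-9_] only,
# which reproduces the truncated match from the question.
ascii_word = re.match(r'(\w+)', 'Straße', re.ASCII).group(1)
print(ascii_word)  # Stra

# \w also matches digits and "_". To match letters only,
# take "word characters that are neither digits nor underscore".
letters = re.findall(r'[^\W\d_]+', 'Größe_42 Straße')
print(letters)  # ['Größe', 'Straße']
```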
