
Python regex - (\w+) gives different results when used in a more complex expression

Tags: python, regex

I have a question about how Python regexes behave. Here is my sample test.

>>> import re
>>> re.match(r'(\w+)', 'a-b')
<_sre.SRE_Match object at 0x7f51c0033210>

>>> re.match(r'(\w+):(\d+)', 'a-b:1')
>>>

Why doesn't the second regex return a match object, when the first one does for the same kind of string, even though that string contains a special character?

As far as I understand, \w+ only matches characters in [a-zA-Z0-9_], so I'm not clear why (\w+) gives a match object for the string 'a-b'. How can I check that a given string doesn't contain any special characters?

Darknight asked Jan 21 '26 00:01



1 Answer

Taking a look at the actual match will give you an idea of what happens.

>>> re.match(r'(\w+)', 'a-b')
<_sre.SRE_Match object at 0x0000000002DE45D0>
>>> _.groups()
('a',)

As you can see, the expression matched only a. The character class \w contains only actual word characters (letters, digits and the underscore), not separators like dashes. So you can’t match a-b as a whole using just \w+; the match stops at the dash.
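To make this concrete, here is a small sketch that inspects what the first pattern actually captured. The variable names are only illustrative:

```python
import re

# \w+ stops at the first non-word character, so match() only captures 'a'.
m = re.match(r'(\w+)', 'a-b')
first_word = m.group(1)   # text captured by the group
first_span = m.span()     # (start, end) offsets of the whole match

# findall() lists every run of word characters; the dash separates them.
word_runs = re.findall(r'\w+', 'a-b')

print(first_word, first_span, word_runs)
```

So a match object is returned because part of the string matched, not because the whole string did.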

Now, in the second expression one might think that it would at least match b:1, given that \w+ matches b and :(\d+) matches the 1. However, that does not happen because of how re.match works. As the documentation states, it only tries to match “at the beginning of string”. So re.match behaves as if there were an implicit ^ at the beginning of the expression, and it only tries to find a match starting at a. There, \w+ consumes a, the pattern then requires a :, but the next character is -, so the match fails.
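The anchoring can be seen directly with a compiled pattern, whose match method accepts an optional start position. This is only an illustration of the behaviour described above, with made-up variable names:

```python
import re

pattern = re.compile(r'(\w+):(\d+)')

# Anchored at index 0: \w+ consumes 'a', then ':' is required but '-' follows.
from_start = pattern.match('a-b:1')

# Starting the attempt at index 2 (the 'b') lets the same pattern succeed.
from_b = pattern.match('a-b:1', 2)

print(from_start)        # None
print(from_b.groups())   # ('b', '1')
```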

Instead, you can use re.search, which scans through the whole string and matches the expression at the first position where it fits. With that, you will get a result:

>>> re.search(r'(\w+):(\d+)', 'a-b:1')
<_sre.SRE_Match object at 0x0000000002E01B58>
>>> _.groups()
('b', '1')

For further information on the search vs. match topic, see the “search() vs. match()” section of the re module documentation.

And finally, if you want to match dashes too, you can use a character class such as [\w-], for example:

>>> re.match(r'([\w-]+):(\d+)', 'a-b:1')
<_sre.SRE_Match object at 0x0000000002E01B58>
>>> _.groups()
('a-b', '1')
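
The question also asks how to check that a string contains no special characters. One possible approach, assuming Python 3.4 or later and assuming “special” means anything outside \w, is re.fullmatch, which only succeeds when the entire string matches. The helper name has_only_word_chars is hypothetical:

```python
import re

def has_only_word_chars(text):
    """Return True if text is non-empty and made up solely of word characters."""
    # fullmatch() succeeds only if the whole string matches the pattern.
    return re.fullmatch(r'\w+', text) is not None

print(has_only_word_chars('a_b1'))   # True
print(has_only_word_chars('a-b'))    # False
print(has_only_word_chars(''))       # False
```

Note that in Python 3, \w also matches non-ASCII letters and digits in str patterns; passing flags=re.ASCII restricts it to [a-zA-Z0-9_].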
poke answered Jan 22 '26 14:01
