I'm trying to get the start and end index of a word inside a string using re.finditer. For most words my pattern works fine, but for words containing special characters my regex gives me an error.
Problem:
I tried:
import re
a = " we have c++ and c#"
pattern = ['c#', 'c++']
regex = re.compile(r'\b(' + '|'.join(pattern) + r')\b')
out = [(m.start(0), m.end(0)) for m in regex.finditer(a)]
Current Output:
error: multiple repeat at position x
Expected Output:
[(9,12),(17,19)]
In most cases my pattern works fine, but words with special characters cause a problem. I'm not very familiar with regex; can anyone please help? Thanks!
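To see where the error comes from, here is a minimal sketch that joins the words unescaped, exactly as in the attempt above (`try_unescaped` is a hypothetical helper name). In a regex, `+` is a quantifier rather than a literal character, so in `c++` the second `+` tries to repeat a repeat. How that fails depends on the Python version: before 3.11 it raises `re.error` with the message `multiple repeat`; Python 3.11 added possessive quantifiers, so `c++` becomes a valid pattern meaning "one or more `c`, possessively", compiles without error, and then matches only the letter `c`.

```python
import re

def try_unescaped(text, words):
    """Join the raw words into one alternation WITHOUT escaping them,
    as in the question, and report what happens."""
    raw = r'\b(' + '|'.join(words) + r')\b'
    try:
        regex = re.compile(raw)
    except re.error as exc:
        # Python < 3.11: '++' is read as a quantifier applied to a quantifier
        return "error: " + exc.msg
    # Python 3.11+: 'c++' compiles as a possessive 'c+', so only 'c' is matched
    return [(m.start(), m.end()) for m in regex.finditer(text)]

print(try_unescaped(" we have c++ and c#", ['c#', 'c++']))
```

On Python 3.11 or later this prints `[(9, 10), (17, 18)]` (only the `c` of each word); on older versions it prints `error: multiple repeat`. Either way, the words need escaping.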
Code:
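import re
a = " we have c++ and c#"
# Escape each word, then require whitespace or end of string right after it
pattern = [r'\b{}(?=\s|$)'.format(re.escape(s)) for s in ['c#', 'c++']]
regex = re.compile('|'.join(pattern))
print([(m.start(0), m.end(0)) for m in regex.finditer(a)])
# [(9, 12), (17, 19)]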
Details:
The first problem is special characters. You can escape them manually:
['c\\+\\+', 'c\\#']
or, more simply, let re.escape do that work for you, one word at a time:
[re.escape(s) for s in ['c#', 'c++']]
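A quick check of what re.escape produces for each word. Escaping each word separately (rather than one combined string) matters because the `|` separators added by `join` must stay special; this is a small sketch, not part of the original answer.

```python
import re

words = ['c#', 'c++']

# Escape each word on its own so the '|' separators added by join() stay special
escaped = [re.escape(w) for w in words]
print(escaped)        # ['c\\#', 'c\\+\\+']

# The escaped alternation now compiles on every Python version
alternation = '|'.join(escaped)
print(alternation)    # c\#|c\+\+
```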
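The second problem is word boundaries: \b does not behave the same way next to special characters as it does next to alphanumeric characters (e.g. \bfoo\b). \b only matches between a word character and a non-word character, so there is no boundary between the final + of c++ and the following space, or between # and the end of the string.
To quote from the Python docs: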
\b word boundary
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'.
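A minimal sketch of why escaping alone is not enough: with the escaped words still wrapped in \b ... \b, nothing matches, because there is no word boundary right after `c++` (position 12) or after `c#` (position 19). Listing every position where \b matches makes the gaps visible.

```python
import re

a = " we have c++ and c#"
escaped = '|'.join(re.escape(w) for w in ['c#', 'c++'])

# Escaped, but still wrapped in \b ... \b as in the original attempt
with_boundaries = re.compile(r'\b(' + escaped + r')\b')
print(with_boundaries.findall(a))   # []

# Every position where \b matches in the string
print([m.start() for m in re.finditer(r'\b', a)])
# [1, 3, 4, 8, 9, 10, 13, 16, 17, 18]: no boundary at 12 (after 'c++') or 19 (after 'c#')
```

Note that \b does match at 10 (between `c` and `+`) and 18 (between `c` and `#`), which is why a boundary is fine at the start of these words but not at the end.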
To make this work, you can replace the trailing \b with a positive lookahead assertion:
r'\b{}(?=\s|$)'
It checks for a whitespace character (\s) or the end of the string ($) right after your pattern.
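A lookahead is zero-width: it only checks what follows and does not include it in the match, so m.end() stays at the end of the word. A small comparison sketch (the `spans` helper is hypothetical) against a non-capturing group, which does consume the following space:

```python
import re

a = " we have c++ and c#"
words = ['c#', 'c++']

def spans(template):
    """Build one alternative per word from `template` and return match spans."""
    regex = re.compile('|'.join(template.format(re.escape(w)) for w in words))
    return [(m.start(), m.end()) for m in regex.finditer(a)]

# Lookahead: checks for \s or $ but does not include it in the match
print(spans(r'\b{}(?=\s|$)'))   # [(9, 12), (17, 19)]

# Non-capturing group: the trailing space becomes part of the match
print(spans(r'\b{}(?:\s|$)'))   # [(9, 13), (17, 19)]
```

Only the first version gives the expected output; the second one reports 13 instead of 12 for c++ because the space was consumed.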