I'm trying to use a regex to match every character between a string and a \r character:
text = 'Some text\rText to find !\r other text\r'
I want to match 'Text to find !'. I already tried:
re.search(r'Some text\r(.*)\r', text).group(1)
But it gives me: 'Text to find !\r other text'
That's surprising, because it works perfectly when I replace \r with \n:
re.search(r'Some text\n(.*)\n', 'Some text\nText to find !\n other text\n').group(1)
returns 'Text to find !'.
Do you know why it behaves differently when we use \r and \n?
.* is greedy by nature, so it matches the longest possible string in:
r'Some text\r(.*)\r'
Hence giving you:
re.findall(r'Some text\r(.*)\r', 'Some text\rText to find !\r other text\r')
['Text to find !\r other text']
However, if you change the quantifier to non-greedy, it gives the expected result:
re.findall(r'Some text\r(.*?)\r', 'Some text\rText to find !\r other text\r')
['Text to find !']
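To see the two quantifiers side by side, here is a minimal sketch using the text from the question. The variable names greedy and lazy are only illustrative.

```python
import re

text = 'Some text\rText to find !\r other text\r'

# Greedy: (.*) consumes as much as it can, then backtracks to the LAST \r.
greedy = re.search(r'Some text\r(.*)\r', text).group(1)

# Non-greedy: (.*?) expands one character at a time and stops at the FIRST \r.
lazy = re.search(r'Some text\r(.*?)\r', text).group(1)

print(repr(greedy))  # 'Text to find !\r other text'
print(repr(lazy))    # 'Text to find !'
```

Both patterns match the same text; only the point where the group stops differs.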
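The reason re.findall(r'Some text\n(.*)\n', 'Some text\nText to find !\n other text\n') gives just ['Text to find !'] is that the dot matches any character except a line break. In Python's re module that means \n only, so \r is matched by the dot like any ordinary character. If you enable DOTALL (or use [\s\S], which matches any character), the greedy pattern again finds the longest match: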
re.findall(r'Some text\n([\s\S]*)\n', 'Some text\nText to find !\n other text\n')
['Text to find !\n other text']
re.findall(r'(?s)Some text\n(.*)\n', 'Some text\nText to find !\n other text\n')
['Text to find !\n other text']
This again changes when you use a non-greedy quantifier:
re.findall(r'(?s)Some text\n(.*?)\n', 'Some text\nText to find !\n other text\n')
['Text to find !']
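The difference between \r and \n can also be checked directly. The sketch below assumes Python's default flags and uses re.DOTALL, the flag form of the inline (?s) shown above. The variable names are only illustrative.

```python
import re

# With default flags, "." matches every character except "\n".
dot_matches_cr = re.fullmatch(r'.', '\r') is not None   # "\r" is matched
dot_matches_lf = re.fullmatch(r'.', '\n') is not None   # "\n" is not

nl_text = 'Some text\nText to find !\n other text\n'

# Default: the greedy (.*) cannot cross "\n", so it stops at the first line.
default_result = re.search(r'Some text\n(.*)\n', nl_text).group(1)

# DOTALL: "." now matches "\n" too, so the greedy (.*) runs to the last "\n".
dotall_result = re.search(r'Some text\n(.*)\n', nl_text, flags=re.DOTALL).group(1)

print(dot_matches_cr, dot_matches_lf)  # True False
print(repr(default_result))            # 'Text to find !'
print(repr(dotall_result))             # 'Text to find !\n other text'
```

So the \n version in the question only appeared to work because the dot could not cross the line break, not because the pattern was less greedy.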