Use of \r (carriage return) in python regex

Tags: python, regex

I'm trying to use a regex to match everything between a given string and the next \r character:

text = 'Some text\rText to find !\r other text\r'

I want to match 'Text to find !'. I already tried:

re.search(r'Some text\r(.*)\r', text).group(1)

But it gives me: 'Text to find !\r other text'

This is surprising, because it works perfectly when \r is replaced by \n:

re.search(r'Some text\n(.*)\n', 'Some text\nText to find !\n other text\n').group(1)

returns 'Text to find !'

Do you know why it behaves differently with \r and \n?

asked Sep 14 '25 22:09 by LeoGlt


1 Answer

.* is greedy by nature, so it takes the longest match available in:

r'Some text\r(.*)\r'

Hence giving you:

re.findall(r'Some text\r(.*)\r', 'Some text\rText to find !\r other text\r')
['Text to find !\r other text']

However, if you change it to a non-greedy quantifier, it gives the expected result:

re.findall(r'Some text\r(.*?)\r', 'Some text\rText to find !\r other text\r')
['Text to find !']
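As a minimal sketch comparing the two quantifiers on the question's input: the third pattern, using the negated character class [^\r]*, is not part of the answer above. It is shown here only as one common alternative that stops at the first \r without relying on a lazy quantifier. The variable names are illustrative.

```python
import re

text = 'Some text\rText to find !\r other text\r'

# Greedy: .* runs to the LAST \r, because "." matches \r by default.
greedy = re.search(r'Some text\r(.*)\r', text).group(1)

# Non-greedy: .*? stops at the FIRST \r that lets the pattern succeed.
lazy = re.search(r'Some text\r(.*?)\r', text).group(1)

# Alternative (an assumption, not from the answer): exclude \r explicitly,
# so the group can never cross a carriage return.
negated = re.search(r'Some text\r([^\r]*)\r', text).group(1)

print(repr(greedy))   # 'Text to find !\r other text'
print(repr(lazy))     # 'Text to find !'
print(repr(negated))  # 'Text to find !'
```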

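The reason re.findall(r'Some text\n(.*)\n', 'Some text\nText to find !\n other text\n') gives just ['Text to find !'] is that, by default, the dot matches any character except a newline (\n). Python's re module does not treat \r as a line break, so the dot matches \r and the greedy .* runs straight through it. If you enable DOTALL, or use [\s\S] in place of the dot, the \n version also returns the longest match: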

>>> re.findall(r'Some text\n([\s\S]*)\n', 'Some text\nText to find !\n other text\n')
['Text to find !\n other text']

>>> re.findall(r'(?s)Some text\n(.*)\n', 'Some text\nText to find !\n other text\n')
['Text to find !\n other text']

The result changes again when you use a non-greedy quantifier:

re.findall(r'(?s)Some text\n(.*?)\n', 'Some text\nText to find !\n other text\n')
['Text to find !']
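The difference between \r and \n can be checked directly. The following is a small sketch, assuming Python 3.4+ for re.fullmatch. It tests whether a lone dot matches each character, with and without DOTALL. The dictionary names are illustrative.

```python
import re

chars = ('\r', '\n')

# Default mode: "." matches everything except \n, so \r is matched.
dot_default = {ch: bool(re.fullmatch(r'.', ch)) for ch in chars}

# DOTALL mode: "." matches every character, including \n.
dot_dotall = {ch: bool(re.fullmatch(r'.', ch, re.DOTALL)) for ch in chars}

print(dot_default)  # {'\r': True, '\n': False}
print(dot_dotall)   # {'\r': True, '\n': True}
```

This is why the greedy pattern appears to "work" with \n: the dot simply cannot cross it, whereas nothing stops it from crossing \r.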
answered Sep 16 '25 11:09 by anubhava