What does this Python regex match?
.*?[^\\]\n
I'm confused about why the `.` is followed by both `*` and `?`.
`*` means "match the previous element as many times as possible (zero or more times)".
`*?` means "match the previous element as few times as possible (zero or more times)".
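As a minimal sketch of the greedy versus lazy distinction, here is a toy comparison. The sample string and the angle-bracket patterns are made up for illustration and are not part of the original question:

```python
import re

# Hypothetical sample text, chosen only to illustrate greedy vs. lazy matching.
text = "<a><b>"

# Greedy: .* grabs as much as possible, then backtracks to the last '>'.
greedy = re.match(r"<.*>", text).group()

# Lazy: .*? grabs as little as possible, so it stops at the first '>'.
lazy = re.match(r"<.*?>", text).group()

print(greedy)  # <a><b>
print(lazy)    # <a>
```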
The other answers already address this, but what they don't bring up is how it changes the regex. If the `re.DOTALL` flag is provided it makes a huge difference, because `.` will match line break characters with that enabled. So `.*[^\\]\n` would match from the beginning of the string all the way to the last newline character that is not preceded by a backslash (so several lines would be returned as a single match).
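A small sketch of the `re.DOTALL` case follows. The test string is an assumption chosen so that the last newline is preceded by a backslash, which forces the greedy match to stop one line earlier:

```python
import re

# Hypothetical input: three newline-terminated lines, the third ending in a backslash.
s = "one\ntwo\nthree\\\nfour"

# With DOTALL, '.' also matches '\n', so the greedy .* spans several lines
# and backtracks to the last newline that is not preceded by a backslash.
with_dotall = re.findall(r'.*[^\\]\n', s, re.DOTALL)

# Without DOTALL, '.' stops at line breaks, so each line is matched separately.
without_dotall = re.findall(r'.*[^\\]\n', s)

print(with_dotall)     # ['one\ntwo\n']
print(without_dotall)  # ['one\n', 'two\n']
```

In both cases the line ending in a backslash is skipped; the flag only changes whether the preceding lines come back as one match or as several.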
If the `re.DOTALL` flag is not provided, the difference is more subtle: `[^\\]` will match any character other than a backslash, including line break characters. Consider the following example:
>>> import re
>>> s = "foo\n\nbar"
>>> re.findall(r'.*?[^\\]\n', s)
['foo\n']
>>> re.findall(r'.*[^\\]\n', s)
['foo\n\n']
So the purpose of this regex is to find non-empty lines that don't end with a backslash, but if you use `.*` instead of `.*?` you will match an extra `\n` if you have an empty line following a non-empty line.
This happens because `.*?` will only match `fo`, `[^\\]` will match the second `o`, and then the `\n` matches at the end of the first line. However, the `.*` will match `foo`, the `[^\\]` will match the `\n` that ends the first line, and the next `\n` will match because the second line is blank.
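One way to check this breakdown is to wrap each piece of the pattern in a capture group and inspect what it consumed. The groups are added here purely for illustration; they do not change what the overall pattern matches:

```python
import re

s = "foo\n\nbar"

# Same patterns as above, with each piece wrapped in a capture group.
lazy = re.match(r'(.*?)([^\\])(\n)', s)
greedy = re.match(r'(.*)([^\\])(\n)', s)

# Lazy: .*? takes 'fo', [^\\] takes the second 'o', \n ends the first line.
print(lazy.groups())    # ('fo', 'o', '\n')

# Greedy: .* takes 'foo', [^\\] takes the first '\n', \n takes the blank line.
print(greedy.groups())  # ('foo', '\n', '\n')
```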