I'm trying to use re to match a pattern that starts with '\n', followed by a possible 'real(r8)', followed by zero or more white spaces and then followed by the word 'function', and then I want to split the string at where matches occur. So for this string,
text = '''functional \n function disdat \nkitkat function wakawak\nreal(r8) function noooooo \ndoit'''
I would like:
['functional ',
' disdat \nkitkat function wakawak',
' noooooo \ndoit']
However,
regex = re.compile(r'''\n(real\(r8\))?\s*\bfunction\b''')
regex.split(text)
returns
['functional ',
None,
' disdat \nkitkat function wakawak',
'real(r8)',
' noooooo \ndoit']
split
returns the matches' groups too. How do I ask it not to?
You can use non-capturing groups, like this
>>> regex = re.compile(r'\n(?:real\(r8\))?\s*\bfunction\b')
>>> regex.split(text)
['functional ', ' disdat \nkitkat function wakawak', ' noooooo \ndoit']
Note ?:
in (?:real\(r8\))
. Quoting Python documentation for (?:..)
A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With