I have a definition file, possibly split across several lines, which follows a pattern similar to this:
group-definition "first-regex" "second-regex"
Both sub-regexes are actual regexes, and I need to check the "main" syntax. The Python return should get me the following data:
Also, the sub-regex definitions might use either single or double quotes, so the following syntax could also be correct:
definition "first-regex.*" 'second-regex[0-9]' #some comment
I also need to find out whether the syntax is correct, so the following string should not be recognized as valid:
something-right "something wrong' 'really-\.wrong" wtf
That's because I need exactly two regexes to process afterwards, with no further data added (unless it's a comment starting with either "#" or ";").
Unfortunately, my experience with regex is not that deep, but I know that using something like this won't work as expected:
[\.]* (\".+?\")|(\'.+?\')[\ ](\".+?\")|(\'.+?\')
I suppose that I'd need some deeper knowledge of how regex sub-groups work, but I've not been able to understand how to get them right yet.
I know that there're plenty of questions and answers about this kind of topic, but I wasn't able to find the right search context for this kind of issue.
You're on the right track. I'll assume all of the following are valid statements:
definition 'regex1' "regex2"
definition # Comment
'regex1' # Comment
'regex2'
You might want to look into named captures. Your pattern should allow for comments or whitespace between each argument, and you must remember to use the re.S flag, which lets '.' match '\n'.
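import re

# Raw string so the backslashes reach the regex engine unchanged.
# In verbose mode a bare '#' starts a comment, so the comment markers
# are written as the character class [#;].
pattern = r"""
    (?P<definition>[\w\-]+)           # Definition name: word characters and "-"
    (?P<break1>(\s|[#;][^\n]*\n)*?)   # Optional whitespace and comments
    (?P<reg1>'.*?'|".*?")             # First regex, single- or double-quoted
    (?P<break2>(\s|[#;][^\n]*\n)*?)   # Another optional break
    (?P<reg2>'.*?'|".*?")             # Second regex
"""
with open('your_document', 'r') as f:
    for match in re.finditer(pattern, f.read(), re.X | re.S):
        print(match.group(0))  # do something with each match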
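re.X allows the pattern to be verbose, i.e. to contain whitespace and comments. Because an unescaped # starts a comment in verbose mode, a literal # has to be escaped or placed inside a character class such as [#;]. re.S, as mentioned above, lets '.' match newlines too. finditer is very useful for matching many times: it finds all non-overlapping matches and yields them one by one.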
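To see finditer at work without a file on disk, here is a minimal sketch that runs the same kind of pattern over an in-memory string. The names PATTERN, SAMPLE and found are made up for illustration, and the comment markers are written as [#;] on the assumption that both "#" and ";" start a comment, as the question describes.

```python
import re

# Sketch of the pattern discussed above. The comment markers live in a
# character class because a bare '#' would start a verbose-mode comment.
PATTERN = re.compile(r"""
    (?P<definition>[\w\-]+)      # definition name
    (?:\s|[#;][^\n]*\n)*?        # optional whitespace and comments
    (?P<reg1>'.*?'|".*?")        # first quoted regex
    (?:\s|[#;][^\n]*\n)*?        # optional whitespace and comments
    (?P<reg2>'.*?'|".*?")        # second quoted regex
""", re.X | re.S)

# Hypothetical input: one single-line statement and one split over three lines.
SAMPLE = """definition 'regex1' "regex2"
definition # Comment
'regex1' # Comment
'regex2'
"""

# Collect (name, first regex, second regex) for every statement found.
found = [(m['definition'], m['reg1'], m['reg2']) for m in PATTERN.finditer(SAMPLE)]
for item in found:
    print(item)
```

Both statements in the sample are picked up, and the captured regexes still carry their surrounding quotes.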
(?P<name>pattern) allows sub-captures to be accessed by name, so you can access them with the following (on Python 3.6 or later; use match.group('name') on older versions):
match['definition']
match['reg1']
match['reg2']
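The question also asks for malformed lines to be rejected. One possible way to do that, sketched here under assumptions, is to validate a single statement with re.fullmatch so that nothing except an optional trailing comment may follow the second regex. STATEMENT and parse_statement are hypothetical names, and the quoted parts are tightened to [^'\n]* and [^"\n]*, which means a regex containing its own quote character is not supported by this sketch.

```python
import re

# One statement on one line: name, two quoted regexes, optional comment.
STATEMENT = re.compile(r"""
    \s*
    (?P<definition>[\w\-]+)          # definition name
    \s+
    (?P<reg1>'[^'\n]*'|"[^"\n]*")    # first quoted regex
    \s+
    (?P<reg2>'[^'\n]*'|"[^"\n]*")    # second quoted regex
    \s*
    (?:[#;].*)?                      # optional trailing comment
""", re.X)


def parse_statement(line):
    """Return (definition, regex1, regex2), or None if the line is malformed."""
    m = STATEMENT.fullmatch(line)
    if m is None:
        return None
    # Strip the surrounding quotes from each captured regex.
    return m['definition'], m['reg1'][1:-1], m['reg2'][1:-1]


# The two example lines from the question.
good = parse_statement("""definition "first-regex.*" 'second-regex[0-9]' #some comment""")
bad = parse_statement(r"""something-right "something wrong' 'really-\.wrong" wtf""")
print(good)
print(bad)
```

fullmatch requires the whole line to fit the pattern, which is what rejects the trailing "wtf" in the second example.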
Read the documentation of the re module for more info.