Is there a Pythonic 'standard' for how regular expressions should be used?
What I typically do is perform a bunch of re.compile statements at the top of my module and store the objects in global variables... then later on use them within my functions and classes.
I could define the regexes inside the functions that use them, but then they would be recompiled every time.
Or, I could forgo re.compile completely, but if I am using the same regex many times it seems like recompiling would incur unnecessary overhead.
One way that would be a lot cleaner is using a dictionary:
    import re

    PATTERNS = {'pattern1': re.compile('foo.*baz'),
                'snake': re.compile('python'),
                'knight': re.compile('[Aa]rthur|[Bb]edevere|[Ll]auncelot')}
That would solve your problem of having a polluted namespace, plus it's pretty obvious to anyone looking at your code what PATTERNS is and will be used for, and it satisfies the CAPS convention for globals. In addition, you can easily call PATTERNS[name].match(text), or whatever method your logic calls for.
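To make the lookup concrete, here is a minimal sketch of how the dictionary might be used from inside functions. The helper names find_knight and matching_pattern_names and the sample strings are made up for illustration; only PATTERNS comes from the answer above.

```python
import re

# Same PATTERNS dictionary as in the answer above.
PATTERNS = {
    'pattern1': re.compile('foo.*baz'),
    'snake': re.compile('python'),
    'knight': re.compile('[Aa]rthur|[Bb]edevere|[Ll]auncelot'),
}


def find_knight(text):
    """Return the first knight name found in text, or None (hypothetical helper)."""
    match = PATTERNS['knight'].search(text)
    return match.group(0) if match else None


def matching_pattern_names(text):
    """Return the names of all patterns that occur somewhere in text."""
    return [name for name, regex in PATTERNS.items() if regex.search(text)]
```

Since the values are already compiled pattern objects, you call methods such as search() or match() on them directly and pass only the string.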
I also tend to use your first approach but I've never benchmarked this. One thing to note, from the documentation, is that:
The compiled versions of the most recent patterns passed to re.match(), re.search() or re.compile() are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.
One worry is that you could have regular expressions that don't get used. If you compile all expressions at module load time you could be incurring the cost of compiling the expression but never benefiting from that "optimization". I don't suppose this would matter unless you compile lots of regular expressions that never get used.
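If unused patterns ever did become a concern, one possible way around it is to compile lazily on first use. This is only a sketch and not something I have benchmarked; PATTERN_SOURCES and get_pattern are hypothetical names, and functools.lru_cache is used here simply as a memoizer.

```python
import re
from functools import lru_cache

# Raw pattern strings only; nothing is compiled at import time.
PATTERN_SOURCES = {
    'snake': r'python',
    'knight': r'[Aa]rthur|[Bb]edevere|[Ll]auncelot',
}


@lru_cache(maxsize=None)
def get_pattern(name):
    """Compile the named pattern on first request and reuse it afterwards."""
    return re.compile(PATTERN_SOURCES[name])
```

A pattern that is never requested is never compiled, and repeated calls with the same name return the same compiled object.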
One thing I do recommend is to use the re.VERBOSE (or re.X) flag and include comments and white space to make anything beyond the most trivial regular expression more readable.
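As a rough illustration of what that looks like, here is a small example. The date pattern and the sample string are invented for this sketch; the point is only the layout that re.VERBOSE allows.

```python
import re

# Hypothetical pattern: a simple ISO-style date such as 2024-01-31.
# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts a comment.
DATE = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -                  # literal separator
    (?P<month>\d{2})   # two-digit month
    -                  # literal separator
    (?P<day>\d{2})     # two-digit day
""", re.VERBOSE)

match = DATE.search("released on 2024-01-31")
parts = match.groupdict() if match else {}
```

Note that in verbose mode a literal space or '#' in the pattern has to be escaped or put inside a character class, since both are otherwise treated as layout.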