I'm new to Python, which I'm using to do an ugly little put-this-tabular-data-into-a-db conversion. The program looks at the data, creates a table in MySQL, and then reads the data into the table. In this section, header row text is checked to make some decision about data typing. I had an idea that I could be clever and do this with a single regex rather than if/elifs. My solution works for this case at least, where I don't have to worry about multiple matches. What I'm asking is, is there any real merit to this approach in terms of efficiency?
def _typeMe(self, header_txt):
    # data typing
    colspecs = {
        'id':'SMALLINT(10)',
        'date':'DATE',
        'comments':'TEXT(4000)',
        'flag':'BIT(1)',
        'def':'VARCHAR(255)'
    }
    # regex to match on header text e.g. 'Provisioner ID'
    r = re.search(re.compile('(ID$)|(Date)|(Comments$)|(FLAG$)', re.IGNORECASE), header_txt)
    checktype = lambda m: max(m.groups()).lower() if m else 'def'
    return colspecs[checktype(r)]
Absolutely; what you've got is called data-driven programming. In general it's good style because it allows you to make changes easily without having to worry about duplicating code sections.
In terms of performance it's unlikely to make much difference; the important thing is that it's more readable and more maintainable than the alternative.
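To make the data-driven idea concrete, here is a minimal sketch that keeps every rule in one table and loops over it. The names RULES, DEFAULT_TYPE and column_type are hypothetical, not from the question. Unlike the single combined regex, this version returns the first rule in list order that matches, which only matters if a header could match more than one rule.

```python
import re

# Hypothetical rule table: each entry pairs a header pattern with a column type.
# The patterns and types are copied from the question; only the layout is new.
RULES = [
    (re.compile(r'ID$', re.IGNORECASE), 'SMALLINT(10)'),
    (re.compile(r'Date', re.IGNORECASE), 'DATE'),
    (re.compile(r'Comments$', re.IGNORECASE), 'TEXT(4000)'),
    (re.compile(r'FLAG$', re.IGNORECASE), 'BIT(1)'),
]
DEFAULT_TYPE = 'VARCHAR(255)'


def column_type(header_txt):
    """Return the type of the first rule whose pattern matches the header."""
    for pattern, sql_type in RULES:
        if pattern.search(header_txt):
            return sql_type
    return DEFAULT_TYPE
```

Supporting a new kind of column then means adding one line to RULES; the lookup function itself does not change.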
I agree with @ecatmur's answer; I just wanted to post some slight code suggestions that are a little too long for a comment.
There's no need to do re.search(re.compile('...', re.IGNORECASE), header_txt). Instead, you can just pass the string straight in as re.search('...', header_txt, re.IGNORECASE). If you're using the same regex over and over, compiling it once with re.compile and reusing the compiled object is slightly faster, but re.search and friends will compile (and cache) the pattern for you if you don't.
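As a quick illustration, the three forms below all find the same match for the question's pattern. PATTERN and HEADER_RE are names made up for this sketch.

```python
import re

# Pattern copied from the question.
PATTERN = '(ID$)|(Date)|(Comments$)|(FLAG$)'

# Form used in the question: compile, then hand the compiled object to re.search.
verbose = re.search(re.compile(PATTERN, re.IGNORECASE), 'Provisioner ID')

# Shorter equivalent: pass the pattern string and the flag directly.
direct = re.search(PATTERN, 'Provisioner ID', re.IGNORECASE)

# Compiled once (for example at module level) and reused for every header.
HEADER_RE = re.compile(PATTERN, re.IGNORECASE)
precompiled = HEADER_RE.search('Provisioner ID')
```

For a one-off conversion script the difference is unlikely to be measurable, so the choice is mostly about readability.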
Though I don't share Colin's disdain for named lambdas (they can be handy just because they're still one line instead of two), you don't need an inner function here at all:
return colspecs[max(m.groups()).lower() if m else 'def']
The max(m.groups()) trick also isn't necessary if you just make one capturing group instead of four: '(ID|Date|Comments|Flag)$'. Then you can do m.group(1).
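Putting these suggestions together, a rough sketch might look like the following. It is written as a plain function rather than a method, and COLSPECS, HEADER_RE and type_for_header are hypothetical names. Two caveats: the single group anchors Date to the end of the header, which the question's pattern did not, and the result of m.group(1) still needs .lower() before the dictionary lookup. It also sidesteps max(m.groups()), which raises a TypeError on Python 3 because None and str cannot be compared.

```python
import re

# Type table copied from the question.
COLSPECS = {
    'id': 'SMALLINT(10)',
    'date': 'DATE',
    'comments': 'TEXT(4000)',
    'flag': 'BIT(1)',
    'def': 'VARCHAR(255)',
}

# One capturing group, anchored at the end of the header text.
HEADER_RE = re.compile(r'(ID|Date|Comments|Flag)$', re.IGNORECASE)


def type_for_header(header_txt):
    """Map header text such as 'Provisioner ID' to a column type."""
    m = HEADER_RE.search(header_txt)
    # Lower-case the captured suffix so it matches the dictionary keys.
    key = m.group(1).lower() if m else 'def'
    return COLSPECS[key]
```

If headers like 'Date Entered' need to keep mapping to DATE, the Date alternative would have to stay outside the anchored group, as in the original pattern.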