
Sorting a list using a regex in Python

Tags: python, lambda

I have a list of email addresses with the following format:

name###@email.com

But the number is not always present. For example: [email protected], [email protected], [email protected], etc. I want to sort these names by the number, with those that have no number coming first. I have come up with something that works, but being new to Python, I'm curious whether there's a better way of doing it. Here is my solution:

import re

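def sortKey(name):
    # Capture the digits immediately before the '@', if any
    m = re.search(r'(\d+)@', name)
    # Names without a number get 0, so they sort first
    return int(m.group(1)) if m is not None else 0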

names = [ ... a list of emails ... ]
for name in sorted(names, key=sortKey):
    print(name)

This is the only time in my script that I am ever using "sortKey", so I would prefer it to be a lambda function, but I'm not sure how to do that. I know this will work:

for name in sorted(names, key=lambda n: int(re.search(r'(\d+)@', n).group(1)) if re.search(r'(\d+)@', n) is not None else 0):
    print(name)

But I don't think I should need to call re.search twice to do this. What is the most elegant way of doing this in Python?

user1174528 asked Dec 10 '25 23:12


1 Answer

It is better to use re.findall: if no numbers are found, it returns an empty list, which sorts before a populated list. The sort key is the list of numbers found (converted to ints), followed by the string itself:

emails = '[email protected] [email protected] [email protected]'.split()

import re
# A list (rather than map) keeps the key comparable on Python 3 as well
print(sorted(emails, key=lambda L: ([int(n) for n in re.findall(r'(\d+)@', L)], L)))
# ['[email protected]', '[email protected]', '[email protected]']

Using john1 instead, the output is: ['[email protected]', '[email protected]', '[email protected]'], which shows that although john comes lexicographically after joe, the number is compared first, shifting john ahead.
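To make the ordering concrete, here is a minimal sketch of the same findall-based key. The addresses are hypothetical (made up for illustration, not the ones from the question), and number_key is just an illustrative name:

```python
import re

# Hypothetical addresses, invented for this sketch
emails = ['joe123@example.com', 'john1@example.com', 'bob@example.com']

def number_key(address):
    # Every digit run directly before '@', converted to int; [] if there is none
    numbers = [int(n) for n in re.findall(r'(\d+)@', address)]
    # An empty list compares less than any non-empty list, so addresses
    # without a number come first; the address itself breaks ties
    return (numbers, address)

result = sorted(emails, key=number_key)
print(result)
# ['bob@example.com', 'john1@example.com', 'joe123@example.com']
```

bob has no number and sorts first; john1 comes before joe123 because 1 < 123 is compared before the strings are.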

If you want to keep your existing re.search approach as a one-liner, there is a somewhat hackish way (but yuck):

getattr(re.search(r'(\d+)@', s), 'groups', lambda: ('0',))()
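As a rough sketch of how that expression could be plugged into sorted: it returns a tuple of strings, so the first element still has to be converted with int to get numeric ordering. The second variant is an alternative not mentioned above; it assumes Python 3.8 or later, where an assignment expression lets the lambda call re.search only once. The addresses are hypothetical.

```python
import re

# Hypothetical addresses, invented for this sketch
names = ['joe123@example.com', 'bob@example.com', 'john1@example.com']

# Variant 1: the getattr trick. When re.search returns None, getattr falls
# back to the lambda, so calling the result yields ('0',) instead of failing.
by_getattr = sorted(
    names,
    key=lambda s: int(getattr(re.search(r'(\d+)@', s), 'groups', lambda: ('0',))()[0]),
)

# Variant 2 (assumes Python 3.8+): bind the match with := inside the lambda,
# so re.search runs once per name.
by_walrus = sorted(
    names,
    key=lambda s: int(m.group(1)) if (m := re.search(r'(\d+)@', s)) else 0,
)

print(by_getattr)
print(by_walrus)
```

Both variants sort by the number alone (0 when absent), which matches the sortKey function in the question rather than the tuple key used with re.findall.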
Jon Clements answered Dec 12 '25 12:12


