I have a column in MS Excel 2010 that contains a combination of ZIP codes and email IDs.
I am trying to extract the ZIP codes (20530, 90012-3308, etc.) from this column.
20530 [email protected]
20530 [email protected]
20530 [email protected]
20530 [email protected]
20004 [email protected]
20530 [email protected]
90012-3308 [email protected]
90012-3308 [email protected]
90012 [email protected]
I tried Python's re module.
import re
for i in range(1, 10):  # rows 1 through 9
    # store every run of digits found in column 1 into column 4
    Cell(i, 4).value = re.findall(r'\d+', Cell(i, 1).value)
I ran the regex on that column and I got this result:
[u'20530']
[u'20530']
[u'20530']
[u'20530']
[u'20004', u'9']
[u'20530', u'8']
[u'90012', u'3308']
[u'90012', u'3308']
[u'90012']
How can I extract each ZIP code as a single, human-readable value (for example, 90012-3308 rather than two separate pieces)?
Why can't you just split?
>>> '20530 [email protected]'.split()
['20530', '[email protected]']
Then just grab the first element.
>>> '20530 [email protected]'.split()[0]
'20530'
For all your data:
l = ['20530 [email protected]',
     '20530 [email protected]',
     '20530 [email protected]',
     '20530 [email protected]',
     '20004 [email protected]',
     '20530 [email protected]',
     '90012-3308 [email protected]',
     '90012-3308 [email protected]',
     '90012 [email protected]']
[entry.split()[0] for entry in l]
Result
['20530', '20530', '20530', '20530', '20004', '20530', '90012-3308', '90012-3308', '90012']
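One caveat with the split approach is that `split()[0]` raises an IndexError on an empty string, and a blank spreadsheet cell may come back as None. The following is a minimal sketch of a guarded version, assuming blank cells should yield an empty string. The helper name `extract_zip` and the example.com addresses are made up for illustration, since the real addresses are not visible here.

```python
def extract_zip(cell_text):
    """Return the first whitespace-separated token, or '' for a blank cell."""
    # Treat None (an empty cell) the same as an empty string.
    parts = (cell_text or '').split()
    return parts[0] if parts else ''

# Hypothetical sample rows; the addresses are placeholders.
rows = ['20530 user@example.com', '90012-3308 other9@example.com', '', None]
zips = [extract_zip(row) for row in rows]
```

Because the token is taken whole, the hyphenated ZIP+4 form stays in one piece, and digits inside the email address are never looked at.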
The following regular expression will match each string and extract the postal code as group 1:
([\d\-]+)\s+[\w@\.]+
Here's the Python code to extract all of the postal codes at once:
import re
text = r''' 20530 [email protected]
20530 [email protected]
20530 [email protected]
20530 [email protected]
20004 [email protected]
20530 [email protected]
90012-3308 [email protected]
90012-3308 [email protected]
90012 [email protected]'''
re.compile(r'([\d\-]+)\s+[\w@\.]+').findall(text)
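The extra values such as u'9' and u'3308' in the question come from `\d+` matching every run of digits in the cell, including the second half of a ZIP+4 code and any digits in the email address. As a stricter alternative to the pattern above, here is a sketch that anchors the match at the start of the cell and accepts only five digits with an optional four-digit suffix. It assumes every code is a US ZIP or ZIP+4 code at the beginning of the cell. The names `ZIP_RE` and `zip_from_cell` and the example.com addresses are illustrative, not part of the original data.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits,
# anchored at the start of the cell (leading whitespace allowed).
ZIP_RE = re.compile(r'^\s*(\d{5}(?:-\d{4})?)\b')

def zip_from_cell(cell_text):
    """Return the ZIP code at the start of cell_text, or None if there is none."""
    match = ZIP_RE.match(cell_text)
    return match.group(1) if match else None

# Hypothetical sample rows; the addresses are placeholders.
samples = ['20004 user9@example.com',
           '90012-3308 someone@example.com',
           'no zip here']
results = [zip_from_cell(s) for s in samples]
```

Working one cell at a time also fits the loop in the question, since each result is a single string (or None) rather than a list.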