 

Parsing text with a regular expression into a list gives an empty string in the result

Tags:

python

regex

I am trying to break up (split) a string into words.

    import re

    def breakup(text):
        temp = []
        temp = re.split(r'\W+', text.rstrip())
        return [e.lower() for e in temp]

Example string:

What's yellow, white, green and bumpy? A pickle wearing a tuxedo

Result:

['what', 's', 'yellow', 'white', 'green', 'and', 'bumpy', 'a', 'pickle', 'wearing', 'a', 'tuxedo']

But when I pass a string like

How is a locksmith like a typewritter? They both have a lot of keys!

the result ends with an empty string:

['how', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'they', 'both', 'have', 'a', 'lot', 'of', 'keys', '']
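The empty string is not caused by a wrong pattern: re.split returns an empty string whenever the separator matches at the very start or end of the input, because there is nothing before (or after) that match. A minimal illustration, using made-up sample strings:

```python
import re

# A trailing separator ("!") leaves an empty string at the end of the result.
trailing = re.split(r'\W+', 'a lot of keys!')

# A leading separator ('"') also leaves an empty string at the start.
both_ends = re.split(r'\W+', '"keys!"')

print(trailing)   # ['a', 'lot', 'of', 'keys', '']
print(both_ends)  # ['', 'keys', '']
```

Since rstrip() with no arguments only strips whitespace, it cannot remove the trailing "!" that causes this.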

I want to parse the text so that no empty strings end up in the list.

The strings passed in will contain punctuation, etc. Any ideas?

asked Jan 21 '26 09:01 by jamesT



2 Answers

How about searching for what you want:

    import re

    [s.lower() for s in
     re.findall(r'\w+',
                "How is a locksmith like a typewritter? They both have a lot of keys!")]

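Or, to build just one list (re.findall returns a list that the comprehension then copies, whereas re.finditer yields match objects one at a time):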

    import re

    [s.group().lower() for s in
     re.finditer(r'\w+',
                 "How is a locksmith like a typewritter? They both have a lot of keys!")]
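Wrapped up as a drop-in replacement for the question's breakup() function, the findall approach might look like the sketch below. Note that \w matches letters, digits and underscores, so "What's" still becomes 'what' and 's', just as with the original re.split version, and no rstrip() call is needed because \w+ never matches whitespace.

```python
import re

def breakup(text):
    # Match runs of word characters instead of splitting on runs of
    # non-word characters, so punctuation at either end of the text
    # can never produce empty strings.
    return [word.lower() for word in re.findall(r'\w+', text)]

words = breakup("How is a locksmith like a typewritter? They both have a lot of keys!")
print(words)
```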
answered Jan 23 '26 21:01 by Alfe



Just change

    return [e.lower() for e in temp]

to

    return [e.lower() for e in temp if e]

The if e clause discards the empty strings, because an empty string is falsy.

Also, the line

    temp = []

is not needed, since you never use the empty list you assign to temp; the next line immediately rebinds temp to the result of re.split.
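With both suggestions applied, the question's function could read as follows. This is a sketch; the if e filter also drops the leading empty string that re.split produces when the text starts with punctuation, such as an opening quote.

```python
import re

def breakup(text):
    # Split on runs of non-word characters; punctuation at either end
    # of the text yields empty strings, which the `if e` filter discards.
    return [e.lower() for e in re.split(r'\W+', text.rstrip()) if e]

words = breakup('"How is a locksmith like a typewritter?" They both have a lot of keys!')
print(words)
```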

answered Jan 23 '26 23:01 by sloth



