I am trying to break up (split) a string into words.
import re

def breakup(text):
    temp = []
    temp = re.split(r'\W+', text.rstrip())
    return [e.lower() for e in temp]
Example string:
What's yellow, white, green and bumpy? A pickle wearing a tuxedo
Result:
['what', 's', 'yellow', 'white', 'green', 'and', 'bumpy', 'a', 'pickle', 'wearing', 'a', 'tuxedo']
But when I pass a string like this, the result ends with an empty string:
How is a locksmith like a typewritter? They both have a lot of keys!
['how', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'they', 'both', 'have', 'a', 'lot', 'of', 'keys', '']
I want to parse the text so that the list never contains an empty string.
The strings passed in will contain punctuation, etc. Any ideas?
How about searching for what you want:
[s.lower() for s in
 re.findall(r'\w+',
            "How is a locksmith like a typewritter? They both have a lot of keys!")]
Or, to avoid building an intermediate list:
[s.group().lower() for s in
 re.finditer(r'\w+',
             "How is a locksmith like a typewritter? They both have a lot of keys!")]
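For completeness, here is a minimal sketch of the same idea wrapped in a function. The name breakup_findall is made up for illustration and is not from the question. The point is that re.findall returns only the matches themselves, so punctuation at the start or end of the text cannot produce an empty string.

```python
import re

def breakup_findall(text):
    # Hypothetical helper name. Collect runs of word characters directly
    # instead of splitting on the gaps between them.
    return [word.lower() for word in re.findall(r'\w+', text)]

words = breakup_findall(
    "How is a locksmith like a typewritter? They both have a lot of keys!")
print(words)
```

Because nothing is split, there is no need to strip trailing whitespace or filter the result afterwards.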
Just change
return [e.lower() for e in temp]
to
return [e.lower() for e in temp if e]
Also, the line
temp = []
is not needed, since you never use the empty list you assign to temp.
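Putting both suggestions together, the function might look like the sketch below. It also shows where the empty string comes from: when the text ends (or starts) with a separator such as "!", re.split returns an empty piece on that side of the separator. The rstrip() call is dropped here on the assumption that it is redundant, since trailing whitespace is already matched by \W+.

```python
import re

def breakup(text):
    # Split on runs of non-word characters and drop the empty pieces that
    # appear when the text starts or ends with a separator.
    return [e.lower() for e in re.split(r'\W+', text) if e]

# The raw split shows the problem: the trailing "!" leaves an empty string.
raw = re.split(r'\W+', "keys!")
print(raw)

result = breakup(
    "How is a locksmith like a typewritter? They both have a lot of keys!")
print(result)
```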