Let's say I have:
sentences = ["The girls are gorgeous", "I'm mexican"]
And I want to obtain:
words = ['The', 'girls', 'are', 'gorgeous', "I'm", 'mexican']
I tried:
words = [w.split(' ') for w in sentences]
but did not get the expected result.
Will the result also work with Counter(words)? I need to obtain the word frequencies.
Try this:
sentences = ["The girls are gorgeous", "I'm mexican"]
words = [word for sentence in sentences for word in sentence.split(' ')]
Your method didn't work because split returns a list, so your code creates a nested list (one list of words per sentence). You need to flatten it before you can use it with Counter. There are many ways to flatten it.
from itertools import chain
from collections import Counter
Counter(chain.from_iterable(words))  # here words is the nested list from your code
would be a good way to flatten the nested list and find the frequencies. Alternatively, you can skip the intermediate list and use a generator expression, like this:
sentences = ['The girls are gorgeous', "I'm mexican"]
from collections import Counter
print(Counter(item for items in sentences for item in items.split()))
# Python 3.7+ prints:
# Counter({'The': 1, 'girls': 1, 'are': 1, 'gorgeous': 1, "I'm": 1, 'mexican': 1})
This takes each sentence, splits it into a list of words, and iterates over those words, which flattens the nested structure.
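To make the difference concrete, here is a minimal sketch (Python 3) of why the comprehension from the question cannot be fed to Counter directly. The names nested, flat, and counted_nested are mine and only for illustration:

```python
from collections import Counter
from itertools import chain

sentences = ["The girls are gorgeous", "I'm mexican"]

# The comprehension from the question yields one list per sentence.
nested = [s.split(' ') for s in sentences]
print(nested)  # [['The', 'girls', 'are', 'gorgeous'], ["I'm", 'mexican']]

# Lists are unhashable, so Counter cannot count them as items.
try:
    Counter(nested)
    counted_nested = True
except TypeError:
    counted_nested = False

# Flattening first gives Counter one word per item.
flat = list(chain.from_iterable(nested))
counts = Counter(flat)
```

After flattening, counts maps each word to the number of times it occurs, which is 1 for every word in this input.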
If you want to find the 10 most frequent words, you can use the Counter.most_common method, like this:
Counter(item for items in sentences for item in items.split()).most_common(10)
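As a rough illustration of most_common, here is a small sketch with made-up sentences in which some words repeat (the data is invented purely for the example, not taken from the question):

```python
from collections import Counter

# Hypothetical input chosen so that some words occur more than once.
sentences = ["the cat sat", "the cat ran", "the dog"]

counts = Counter(word for sentence in sentences for word in sentence.split())

# most_common(n) returns the n highest-count (word, count) pairs.
top_two = counts.most_common(2)
print(top_two)  # [('the', 3), ('cat', 2)]
```

Note that counting is case-sensitive, so 'The' and 'the' would be tallied separately unless you lowercase the words first.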