
How to count word frequency in a pandas DataFrame?

Tags: python, pandas

I loaded an Excel file of text into a pandas DataFrame. I want to count the number of times each word occurs across all rows. For instance, the desired output would look like:

was 2
report 1
county 5
increase 2

Code:

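 import pandas as pd
 # Load the spreadsheet; the Text column holds one sentence per row
 news = pd.read_excel('C:\\Users\\farid-PC\\Desktop\\Tester.xlsx')
 # Widen the column display so long sentences are not cut off when printed
 pd.set_option('display.max_colwidth', 1000)
 print(news)
 # implement word counter?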

Current Output:

   Text
0  Trump will drop a bomb on North Korea
1  Building a wall on the U.S.-Mexico border will take literally years
2  Wisconsin is on pace to double the number of layoffs this year.
3  Says John McCain has done nothing to help the vets.
4  Suzanne Bonamici supports a plan that will cut choice for Medicare 

Any help will be appreciated.

Asked by user3658513, Jun 13 '26 11:06


1 Answer

With pandas, using str.split, stack and value_counts (df is the DataFrame holding the Text column, called news in the question):

series = df.Text.str.split(expand=True).stack().value_counts()
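
As a minimal sketch of what that chain does, the snippet below rebuilds a small DataFrame from the first three sample rows in the question and runs the same steps one at a time. The hard-coded rows and the intermediate name words are assumptions made for illustration; in the question the data comes from read_excel.

```python
import pandas as pd

# Illustrative stand-in for the Excel data: three rows copied from the question.
df = pd.DataFrame({
    "Text": [
        "Trump will drop a bomb on North Korea",
        "Building a wall on the U.S.-Mexico border will take literally years",
        "Wisconsin is on pace to double the number of layoffs this year.",
    ]
})

# str.split(expand=True): one column per word position; shorter rows are
# padded with missing values.
# stack(): reshape the wide frame into one long Series of words.
words = df["Text"].str.split(expand=True).stack()

# value_counts(): count occurrences, most frequent first. Missing values
# (the padding) are not counted by default.
series = words.value_counts()

print(series.head())
```

Note that splitting on whitespace keeps punctuation and case, so "year." and "years" are counted as different words.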

A pure-Python alternative using chain.from_iterable (to flatten) and Counter (to count):

from collections import Counter
from itertools import chain

counter = Counter(chain.from_iterable(map(str.split, df.Text.tolist()))) 

Re-create a Series of counts using:

series = pd.Series(counter).sort_values(ascending=False)

This yields the same counts as the pandas solution above (the order of tied words may differ) and should be much faster, since no stacking is involved (stack is a slow operation).
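
To check that the two approaches agree, here is a small sketch that runs both on two sample rows from the question and compares the resulting counts. The hard-coded rows and the names by_counter, by_pandas and same_counts are assumptions for illustration only.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Illustrative stand-in for the Excel data: two rows copied from the question.
df = pd.DataFrame({
    "Text": [
        "Trump will drop a bomb on North Korea",
        "Building a wall on the U.S.-Mexico border will take literally years",
    ]
})

# Split each row into a list of words, flatten the lists, and count.
counter = Counter(chain.from_iterable(map(str.split, df["Text"].tolist())))
by_counter = pd.Series(counter).sort_values(ascending=False)

# The pandas-only version from the top of the answer.
by_pandas = df["Text"].str.split(expand=True).stack().value_counts()

# Compare as plain dicts so the order of tied words does not matter.
same_counts = by_counter.to_dict() == by_pandas.to_dict()
print(same_counts)
```

If the Text column can contain empty cells, map(str.split, ...) will raise a TypeError on the resulting NaN values; dropping them first, for example with df["Text"].dropna(), is one way around that.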

Answered by cs95, Jun 18 '26 01:06


