Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count words in a column of strings in Pandas

I have a pandas dataframe that contains queries and counts for a given time period and I'm hoping to turn this dataframe into a count of unique words. For example, if the dataframe contained the below:

query          count
foo bar        10
super          8 
foo            4
super foo bar  2

I'm looking to receive the below dataframe. e.g. the word 'foo' appears exactly 16 times within the table.

word    count
foo     16
bar     12
super   10

I'm working with the below function, but it hardly seems like the optimal way to do this and it ignores the total count for each row.

def _words(df):
  return Counter(re.findall(r'\w+', ' '.join(df['query'])))

Any help will be greatly appreciated.

Thanks in advance!

like image 557
sokeefe Avatar asked Nov 01 '25 08:11

sokeefe


1 Answers

Option 1

df['query'].str.get_dummies(sep=' ').T.dot(df['count'])

bar      12
foo      16
super    10
dtype: int64

Option 2

df['query'].str.get_dummies(sep=' ').mul(df['count'], axis=0).sum()

bar      12
foo      16
super    10
dtype: int64

Option 3
numpy.bincount + pd.factorize
also highlighting the use of cytoolz.mapcat. It returns an iterator where it maps a function and concatenates the results. That's cool!

import pandas as pd, numpy as np, cytoolz

q = df['query'].values
c = df['count'].values

f, u = pd.factorize(list(cytoolz.mapcat(str.split, q.tolist())))
l = np.core.defchararray.count(q.astype(str), ' ') + 1

pd.Series(np.bincount(f, c.repeat(l)).astype(int), u)

foo      16
bar      12
super    10
dtype: int64

Option 4
Absurd use of stuff... just use option 1.

pd.DataFrame(dict(
    query=' '.join(df['query']).split(),
    count=df['count'].repeat(df['query'].str.count(' ') + 1)
)).groupby('query')['count'].sum()

query
bar      12
foo      16
super    10
Name: count, dtype: int64
like image 150
piRSquared Avatar answered Nov 03 '25 23:11

piRSquared



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!