Count no. of tokens in every row of a column in dataframe

Question

I have a dataframe with data in the following format:

    song                    lyric                                tokenized_lyrics
0   Song 1  Look at her face, it's a wonderful face  [look , at , her ,face, it's a wonderful, face ]
1   Song 2  Some lyrics of the song taken            [Some, lyrics ,of, the, song, taken]

I want to count the number of words in the lyrics of each song and get output like this:

song     count
song 1     8
song 2     6

I tried an aggregate function, but it does not yield the correct result.

Code I tried:

df.groupby(['song']).agg(
word_count = pd.NamedAgg(column='text' , aggfunc = 'count' )
)

How can I achieve the desired result?

sammywemmy · Accepted Answer

I couldn't copy tokenized_lyrics as a list; it came in as a string. So I tokenized the lyric column myself, assuming the delimiter is whitespace:

# Strip commas, split on whitespace, then count the resulting tokens in each row
df['token_count'] = df.lyric.str.replace(',', '', regex=False).str.split().str.len()
df.filter(['song', 'token_count'])

    song    token_count
0   Song 1      8
1   Song 2      6

Note that if tokenized_lyrics holds actual lists, you can apply .str.len() to that column directly: it counts the individual items in each list.
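
A minimal sketch of that last point, assuming tokenized_lyrics really contains Python lists rather than their string representation. The sample data below is made up for illustration (the tokens are a cleaned-up guess at the question's first row), and result is just a name chosen for this example:

```python
import pandas as pd

# Hypothetical sample data: tokenized_lyrics holds real Python lists here,
# not strings that merely look like lists.
df = pd.DataFrame({
    "song": ["Song 1", "Song 2"],
    "tokenized_lyrics": [
        ["look", "at", "her", "face", "it's", "a", "wonderful", "face"],
        ["Some", "lyrics", "of", "the", "song", "taken"],
    ],
})

# .str.len() works element-wise on lists as well as on strings,
# so it gives the number of items in each row's list.
df["token_count"] = df["tokenized_lyrics"].str.len()

result = df[["song", "token_count"]]
print(result)
```

If the column was read from a CSV, it most likely holds strings, and .str.len() would then count characters instead of tokens. In that case the whitespace split shown above is probably the safer route.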

Tags: python, pandas, numpy, pandas-groupby

Aklank Jain
