
Average word length of a column using Python

This is my column:

Review Text
Absolutely wonderful silky and..
Love this dress! it is so pretty...
I had such high hopes for...

I want to create a new column called Avg_length that holds the average word length for each row of the Review Text column.

I wrote the following code to split each string on whitespace:

df['Avg_length'] = df["Review Text"].apply(lambda x: len(x.split()))

How do I calculate the average from here?

Thanks in advance...

Yilmaz asked Nov 20 '25 23:11


1 Answer

Your code calculates the number of words in each review, not the length of each word. Take the mean of the word lengths instead:

import numpy as np
...
df['Avg_length'] = df["Review Text"].apply(lambda x: np.mean([len(w) for w in x.split()]))

Each row's value in the Review Text column is a string holding the whole review. When the lambda is applied, its input x is therefore that entire string.

Calling x.split() produces the list of words. The list comprehension

[len(w) for w in x.split()]

takes that list of words and iterates through it, setting w to each word in turn. For each word w, len(w) gives its number of characters. The result of the list comprehension is therefore a list of numbers, one length per word.
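
To make the intermediate values concrete, here is a small sketch of the same two steps run on a single string. The sample review is made up for illustration and is not a row from the asker's data.

```python
# Hypothetical sample review, not taken from the original DataFrame.
review = "Love this dress"

words = review.split()              # split on runs of whitespace
lengths = [len(w) for w in words]   # number of characters in each word

print(words)    # ['Love', 'this', 'dress']
print(lengths)  # [4, 4, 5]
```

Note that split() only separates on whitespace, so punctuation attached to a word is counted as part of it: "dress!" has length 6.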

np.mean() reduces this list of numbers to a single number, their average. That number is the return value of the lambda and becomes the entry in the new column for that row. apply() repeats this for every row.
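
Putting it together, the following is a minimal runnable sketch of the approach. The sample rows and the helper name avg_word_length are assumptions for illustration. The guard for empty or missing reviews is also an addition not present in the one-liner above: np.mean() on an empty list emits a warning, and calling .split() on a missing value raises an error.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the code above.
df = pd.DataFrame(
    {"Review Text": ["Love this dress", "I had such high hopes", "", None]}
)

def avg_word_length(text):
    """Mean number of characters per whitespace-separated word."""
    # Missing values (None/NaN) are not strings and have no words to measure.
    if not isinstance(text, str):
        return np.nan
    words = text.split()
    # Empty or whitespace-only reviews produce no words; avoid np.mean([]).
    if not words:
        return np.nan
    return np.mean([len(w) for w in words])

df["Avg_length"] = df["Review Text"].apply(avg_word_length)
print(df)
```

Returning NaN for empty or missing reviews is one possible choice; returning 0 instead would also work if that suits the later analysis better.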

Matt Miguel answered Nov 22 '25 12:11