I have the Yelp dataset and I want to count all reviews which have greater than 3 stars. I get the count of reviews by doing this:
reviews.groupby('business_id')['stars'].count()
Now I want to get the count of reviews which had more than 3 stars, so I tried this by taking inspiration from here:
reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).count()})
But this just gives me the count of all stars like before. Is this the right way to do it? What am I doing incorrectly here? Does the lambda expression not go through each value of the stars column?
EDIT: Okay, I feel stupid. I should have used the sum function instead of count to get the number of elements greater than 3, like this:
reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).sum()})
x > x.mean() gives True if the element is larger than the mean and False otherwise; sum then counts the number of Trues.
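To make the count-versus-sum difference concrete, here is a minimal sketch. The tiny `reviews` frame is a made-up stand-in for the Yelp data, not the real dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b"],
    "stars": [5, 4, 2, 3, 1],
})

grouped = reviews.groupby("business_id")["stars"]

# count() tallies every non-null entry of the boolean mask,
# so True and False are both counted.
mask_count = grouped.agg(lambda val: (val > 3).count())

# sum() treats True as 1 and False as 0,
# so only reviews with more than 3 stars are tallied.
mask_sum = grouped.agg(lambda val: (val > 3).sum())

print(mask_count)
print(mask_sum)
```

With this sample data, `mask_count` simply repeats the group sizes, while `mask_sum` gives the number of reviews above 3 stars per business.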
Pandas DataFrame: ge() function. The ge() function compares a DataFrame with another object element-wise and returns True wherever the DataFrame's value is greater than or equal to the other. It belongs to the family of flexible comparison wrappers (eq, ne, le, lt, ge, gt), which are equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
numpy.count_nonzero() function. Applied to a boolean Series, it returns the count of True values, i.e. the count of values greater than the given limit in the selected column.
Using the size() or count() method with pandas.DataFrame.groupby() gives the number of occurrences in each group: size() counts all rows, while count() counts only the non-null values in a particular column.
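As a rough illustration of these two helpers together, the sketch below builds a boolean mask with the comparison methods and counts its True entries with NumPy. The `stars` Series is invented sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of star ratings.
stars = pd.Series([5, 4, 3, 2, 1])

# ge() is the method form of >=, so this mask equals stars >= 3.
at_least_three = stars.ge(3)

# count_nonzero counts the True entries of the mask.
n_at_least_three = int(np.count_nonzero(at_least_three))

# gt() is the strict counterpart, matching "more than 3 stars".
n_more_than_three = int(np.count_nonzero(stars.gt(3)))

print(n_at_least_three, n_more_than_three)
```

Note that ge() includes the boundary value, so for "more than 3 stars" the strict gt() (or plain `>`) is the one that matches the question.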
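The two methods only disagree when a group contains missing values. A small sketch, assuming a made-up frame with one missing rating:

```python
import numpy as np
import pandas as pd

# Hypothetical data: business "a" has one review with a missing rating.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "b"],
    "stars": [5.0, np.nan, 4.0],
})

grouped = reviews.groupby("business_id")["stars"]

sizes = grouped.size()    # rows per group, missing values included
counts = grouped.count()  # non-null values per group

print(sizes)
print(counts)
```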
You can try this:
reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
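One thing to be aware of with filtering first: a business with no reviews above 3 stars disappears from the result instead of showing a count of 0. The sketch below, on made-up sample data, shows this and one possible way to restore the missing businesses with reindex(); the names `all_ids` and `filled` are just illustrative.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b"],
    "stars": [5, 4, 2, 3, 1],
})

# Filter first, then count: business "b" has no matching rows and drops out.
filtered = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()

# Reindex against every business id and fill the gaps with 0.
all_ids = reviews["business_id"].unique()
filled = filtered.reindex(all_ids, fill_value=0)

print(filtered)
print(filled)
```

If zero counts matter, the `(val > 3).sum()` approach from the question keeps every business without the extra step.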
As I also wanted to rename the column and to run multiple functions on the same column, I came up with the following solution:
import pandas

# Count reviews over and under 3 stars for each business
reviews.groupby('business_id')\
       .agg(over=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x > 3).sum()),
            under=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x < 3).sum()))\
       .reset_index()
pandas.NamedAgg lets you create multiple named columns from the same source column, now that renaming the output by passing a dict to agg() has been removed in newer versions of pandas.