I have the below synopsis of a df:
  movie id       movie title release date                                           IMDb URL                      genre  user id  rating  
0         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller        5       3  
1         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      268       2  
2         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      276       4  
3         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      217       3  
4         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller       87       4  
What i'm looking for is count 'user id' and average 'rating' and keep all other columns intact. So the result will be something like this:
  movie id       movie title release date                                           IMDb URL                      genre  user id     rating  
0         2  GoldenEye (1995)     1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      50       3.75  
1         3  Four Rooms (1995)    1-Jan-95  http://us.imdb.com/M/title-exact?GoldenEye%20(...  Action|Adventure|Thriller      35       2.34  
any idea how to do that?
Thanks
If all the values are in the columns you are aggregating over are the same for each group then you can avoid the join by putting them into the group.
Then pass a dictionary of functions to agg. If you set as_index to False to keep the grouped by columns as columns:
df.groupby(['movie id','movie title','release date','IMDb URL','genre'], as_index=False).agg({'user id':len,'rating':'mean'})
Note len is used to count
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With