Pandas: How to calculate the average of a groupby [duplicate]

Question

I have a csv file containing few attributes and one of them is the star ratings of different restaurants etoiles (means star in french). Here annee means the year when the rating was made.

note: I dont know how to share a Jupyter notebook table output in here, i tried different command lines but the format was always ugly. If someone could help with that.

enter image description here

And what i want to do is pretty simple (I think).. I want to add a new column that represents the standard deviation of the mean of stars per year of a restaurant. So I must estimate the average stars rating per year. Then, calculate the standard deviation on these values. But, I dont really know the syntax using pandas that will allow me to calculate the average star rating of a restaurant per year. Any suggestions?

I understand that I need to group the restaurants by year with .groupby('restaurant_id')['annee'] and then take the average of the stars rating of the restaurant during that year but i dont know how to write it.


# does not work
avis['newColumn'] = (
avis.groupby(['restaurant_id', 'annee'])['etoiles'].mean().std()
)

Grayrigel · Accepted Answer

Here is a potential solution with groupby:

#generating test data
dates = pd.date_range('20130101', periods=36, freq='M')
year = dates.strftime('%Y')
df = pd.DataFrame([np.random.randint(1,10) for x in range(36)],columns=['Rating'])
df['restaurants'] = ['R_{}'.format(i) for i in range(4)]*9
df['date'] = dates
df['year'] = year
print(df)

 rating restaurants  date    year
0   8   R_0     2013-01-31  2013
1   7   R_1     2013-02-28  2013
2   1   R_2     2013-03-31  2013
3   6   R_3     2013-04-30  2013
4   4   R_0     2013-05-31  2013
5   8   R_1     2013-06-30  2013
6   7   R_2     2013-07-31  2013
7   5   R_3     2013-08-31  2013
8   4   R_0     2013-09-30  2013
9   5   R_1     2013-10-31  2013
10  4   R_2     2013-11-30  2013
11  8   R_3     2013-12-31  2013
12  9   R_0     2014-01-31  2014
13  6   R_1     2014-02-28  2014
14  3   R_2     2014-03-31  2014
15  6   R_3     2014-04-30  2014
16  2   R_0     2014-05-31  2014
17  8   R_1     2014-06-30  2014
18  1   R_2     2014-07-31  2014
19  5   R_3     2014-08-31  2014
20  1   R_0     2014-09-30  2014
21  7   R_1     2014-10-31  2014
22  3   R_2     2014-11-30  2014
23  4   R_3     2014-12-31  2014
24  2   R_0     2015-01-31  2015
25  4   R_1     2015-02-28  2015
26  8   R_2     2015-03-31  2015
27  7   R_3     2015-04-30  2015
28  3   R_0     2015-05-31  2015
29  1   R_1     2015-06-30  2015
30  2   R_2     2015-07-31  2015
31  8   R_3     2015-08-31  2015
32  7   R_0     2015-09-30  2015
33  5   R_1     2015-10-31  2015
34  3   R_2     2015-11-30  2015
35  3   R_3     2015-12-31  2015

#df['date'] = pd.to_datetime(df['date']) #more versatile
#df.set_index('dates') #more versatile
#df.groupby([pd.Grouper(freq='1Y'),'restraunts'])['Rating'].mean() #more versatile

df = df.groupby(['year','restaurants']).agg({'Rating':[np.mean,np.std]})
print(df)

Output:

                  Rating   Rating
year  restaurants  mean      std
2013    R_0     5.333333    2.309401
        R_1     6.666667    1.527525
        R_2     4.000000    3.000000
        R_3     6.333333    1.527525
2014    R_0     4.000000    4.358899
        R_1     7.000000    1.000000
        R_2     2.333333    1.154701
        R_3     5.000000    1.000000
2015    R_0     4.000000    2.645751
        R_1     3.333333    2.081666
        R_2     4.333333    3.214550
        R_3     6.000000    2.645751

EDIT:

Renaming columns:

df.columns = ['Mean','STD']
df.reset_index(inplace=True)
    year   restaurant Mean     STD
0   2013    R_0     1.333333    0.577350
1   2013    R_1     5.333333    3.511885
2   2013    R_2     1.333333    0.577350
3   2013    R_3     4.333333    2.886751
4   2014    R_0     3.000000    1.000000
5   2014    R_1     3.666667    2.886751
6   2014    R_2     4.333333    4.041452
7   2014    R_3     5.333333    2.081666
8   2015    R_0     6.000000    2.645751
9   2015    R_1     6.333333    3.785939
10  2015    R_2     6.333333    3.785939
11  2015    R_3     5.666667    3.055050

Pandas: How to calculate the average of a groupby [duplicate]

Tags:

python

pandas

jupyter-notebook

feature-engineering

Lynn

1 Answers

Grayrigel

Recent Activity

Donate For Us

Pandas: How to calculate the average of a groupby [duplicate]

Tags:

python

pandas

jupyter-notebook

feature-engineering

Lynn

1 Answers

Grayrigel

Related questions

Recent Activity

Donate For Us