Get cumulative mean among groups in Python

Question

I am trying to get a cumulative mean in Python (pandas) within different groups. I have data as follows:

id  date        value
1   2019-01-01  2
1   2019-01-02  8
1   2019-01-04  3
1   2019-01-08  4
1   2019-01-10  12
1   2019-01-13  6
2   2019-01-01  4
2   2019-01-03  2
2   2019-01-04  3
2   2019-01-06  6
2   2019-01-11  1

The output I'm trying to get is something like this:

id  date        value   cumulative_avg
1   2019-01-01  2       NaN
1   2019-01-02  8       2
1   2019-01-04  3       5
1   2019-01-08  4       4.33
1   2019-01-10  12      4.25
1   2019-01-13  6       5.8
2   2019-01-01  4       NaN
2   2019-01-03  2       4
2   2019-01-04  3       3
2   2019-01-06  6       3
2   2019-01-11  1       3.75

I need the cumulative average to restart with each new id. I can get a variation of what I'm looking for with a single id. For example, if the data set only had the rows where id = 1, then I could use:

df['cumulative_avg'] = df['value'].expanding().mean().shift(1)

I try to add a group by into it but I get an error:

df['cumulative_avg'] = df.groupby('id')['value'].expanding().mean().shift(1)

TypeError: incompatible index of inserted column with frame index

Also tried:

df.set_index(['account']
ValueError: cannot handle a non-unique multi-index!

The actual data I have has millions of rows and thousands of unique ids. Any help with a speedy/efficient way to do this would be appreciated.

ALollz · Accepted Answer

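For many groups this will perform better because it ditches the apply. Take the group-wise cumsum, subtract off the current row's value, and divide by the group-wise cumcount to get the analog of expanding().mean().shift(1) within each group. The first row of each group works out to 0/0, which pandas fortunately interprets as NaN.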

gp = df.groupby('id')['value']
df['cum_avg'] = (gp.cumsum() - df['value'])/gp.cumcount()

    id        date  value   cum_avg
0    1  2019-01-01      2       NaN
1    1  2019-01-02      8  2.000000
2    1  2019-01-04      3  5.000000
3    1  2019-01-08      4  4.333333
4    1  2019-01-10     12  4.250000
5    1  2019-01-13      6  5.800000
6    2  2019-01-01      4       NaN
7    2  2019-01-03      2  4.000000
8    2  2019-01-04      3  3.000000
9    2  2019-01-06      6  3.000000
10   2  2019-01-11      1  3.750000
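
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and cross-checks the cumsum/cumcount result against a slower per-group expanding().mean().shift(1). The cross-check via transform and the name reference are my own additions for illustration and are not needed in real code. It also assumes the rows are already ordered by date within each id, since cumsum depends on row order.

```python
import pandas as pd

# Sample data from the question (already sorted by id, then date).
df = pd.DataFrame({
    "id": [1] * 6 + [2] * 5,
    "date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-04", "2019-01-08",
        "2019-01-10", "2019-01-13", "2019-01-01", "2019-01-03",
        "2019-01-04", "2019-01-06", "2019-01-11",
    ]),
    "value": [2, 8, 3, 4, 12, 6, 4, 2, 3, 6, 1],
})

gp = df.groupby("id")["value"]

# Sum of all *previous* values in the group, divided by how many there are.
# cumcount() is 0 on the first row of each group, so that row is 0/0 -> NaN.
df["cum_avg"] = (gp.cumsum() - df["value"]) / gp.cumcount()

# Slower reference (illustrative only): shift inside each group so the
# first row of every id is NaN and nothing leaks across groups.
reference = gp.transform(lambda s: s.expanding().mean().shift(1))

pd.testing.assert_series_equal(df["cum_avg"], reference, check_names=False)
print(df)
```

If the real data is not sorted, something like df.sort_values(['id', 'date']) before the groupby should give the intended order.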

Tags: python, pandas, dataframe

Asked by Steveiepete
