
Pandas groupby result shape unexpected

I have time-series data in "stacked" (tall) format and would like to compute a rolling function based on two columns. However, as shown in the example below, groupby concatenates the per-group results horizontally instead of vertically. I can apply stack at the end to get back to the tall format, but I expected the results to be concatenated vertically, which would allow assignment back to the original dataframe (something like x['res'] = df.groupby(...).apply(func)). Does anyone know why groupby behaves this way, or am I doing something wrong?

x
Out[52]: 
    group      month         a         b
0   18527 2014-09-01  0.534152  0.973451
1   18527 2014-10-01  0.079879  0.354498
2   18527 2014-11-01  0.032298  0.203997
3   18527 2014-12-01  0.148435  0.352703
4   18527 2015-01-01  0.879930  0.819328
5   18527 2015-02-01  0.475297  0.693203
6   18527 2015-03-01  0.223759  0.731594
7   18527 2015-04-01  0.391933  0.332801
8   18671 2014-09-01  0.740621  0.305298
9   18671 2014-10-01  0.230585  0.772569
10  18671 2014-11-01  0.664834  0.755219
11  18671 2014-12-01  0.987118  0.896310
12  18671 2015-01-01  0.228804  0.058641
13  18671 2015-02-01  0.415715  0.182683
14  18671 2015-03-01  0.574570  0.144686
15  18671 2015-04-01  0.488804  0.545102

x.dtypes
Out[53]: 
group             int64
month    datetime64[ns]
a               float64
b               float64
dtype: object

def func(s):
    return pd.rolling_sum(s.a, 3) / pd.rolling_sum(s.b, 3)


x.set_index('month').groupby('group').apply(func)
Out[55]: 
month  2014-09-01  2014-10-01  2014-11-01  2014-12-01  2015-01-01  2015-02-01
group
18527         NaN         NaN    0.421900    0.286010    0.770814    0.806152   
18671         NaN         NaN    0.892505    0.776593    1.099748    1.434238   

month  2015-03-01  2015-04-01  
group                          
18527    0.703609    0.620728  
18671    3.158185    1.695287  

x.set_index('month').groupby('group').apply(func).stack()
Out[56]: 
group  month     
18527  2014-11-01    0.421900
       2014-12-01    0.286010
       2015-01-01    0.770814
       2015-02-01    0.806152
       2015-03-01    0.703609
       2015-04-01    0.620728
18671  2014-11-01    0.892505
       2014-12-01    0.776593
       2015-01-01    1.099748
       2015-02-01    1.434238
       2015-03-01    3.158185
       2015-04-01    1.695287
dtype: float64

1 Answer

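You can convert each group's result to a DataFrame inside func(). When func returns a DataFrame rather than a Series, apply concatenates the groups vertically:

def func(s):
    # Return a one-column DataFrame (not a Series) so the groups are stacked vertically.
    return (pd.rolling_sum(s.a, 3) / pd.rolling_sum(s.b, 3)).dropna().to_frame()

# x is the dataframe from the question, indexed by month as in the original call.
x.set_index('month').groupby('group').apply(func)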
HYRY answered Dec 30 '25 14:12
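Note that pd.rolling_sum was removed from later pandas releases in favour of the .rolling() method. The following is only a minimal sketch of how the assignment the question asks for (x['res'] = ...) might look with that newer API, not part of the answer above. It assumes rows are already sorted by month within each group, uses just the first four months of each group from the question's data, and the intermediate name sums is purely illustrative.

```python
import pandas as pd

# First four months of each group, copied from the question's sample data.
x = pd.DataFrame({
    "group": [18527] * 4 + [18671] * 4,
    "month": list(pd.date_range("2014-09-01", periods=4, freq="MS")) * 2,
    "a": [0.534152, 0.079879, 0.032298, 0.148435,
          0.740621, 0.230585, 0.664834, 0.987118],
    "b": [0.973451, 0.354498, 0.203997, 0.352703,
          0.305298, 0.772569, 0.755219, 0.896310],
})

# Rolling 3-row sums computed separately per group.
# The result is indexed by (group, original row label).
sums = x.groupby("group")[["a", "b"]].rolling(3).sum()

# Drop the group level so the index matches x again, then assign by alignment.
sums = sums.reset_index(level=0, drop=True)
x["res"] = sums["a"] / sums["b"]

print(x[["group", "month", "res"]])
```

The first two rows of each group come out as NaN because the 3-row window is not yet full, which matches the NaN columns in the question's wide output.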
