Resampling with pandas.MultiIndex: Resampler.aggregate() & Resampler[column]

I am trying to resample a data frame. First, I want to keep several aggregations in the result. Second, there is an additional aggregation of interest for one specific column only. Since that aggregation is relevant to a single column, the resampler should be limited to that column so the aggregation is not applied to the other columns unnecessarily.

This scenario is working for a simple one-dimensional column index:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.random.rand(50, 4),
    index=pd.to_datetime(np.arange(0, 50), unit="s"),
    columns=["a", "b", "c", "d"],
)

r = df.resample("10s")
result = r.aggregate(["mean", "std"])    # several aggregations for all columns
result[("d", "ffill")] = r["d"].ffill()  # extra aggregation for column "d" only
print(result)

However, as soon as I start to use multi-indexed columns, problems appear. First, I cannot keep several aggregations at once:

df.columns = pd.MultiIndex.from_product([("a", "b"), ("alpha", "beta")])
r = df.resample("10s")    # can be omitted
result = r.aggregate(["mean", "std"])
---> AttributeError: 'Series' object has no attribute 'columns'

Second, the resampler cannot be limited to the relevant column anymore:

r[("b", "beta")].ffill()
--> KeyError: "Columns not found: 'b', 'beta'"

How can I translate this approach from a plain column index to a MultiIndex?

Asked Oct 23 '25 by Chickenmarkus

2 Answers

You can use pd.Grouper in a groupby instead of resample, for example:

result = df.groupby(pd.Grouper(freq="10s", level=0)).aggregate(["mean", "std"])
print(result)
                           a                                       b  \
                        alpha                beta               alpha   
                         mean       std      mean       std      mean   
1970-01-01 00:00:00  0.460569  0.312508  0.476511  0.260534  0.479577   
1970-01-01 00:00:10  0.441498  0.315277  0.487855  0.306068  0.535842   
1970-01-01 00:00:20  0.569884  0.248503  0.320552  0.288479  0.507755   
1970-01-01 00:00:30  0.478037  0.262654  0.552214  0.251581  0.505132   
1970-01-01 00:00:40  0.611227  0.328916  0.473773  0.241604  0.358298   


                                   beta            
                          std      mean       std  
1970-01-01 00:00:00  0.357493  0.448487  0.294432  
1970-01-01 00:00:10  0.259145  0.472250  0.320954  
1970-01-01 00:00:20  0.369490  0.432944  0.150473  
1970-01-01 00:00:30  0.298759  0.381614  0.248785  
1970-01-01 00:00:40  0.203831  0.381412  0.374965  
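
As a quick sanity check (a sketch, assuming the same randomly generated df as in the question), the mean of the first 10s bin should match a plain mean over the first ten rows of that column:

import numpy as np

# The first 10s bin covers rows 0..9 of the 1s-spaced frame, so its
# ("a", "alpha", "mean") entry equals the plain mean of that slice.
assert np.isclose(
    result[("a", "alpha", "mean")].iloc[0],
    df[("a", "alpha")].iloc[:10].mean(),
)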

For the second part, I'm not sure exactly what you mean, but going by the result in the single column level case, the following gives a result:

result[("b", "beta",'ffill')] = df.groupby(pd.Grouper(freq='10s',level=0))[[("b", "beta")]].first()
Answered Oct 24 '25 by Ben.T


It must be a bug in aggregate. A workaround would be stack:

(df.stack().groupby(level=-1)
   .apply(lambda x: x.resample("10s", level=0).aggregate(["mean", "std"]))
   .unstack(level=0)
)
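
The unstacked frame puts the stacked column level last, i.e. the columns come out as (outer label, aggregation, inner label). If the column layout of the pd.Grouper answer above is preferred, the levels can be reordered; a small sketch, assuming that level order:

out = (
    df.stack()
      .groupby(level=-1)
      .apply(lambda x: x.resample("10s", level=0).aggregate(["mean", "std"]))
      .unstack(level=0)
)

# Move the stacked level back between the outer label and the aggregation
# name, then sort the columns to match the layout of the first answer.
out = out.reorder_levels([0, 2, 1], axis=1).sort_index(axis=1)
print(out)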
Answered Oct 25 '25 by Quang Hoang


