Resampling with pandas.MultiIndex: Resampler.aggregate() & Resampler[column]

I am trying to resample a data frame. First, I want to keep several aggregations in the result. Second, there is an additional aggregation of interest for one specific column only. Since that aggregation is relevant to a single column, the resampler should be limited to that column so the aggregation is not applied to the other columns unnecessarily.

This scenario is working for a simple one-dimensional column index:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.random.rand(50, 4),
    index=pd.to_datetime(np.arange(0, 50), unit="s"),
    columns=["a", "b", "c", "d"],
)

r = df.resample("10s")
result = r.aggregate(["mean", "std"])    # several aggregations for all columns
result[("d", "ffill")] = r["d"].ffill()  # extra aggregation for column "d" only
print(result)

However, as soon as I start to use multi-indexed columns, problems appear. First, I cannot keep several aggregations at once:

df.columns = pd.MultiIndex.from_product([("a", "b"), ("alpha", "beta")])
r = df.resample("10s")    # can be omitted
result = r.aggregate(["mean", "std"])
---> AttributeError: 'Series' object has no attribute 'columns'

Second, the resampler cannot be limited to the relevant column anymore:

r[("b", "beta")].ffill()
--> KeyError: "Columns not found: 'b', 'beta'"

How can I translate this approach from a plain column index to a MultiIndex?

Asked Oct 23 '25 by Chickenmarkus

2 Answers

You can use pd.Grouper in a groupby instead of resample, for example:

result = df.groupby(pd.Grouper(freq="10s", level=0)).aggregate(["mean", "std"])
print(result)
                           a                                       b  \
                        alpha                beta               alpha   
                         mean       std      mean       std      mean   
1970-01-01 00:00:00  0.460569  0.312508  0.476511  0.260534  0.479577   
1970-01-01 00:00:10  0.441498  0.315277  0.487855  0.306068  0.535842   
1970-01-01 00:00:20  0.569884  0.248503  0.320552  0.288479  0.507755   
1970-01-01 00:00:30  0.478037  0.262654  0.552214  0.251581  0.505132   
1970-01-01 00:00:40  0.611227  0.328916  0.473773  0.241604  0.358298   


                                   beta            
                          std      mean       std  
1970-01-01 00:00:00  0.357493  0.448487  0.294432  
1970-01-01 00:00:10  0.259145  0.472250  0.320954  
1970-01-01 00:00:20  0.369490  0.432944  0.150473  
1970-01-01 00:00:30  0.298759  0.381614  0.248785  
1970-01-01 00:00:40  0.203831  0.381412  0.374965  
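
As a quick sanity check (a sketch, assuming the same randomly generated df as in the question), the mean of the first 10s bin should match a plain mean over the first ten rows of that column:

import numpy as np

# The first 10s bin covers rows 0..9 of the 1s-spaced frame, so its
# ("a", "alpha", "mean") entry equals the plain mean of that slice.
assert np.isclose(
    result[("a", "alpha", "mean")].iloc[0],
    df[("a", "alpha")].iloc[:10].mean(),
)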

For the second part, I'm not sure exactly what you mean, but going by the result in the single column level case, the following gives a result:

result[("b", "beta",'ffill')] = df.groupby(pd.Grouper(freq='10s',level=0))[[("b", "beta")]].first()
Answered Oct 24 '25 by Ben.T


It must be a bug in aggregate. A workaround would be stack:

(df.stack().groupby(level=-1)
   .apply(lambda x: x.resample("10s", level=0).aggregate(["mean", "std"]))
   .unstack(level=0)
)
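
The unstacked frame puts the stacked column level last, i.e. the columns come out as (outer label, aggregation, inner label). If the column layout of the pd.Grouper answer above is preferred, the levels can be reordered; a small sketch, assuming that level order:

out = (
    df.stack()
      .groupby(level=-1)
      .apply(lambda x: x.resample("10s", level=0).aggregate(["mean", "std"]))
      .unstack(level=0)
)

# Move the stacked level back between the outer label and the aggregation
# name, then sort the columns to match the layout of the first answer.
out = out.reorder_levels([0, 2, 1], axis=1).sort_index(axis=1)
print(out)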
Answered Oct 25 '25 by Quang Hoang


