I am trying to emulate Excel's AVERAGEIFs function in Pandas on a date range, and so far have been unsuccessful. I understand that I need to use apply and groupby, but I obviously don't have the syntax correct as I receive this error:
TypeError: can only concatenate list (not "Timedelta") to list
I am using test data at the moment while I try to figure out the syntax, and the data is below:

For each 'Avg' column I am trying to return the average quantity for the previous 180 days as grouped by 'A' and 'B'. Therefore, I would expect the 'Avg' column in row 1 to be 1.5 ( (1+2)/2, while leaving out row 5 which is a match but is > 180 days prior).
Here is the code I have at present, which does not work: import pandas as pd
#Importing the dataset
df = pd.read_excel('Test.xlsx', sheet_name='Sheet1')
df = pd.concat([df, pd.DataFrame(columns=['Avg Qty'])], axis=1)
df['Avg Qty'] = df.apply(df.groupby([(['Date'] <= (['Date']+pd.Timedelta(-1,
unit='d')) >= (['Date']+pd.Timedelta(-180, unit='d'))), 'A', 'B']))['Qty'].mean()
print(df.head)
Any help would be greatly appreciated.
IIUC, I think you want something like this:
df['Avg Qty'] = (df.groupby([pd.Grouper(freq='180D', key='Date'),'A','B'])['Qty']
.transform('mean'))
Output:
Date A B Qty Cost Avg Qty
0 2017-12-11 Cancer Golf 1 100 1.5
1 2017-11-11 Cancer Golf 2 200 1.5
2 2017-11-11 Cardio Golf 2 300 2.0
3 2017-10-11 Cardio Baseball 3 600 3.0
4 2017-04-11 Cancer Golf 4 150 4.0
5 2016-01-01 Cancer Football 5 200 5.0
df = df.set_index('Date')
df.groupby(['A','B']).apply(lambda x: x.sort_index().rolling('180D')['Qty'].mean()).reset_index()\
.merge(df.reset_index(), on=['Date','A','B'], suffixes=('_avg',''))
Output:
A B Date Qty_avg Qty Cost
0 Cancer Football 2016-01-01 5.0 5 200
1 Cancer Golf 2017-04-11 4.0 4 150
2 Cancer Golf 2017-11-11 2.0 2 200
3 Cancer Golf 2017-12-11 1.5 1 100
4 Cardio Baseball 2017-10-11 3.0 3 600
5 Cardio Golf 2017-11-11 2.0 2 300
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With