I created the following dataframe:
availability = pd.DataFrame(propertyAvailableData).set_index("createdat")
# pd.Grouper replaces pd.TimeGrouper, which was removed in pandas 1.0
monthly_availability = availability.fillna(value=0).groupby(pd.Grouper(freq='M'))
This gives the following output:
2015-08-18 2015-09-09 2015-09-10 2015-09-11 2015-09-12 \
createdat
2015-08-12 1.0 1.0 1.0 1.0 1.0
2015-08-17 0.0 0.0 0.0 0.0 0.0
2015-08-18 0.0 1.0 1.0 1.0 1.0
2015-08-18 0.0 0.0 0.0 0.0 0.0
2015-08-19 0.0 1.0 1.0 1.0 1.0
2015-09-03 0.0 1.0 1.0 1.0 1.0
2015-09-03 0.0 1.0 1.0 1.0 1.0
2015-09-07 0.0 0.0 0.0 0.0 0.0
2015-09-08 0.0 0.0 0.0 0.0 0.0
2015-09-11 0.0 0.0 0.0 0.0 0.0
I'm trying to get the averages per createdat month by doing:
monthly_availability_mean = monthly_availability.mean()
However, here I get the following output:
2015-08-18 2015-09-09 2015-09-10 2015-09-11 2015-09-12 \
createdat
2015-08-31 0.111111 0.444444 0.666667 0.777778 0.777778
2015-09-30 0.000000 0.222222 0.222222 0.222222 0.222222
2015-10-31 0.000000 0.000000 0.000000 0.000000 0.000000
And when I hand-check August I get:
(1.0 + 0 + 0 + 0 + 0) / 5 = 0.2
How do I get the correct mean per month?
availability.resample('M').mean()
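A monthly mean divides each column's sum by the number of rows whose createdat falls in that month, so it helps to print the row counts next to the means when hand-checking. The reported 0.111111 equals 1/9, which would be consistent with nine August rows in the full data rather than the five visible in the printout, though the post does not show enough to confirm that. Below is a minimal sketch on made-up data: the frame, its single column and the variable names are assumptions for illustration, not the asker's real data. It passes `pd.offsets.MonthEnd()` instead of the `'M'` string because recent pandas versions spell that alias `'ME'`.

```python
import pandas as pd

# Hypothetical data: the index plays the role of `createdat`,
# and the single column stands in for one availability date.
idx = pd.to_datetime([
    "2015-08-12", "2015-08-17", "2015-08-18", "2015-08-18", "2015-08-19",
    "2015-09-03", "2015-09-03", "2015-09-07", "2015-09-08", "2015-09-11",
])
availability = pd.DataFrame(
    {"2015-08-18": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]},
    index=idx,
)
availability.index.name = "createdat"

# Month-end offset object, the same binning the 'M' alias requests.
month_end = pd.offsets.MonthEnd()

# Mean per createdat month, with missing values treated as 0.
monthly_mean = availability.fillna(0).resample(month_end).mean()

# Number of rows in each month: the denominator of each mean.
rows_per_month = availability.resample(month_end).size()

print(monthly_mean)
print(rows_per_month)
```

If the row counts are larger than expected, the difference from the hand calculation comes from rows that were not in the printed sample, not from the mean itself.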
I just encountered the same issue and solved it with the following code:
import pandas as pd

# load the daily data
df = pd.read_csv('./name.csv')
# parse the Date column and use it as the index
df['Date'] = pd.to_datetime(df['Date'])
df_date = df.set_index('Date')
# mean for each calendar month
df_month = df_date.resample('M').mean()
# average each month number (1-12) across years
df_monthly_mean = df_month.groupby(df_month.index.month).mean()
Hope this was helpful!
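To make the two steps in this answer concrete, here is a small sketch on synthetic data. The date range, the `value` column and its contents are invented for illustration. The first step yields one row per calendar month, and the second collapses those rows to one per month number, averaging across years.

```python
import pandas as pd

# Hypothetical daily series spanning two years; "value" is an invented column.
dates = pd.date_range("2014-01-01", "2015-12-31", freq="D")
df_date = pd.DataFrame({"value": range(len(dates))}, index=dates)

# Step 1: one mean per calendar month (24 rows for two years).
df_month = df_date.resample(pd.offsets.MonthEnd()).mean()

# Step 2: average the same month number across years (12 rows, index 1..12).
df_monthly_mean = df_month.groupby(df_month.index.month).mean()

print(df_month.head())
print(df_monthly_mean)
```

Note that the second step is only needed for a seasonal profile across years. For a mean per individual month, as in the original question, the first step alone is enough.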