I have a dataframe of mean housing prices by month that looks like this:
RegionName 2000-01 2000-02 2000-03
New York 200000 210000 220000
Austin 100000 110000 130000 ...
Los Angeles 180000 190000 200000
I have a list of lists of months corresponding to quarters, and a list of quarter labels, that look like this:
month_chunks = [['2000-01', '2000-02', '2000-03'], ['2000-04', '2000-05', '2000-06']...]
quarters = ['2000q1', '2000q2', '2000q3'...]
I'm trying to create columns in my dataframe that contain the mean price for each quarter:
for quarter, chunk in zip(quarters, month_chunks):
housing[quarter] = np.mean(housing[chunk].mean())
The result should look like this:
RegionName 2000-01 2000-02 2000-03 2000q1
New York 200000 210000 220000 210000
Austin 100000 110000 130000 ... 113333.333
Los Angeles 180000 190000 200000 190000
But it is giving me a column with the same value duplicated for every row:
RegionName 2000-01 2000-02 2000-03 2000q1
New York 200000 210000 220000 210000
Austin 100000 110000 130000 ... 210000
Los Angeles 180000 190000 200000 210000
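The duplication happens because `np.mean(housing[chunk].mean())` collapses everything to a single scalar (the mean of the column means), which pandas then broadcasts into every row. A minimal check on a toy frame (values assumed for illustration):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real frame (values assumed)
housing = pd.DataFrame({
    '2000-01': [200000, 100000, 180000],
    '2000-02': [210000, 110000, 190000],
    '2000-03': [220000, 130000, 200000],
})
chunk = ['2000-01', '2000-02', '2000-03']

# housing[chunk].mean() averages each column, then np.mean averages
# those three numbers into one scalar
value = np.mean(housing[chunk].mean())
print(value)  # a single float, broadcast to every row on assignment
```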
The dataframe is large, so iterating over the rows and the lists is not feasible:
for i, row in housing.iterrows():
    for quarter, chunk in zip(quarters, month_chunks):
        housing.loc[i, quarter] = row[chunk].mean()
Don't use iterrows; you can perform your operation column-wise:
for months, qt in zip(month_chunks, quarters):
housing[qt] = housing[months].mean(axis=1)
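Passing `axis=1` to `mean` averages across the month columns, producing one value per row instead of a single scalar. A runnable sketch with toy values assumed from the question:

```python
import pandas as pd

# Toy data mirroring the question (values assumed)
housing = pd.DataFrame({
    'RegionName': ['New York', 'Austin', 'Los Angeles'],
    '2000-01': [200000, 100000, 180000],
    '2000-02': [210000, 110000, 190000],
    '2000-03': [220000, 130000, 200000],
})
month_chunks = [['2000-01', '2000-02', '2000-03']]
quarters = ['2000q1']

for months, qt in zip(month_chunks, quarters):
    # axis=1: average across the selected month columns, row by row
    housing[qt] = housing[months].mean(axis=1)

print(housing[['RegionName', '2000q1']])
```

Each quarter column is computed in a single vectorized pass, so this stays fast even on a large frame.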
Here is one way, using groupby with axis=1:
from collections import ChainMap
d = dict(ChainMap(*[dict.fromkeys(x, y) for x, y in zip(month_chunks, quarters)]))
s = housing.set_index('RegionName').groupby(d, axis=1).mean()
s
Out[32]:
                    2000q1
RegionName
New York     210000.000000
Austin       113333.333333
Los Angeles  190000.000000
df = pd.concat([housing.set_index('RegionName'), s], axis=1)
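The dict maps each month column to its quarter label, and grouping the columns by that mapping averages them. Since `groupby(axis=1)` is deprecated in recent pandas versions, an equivalent sketch below groups the transpose instead (toy values assumed from the question):

```python
import pandas as pd
from collections import ChainMap

# Toy data mirroring the question (values assumed)
housing = pd.DataFrame({
    'RegionName': ['New York', 'Austin', 'Los Angeles'],
    '2000-01': [200000, 100000, 180000],
    '2000-02': [210000, 110000, 190000],
    '2000-03': [220000, 130000, 200000],
})
month_chunks = [['2000-01', '2000-02', '2000-03']]
quarters = ['2000q1']

# Map each month column name to its quarter label
d = dict(ChainMap(*[dict.fromkeys(x, y) for x, y in zip(month_chunks, quarters)]))

# Transposing turns the month columns into the index, so a plain
# groupby on the dict averages the months belonging to each quarter
s = housing.set_index('RegionName').T.groupby(d).mean().T

# Attach the quarter columns next to the original month columns
df = pd.concat([housing.set_index('RegionName'), s], axis=1)
print(df)
```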