Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Assigning values to a dataframe by iterating through two lists

Tags:

python

pandas

I have a dataframe of meaning housing prices by month that looks like this

RegionName    2000-01    2000-02    2000-03  

New York      200000     210000     220000
Austin        100000     110000     130000  ...
Los Angeles   180000     190000     200000

I have a list of lists of months corresponding to quarters and a list of quarters that look like

month_chunks = [['2000-01', '2000-02', '2000-03'], ['2000-04', '2000-05', '2000-06']...]

quarters = ['2000q1', '2000q2', '2000q3'...]

I'm trying to create columns in my dataframe that contain mean prices by quarter

for quarter, chunk in zip(quarters, month_chunks):
        housing[quarter] = np.mean(housing[chunk].mean())

RegionName    2000-01    2000-02    2000-03       2000q1   

New York      200000     210000     220000        210000    
Austin        100000     110000     130000  ...   113333.333 
Los Angeles   180000     190000     200000        190000

But it is giving me columns that are duplicated for each row

RegionName    2000-01    2000-02    2000-03       2000q1   

New York      200000     210000     220000        210000    
Austin        100000     110000     130000  ...   210000 
Los Angeles   180000     190000     200000        210000

The dataframe is large, so iterating through it and the lists is not doable

for i, row in housing.iterrows():
    for quarter, chunk in zip(quarters, month_chunks):
        row[quarter].iloc[i] = np.mean(row[chunk].iloc[i].mean())
like image 265
mkh Avatar asked Nov 30 '25 02:11

mkh


2 Answers

Don't iterrows, you can perform your operation columns wise:

for months, qt in zip(month_chunks, quarters):
    housing[qt] = housing[months].mean(axis=1)
like image 191
Quang Hoang Avatar answered Dec 09 '25 00:12

Quang Hoang


Here is one way using groupby

from collections import ChainMap
d=dict(ChainMap(*[dict.fromkeys(x,y)for x , y in zip(month_chunks,quarters)]))
s=housing.set_index('RegionName').groupby(d,axis=1).mean()
s
Out[32]: 
                   2000q1
RegionName               
NewYork     210000.000000
Austin      113333.333333
LosAngeles  190000.000000

df=pd.concat([housing.set_index('RegionName'),s],axis=1)
like image 30
BENY Avatar answered Dec 08 '25 22:12

BENY



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!