
Sum duplicated rows on a multi-index pandas dataframe

Hello, I'm having trouble with pandas. I'm trying to sum duplicated rows on a multi-index DataFrame. I tried df.groupby(level=[0,1]).sum(), df.stack().reset_index().groupby(['year', 'product']).sum() and a few other things, but I cannot get it to work. I'd also like to add every unique product for each given year and give it a value of 0 if it wasn't listed.

Example: dataframe with multi-index and 3 different products (A,B,C):

                  volume1    volume2
year   product
2010   A          10         12
       A          7          3
       B          7          7
2011   A          10         10
       B          7          6
       C          5          5
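
For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is only a sketch: building the index with pd.MultiIndex.from_tuples is an assumption about how the data is laid out, not necessarily how the real data was created.

```python
import pandas as pd

# (year, product) pairs in the same order as the table above;
# note that (2010, "A") appears twice
index = pd.MultiIndex.from_tuples(
    [(2010, "A"), (2010, "A"), (2010, "B"),
     (2011, "A"), (2011, "B"), (2011, "C")],
    names=["year", "product"],
)

df = pd.DataFrame(
    {"volume1": [10, 7, 7, 10, 7, 5],
     "volume2": [12, 3, 7, 10, 6, 5]},
    index=index,
)

print(df)
```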

Expected output: if a product is duplicated within a given year, its rows are summed. If a product isn't listed for a year, a new row filled with 0 is created.

                  volume1     volume2
year   product
2010   A          17          15
       B          7           7
       C          0           0
2011   A          10          10
       B          7           6
       C          5           5

Any ideas? Thanks.

CDVC23 asked Oct 15 '25 16:10


2 Answers

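Use groupby and sum, then unstack and stack:

df = df.groupby(level=[0,1]).sum().unstack(fill_value=0).stack()
#on pandas < 2.0 this could be shortened to
#df = df.sum(level=[0,1]).unstack(fill_value=0).stack()
#but the level argument of sum was removed in pandas 2.0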

Alternative with reindex:

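df = df.groupby(level=[0,1]).sum()
#on pandas < 2.0 the same as df.sum(level=[0,1]), removed in 2.0
#build every (year, product) combination, then fill the missing ones with 0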
mux = pd.MultiIndex.from_product(df.index.levels, names = df.index.names)
df = df.reindex(mux, fill_value=0)

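Alternative 1, thanks @Wen:

#unstack() without fill_value leaves NaN for the missing pairs, so fill them afterwards
df = df.groupby(level=[0,1]).sum().unstack().stack(dropna=False).fillna(0).astype(int)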

print (df)
              volume1  volume2
year product                  
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5
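
As a quick check, here is a self-contained sketch that runs the unstack/stack and reindex approaches on the sample data from the question and compares them. The helper name sum_and_fill is hypothetical and only used for illustration, and the frame is rebuilt by hand from the question's table.

```python
import pandas as pd

def sum_and_fill(frame):
    """Sum duplicated (year, product) rows, then add missing pairs as 0.

    Hypothetical helper wrapping the reindex approach.
    """
    summed = frame.groupby(level=[0, 1]).sum()
    # Cartesian product of all years and all products seen in the data
    full_index = pd.MultiIndex.from_product(
        summed.index.levels, names=summed.index.names
    )
    return summed.reindex(full_index, fill_value=0)

# Sample data rebuilt from the question
index = pd.MultiIndex.from_tuples(
    [(2010, "A"), (2010, "A"), (2010, "B"),
     (2011, "A"), (2011, "B"), (2011, "C")],
    names=["year", "product"],
)
df = pd.DataFrame(
    {"volume1": [10, 7, 7, 10, 7, 5],
     "volume2": [12, 3, 7, 10, 6, 5]},
    index=index,
)

via_reindex = sum_and_fill(df)
via_unstack = (
    df.groupby(level=[0, 1]).sum().unstack(fill_value=0).stack().sort_index()
)

print(via_reindex)
```

One thing to watch with the reindex variant: index.levels can still contain values that no longer occur after filtering a frame, in which case calling remove_unused_levels() on the index first avoids creating rows for them.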
jezrael answered Oct 18 '25 08:10


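You can make the second level of the index a CategoricalIndex; when you then use groupby, it will include all of the categories.

#set_levels no longer accepts inplace=True (removed in pandas 2.0), so assign the result back
df.index = df.index.set_levels(pd.CategoricalIndex(df.index.levels[1]), level=1)
#observed=False keeps the categories that never appear for a given year
df.groupby(level=[0, 1], observed=False).sum().fillna(0)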

              volume1  volume2
year product                  
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5
piRSquared answered Oct 18 '25 06:10


