Sum duplicated rows on a multi-index pandas dataframe

Question

Hello, I'm having trouble with pandas. I'm trying to sum duplicated rows in a MultiIndex DataFrame. I tried df.groupby(level=[0,1]).sum(), as well as df.stack().reset_index().groupby(['year', 'product']).sum() and a few other approaches, but I cannot get it to work. I'd also like to add every unique product for each year, giving it a value of 0 if it wasn't listed.

Example: a DataFrame with a MultiIndex and three different products (A, B, C):

                  volume1    volume2
year   product
2010   A          10         12
       A          7          3
       B          7          7
2011   A          10         10
       B          7          6
       C          5          5

Expected output: if a product is duplicated within a given year, its rows are summed. If a product isn't listed for a year, a new row filled with 0 is created.

                  volume1     volume2
year   product
2010   A          17          15
       B          7           7
       C          0           0
2011   A          10          10
       B          7           6
       C          5           5

Any ideas? Thanks.

jezrael · Accepted Answer

Use groupby with sum, then unstack and stack:

df = df.groupby(level=[0,1]).sum().unstack(fill_value=0).stack()
#in pandas < 2.0 this could also be written as
#df = df.sum(level=[0,1]).unstack(fill_value=0).stack()

Alternative with reindex:

df = df.groupby(level=[0,1]).sum()
#in pandas < 2.0 this could also be written as
#df = df.sum(level=[0,1])
mux = pd.MultiIndex.from_product(df.index.levels, names = df.index.names)
df = df.reindex(mux, fill_value=0)

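Alternative 1, thanks @Wen:

#unstack() without fill_value leaves NaN for the missing pairs;
#stack(dropna=False) keeps those rows, and fillna(0) then zeroes them
#(dropna belongs to the legacy stack implementation, so this may not work in newer pandas)
df = df.groupby(level=[0,1]).sum().unstack().stack(dropna=False).fillna(0).astype(int)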

print(df)
              volume1  volume2
year product                  
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5
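
For reference, here is a minimal self-contained sketch of the reindex variant, rebuilding the example data from the question. The variable names (summed, full_index, result) are only illustrative and not part of the original answer; it assumes every product appears in at least one year, since the full index is built from the levels that are actually present.

```python
import pandas as pd

# Rebuild the example frame from the question (note the duplicated (2010, "A") row)
index = pd.MultiIndex.from_tuples(
    [(2010, "A"), (2010, "A"), (2010, "B"),
     (2011, "A"), (2011, "B"), (2011, "C")],
    names=["year", "product"],
)
df = pd.DataFrame(
    {"volume1": [10, 7, 7, 10, 7, 5], "volume2": [12, 3, 7, 10, 6, 5]},
    index=index,
)

# Step 1: collapse duplicated (year, product) pairs by summing them
summed = df.groupby(level=["year", "product"]).sum()

# Step 2: build every (year, product) combination from the observed level values
full_index = pd.MultiIndex.from_product(summed.index.levels, names=summed.index.names)

# Step 3: add the missing combinations, filled with 0
result = summed.reindex(full_index, fill_value=0)
print(result)
```

Because reindex is given fill_value=0 directly, no NaN is ever introduced and the columns keep their integer dtype.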

piRSquared · Answer

You can make the second level of the index a CategoricalIndex and when you use groupby it will include all of the categories.

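#set_levels no longer accepts inplace=True in pandas 2.0+, so assign the result back
df.index = df.index.set_levels(pd.CategoricalIndex(df.index.levels[1]), level=1)
#observed=False keeps unobserved categories; fillna covers versions that return NaN for them
df.groupby(level=[0, 1], observed=False).sum().fillna(0).astype(int)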

              volume1  volume2
year product                  
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5

Tags: python, pandas, dataframe, multi-index

Asked by CDVC23
