Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding subtotals to a pandas dataframe

I was looking to add subtotals to a pandas dataframe - a question which I found to be asked here often. The answers making use of the deprecated pd.append aren't relevant anymore so I figured a more up-to-date version could be useful, not only for me, but for others as well.

The problem: Have a dataframe of the form (for example) -

df = pd.DataFrame(
    {"A": ["A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"],
    "B": ["B1", "B1", "B1", "B2", "B2", "B2", "B3", "B3", "B3"],
    "C": ["C1", "C1", "C2", "C2", "C2", "C3", "C3", "C3", "C3"],
    "D": [1,2,3,4,5,6,7,8,9]})

After a df.pivot_table(), add the subtotals for each hierarchical level (or just the highest level). Thanks for any tips, I didn't figure out a straightforward and general way to accomplish this.

EDIT: I should probably add that I'm interested in the case of index being a list of 2 and more variables. The case for n = 1 is simple enough using margins.

The idea, after a df.pivot_table(columns="C", index = ["A", "B"], values = "D", fill_value=0, aggfunc="sum") , is to get something like

C       C1  C2  C3 ....
A  B              
A1 B1    3   3   0
A1 B2    0   9   6
Totals   3   12  6
A2 B1    0   0  24
...

where the subtotals are over the A1 ... An levels and in the corresponding C1 ... Cn columns (for the general case), not the rows.

The sample output itself is

C      C1  C2  C3
A  B             
A1 B1   3   3   0
A2 B2   0   9   6
A3 B3   0   0  24
like image 424
Thomas Petit Avatar asked Oct 24 '25 14:10

Thomas Petit


1 Answers

Here is a short and readable solution:

subtotals = pt.groupby(level=0).sum().rename(index=lambda x: (x, 'Total'))
pt = pd.concat([pt, subtotals]).sort_index()

Explaination:

  1. Group by the "a" index and sum it.

  2. Rename the "b" index to total and store it in a helper table

  3. Concat the helper table and the pivot table

  4. Sort by the index

C         C1  C2  C3
A1 B1      3   3   0
   Total   3   3   0
A2 B2      0   9   6
   Total   0   9   6
A3 B3      0   0  24
   Total   0   0  24

Just as you wanted.

If you wanted to subtotal by 'B', then here is the modified code:

subtotals = pt.groupby(level=1).sum().rename(index=lambda x: ("Total", x))
pt = pd.concat([pt, subtotals]).sort_index(level=1)

and

C         C1  C2  C3
A1    B1   3   3   0
Total B1   3   3   0
A2    B2   0   9   6
Total B2   0   9   6
A3    B3   0   0  24
Total B3   0   0  24

MWE (Minimum working example)

import pandas as pd

df = pd.DataFrame({
    "A": ["A1","A1","A1","A2","A2","A2","A3","A3","A3"],
    "B": ["B1","B1","B1","B2","B2","B2","B3","B3","B3"],
    "C": ["C1","C1","C2","C2","C2","C3","C3","C3","C3"],
    "D": [1,2,3,4,5,6,7,8,9]
})

pt = df.pivot_table(index=["A","B"], columns="C", values="D", aggfunc="sum", fill_value=0)

subtotals = pt.groupby(level=0).sum().rename(index=lambda x: (x, "Total"))
pt = pd.concat([pt, subtotals]).sort_index()

print(pt)

And output again:

C         C1  C2  C3
A1 B1      3   3   0
   Total   3   3   0
A2 B2      0   9   6
   Total   0   9   6
A3 B3      0   0  24
   Total   0   0  24
like image 190
Aadvik Avatar answered Oct 27 '25 02:10

Aadvik



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!