I was looking to add subtotals to a pandas dataframe - a question which I found to be asked here often. The answers making use of the deprecated pd.append aren't relevant anymore so I figured a more up-to-date version could be useful, not only for me, but for others as well.
The problem: Have a dataframe of the form (for example) -
df = pd.DataFrame(
{"A": ["A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"],
"B": ["B1", "B1", "B1", "B2", "B2", "B2", "B3", "B3", "B3"],
"C": ["C1", "C1", "C2", "C2", "C2", "C3", "C3", "C3", "C3"],
"D": [1,2,3,4,5,6,7,8,9]})
After a df.pivot_table(), add the subtotals for each hierarchical level (or just the highest level). Thanks for any tips, I didn't figure out a straightforward and general way to accomplish this.
EDIT:
I should probably add that I'm interested in the case of index being a list of 2 and more variables. The case for n = 1 is simple enough using margins.
The idea, after a df.pivot_table(columns="C", index = ["A", "B"], values = "D", fill_value=0, aggfunc="sum") , is to get something like
C C1 C2 C3 ....
A B
A1 B1 3 3 0
A1 B2 0 9 6
Totals 3 12 6
A2 B1 0 0 24
...
where the subtotals are over the A1 ... An levels and in the corresponding C1 ... Cn columns (for the general case), not the rows.
The sample output itself is
C C1 C2 C3
A B
A1 B1 3 3 0
A2 B2 0 9 6
A3 B3 0 0 24
Here is a short and readable solution:
subtotals = pt.groupby(level=0).sum().rename(index=lambda x: (x, 'Total'))
pt = pd.concat([pt, subtotals]).sort_index()
Explaination:
Group by the "a" index and sum it.
Rename the "b" index to total and store it in a helper table
Concat the helper table and the pivot table
Sort by the index
C C1 C2 C3
A1 B1 3 3 0
Total 3 3 0
A2 B2 0 9 6
Total 0 9 6
A3 B3 0 0 24
Total 0 0 24
Just as you wanted.
If you wanted to subtotal by 'B', then here is the modified code:
subtotals = pt.groupby(level=1).sum().rename(index=lambda x: ("Total", x))
pt = pd.concat([pt, subtotals]).sort_index(level=1)
and
C C1 C2 C3
A1 B1 3 3 0
Total B1 3 3 0
A2 B2 0 9 6
Total B2 0 9 6
A3 B3 0 0 24
Total B3 0 0 24
MWE (Minimum working example)
import pandas as pd
df = pd.DataFrame({
"A": ["A1","A1","A1","A2","A2","A2","A3","A3","A3"],
"B": ["B1","B1","B1","B2","B2","B2","B3","B3","B3"],
"C": ["C1","C1","C2","C2","C2","C3","C3","C3","C3"],
"D": [1,2,3,4,5,6,7,8,9]
})
pt = df.pivot_table(index=["A","B"], columns="C", values="D", aggfunc="sum", fill_value=0)
subtotals = pt.groupby(level=0).sum().rename(index=lambda x: (x, "Total"))
pt = pd.concat([pt, subtotals]).sort_index()
print(pt)
And output again:
C C1 C2 C3
A1 B1 3 3 0
Total 3 3 0
A2 B2 0 9 6
Total 0 9 6
A3 B3 0 0 24
Total 0 0 24
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With