
Passing `skipna` argument to `agg`

I want to set skipna=False when I use the agg method on a DataFrame.

My DataFrame has many (dynamic) columns. I'm performing a groupby and aggregating with agg, like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, 2], "B": [np.nan, np.nan], "C": [0, 0]})

# the sum of B is 0.0
df.agg({"A": "sum", "B": "sum", "C": "max"})

When I'm aggregating a single column, or applying a single aggregation function across the entire DataFrame, I can add skipna=False so that the NaN values aren't skipped, i.e. df["B"].sum(skipna=False) or df.sum(skipna=False). That doesn't work for me because I'm applying several different functions (sum, mean, max) to different columns.

How can I pass that skipna argument via the agg method?

Kirk Broadhurst asked Oct 21 '25 05:10


2 Answers

Personally I'd do:

out = pd.Series({'A': df['A'].sum(skipna=False), 
                 'B': df['B'].sum(skipna=False),
                 'C': df['C'].max()
                })

Also, agg with lambda would work as well:

df.agg({'A': lambda x: x.sum(skipna=False),
        'B': lambda x: x.sum(skipna=False),
        'C': 'max'})
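
Since the question mentions groupby and a dynamic set of columns, here is a rough sketch of how the lambda idea might be generalized. The helper `no_skip`, the grouping column `g`, and the sample data are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

def no_skip(name):
    """Build a reducer that calls the Series method `name` with skipna=False."""
    def reducer(s):
        return getattr(s, name)(skipna=False)
    return reducer

# Hypothetical grouped data; "g" is an assumed grouping key.
df = pd.DataFrame({
    "g": ["x", "x", "y"],
    "A": [1, 2, 3],
    "B": [np.nan, 1.0, 2.0],
    "C": [0, 5, 1],
})

# Column -> reduction name, as in the question; this dict can be built dynamically.
spec = {"A": "sum", "B": "sum", "C": "max"}

# Each column gets its own reducer, so NaN propagates within each group.
out = df.groupby("g").agg({col: no_skip(fn) for col, fn in spec.items()})
print(out)
```

Group x contains a NaN in B, so its sum for B comes out as NaN rather than 1.0, while group y is unaffected. Note that passing Python callables to groupby is generally slower than the built-in string aggregations, which may matter on large frames.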
Quang Hoang answered Oct 23 '25 18:10


If you have a lot of columns to aggregate, here is another approach:

d = {"A": "sum", "B": "sum", "C": "max"}
pd.Series({c: getattr(df[c], f)(skipna=False) for c, f in d.items()})

A    3.0
B    NaN
C    0.0
dtype: float64
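
One caveat with the getattr approach: it passes skipna=False to every method, and some reductions (count, for example) don't accept that keyword. A possible variation, assuming you maintain your own list of skipna-aware names (`SKIPNA_AWARE` and `reduce_column` below are hypothetical, not part of pandas):

```python
import numpy as np
import pandas as pd

# Assumption: a hand-maintained set of Series reductions that take `skipna`.
SKIPNA_AWARE = {"sum", "mean", "max", "min", "prod", "median", "std", "var"}

def reduce_column(series, name):
    """Call the named Series method, adding skipna=False only where it is supported."""
    method = getattr(series, name)
    return method(skipna=False) if name in SKIPNA_AWARE else method()

df = pd.DataFrame({"A": [1, 2], "B": [np.nan, np.nan], "C": [0, 0]})
spec = {"A": "sum", "B": "mean", "C": "count"}

out = pd.Series({c: reduce_column(df[c], f) for c, f in spec.items()})
print(out)
```

Here B still comes out as NaN, while count on C is called without the keyword and returns 2.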
Shubham Sharma answered Oct 23 '25 17:10


