
Unable to use lambda function in pandas NamedAgg function

Tags: python, pandas

pandas 0.25 introduced named aggregation, along with the pd.NamedAgg helper, which lets you name the output columns of a groupby aggregation. It is a very nice feature (see NamedAgg).

However, it seems I can't get it working with lambda functions. I don't know whether this is a bug or by design.

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]})

Using a lambda in a dict works fine. This is the old way:

(
    df.groupby(by='kind')
    .height.agg({'height_min': lambda x: np.min(x**2), 'height_max': 'max'})
)

Using a lambda with the new pd.NamedAgg doesn't work:

(
    df.groupby(by='kind')
    .agg(height_min=pd.NamedAgg(column='height', aggfunc=lambda x: np.min(x**2)), 
         height_max=pd.NamedAgg(column='height', aggfunc='max')
        )
)

Using a lambda with the implicit NamedAgg form (plain tuples) doesn't work either:

(
    df.groupby(by='kind')
    .agg(height_min=('height', lambda x: np.min(x**2)), 
         height_max=('height', 'max')
        )
)

Can anyone explain why a lambda function doesn't work here?

Allen asked Oct 18 '25 13:10


1 Answer

Here is one way to do this using the 0.25 syntax with a single aggregation column:

df.groupby('kind')['height'].agg(height_min=lambda x: np.min(x**2),
                                 height_max='max')

Output:

      height_min  height_max
kind                        
cat        82.81         9.5
dog        36.00        34.0
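
The single-column form only covers one source column. If you need named outputs from several columns, one possible approach is to aggregate each column on its own and join the pieces on the shared 'kind' index. This is only a sketch: the weight_mean column and the use of pd.concat are my own additions for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]})

grouped = df.groupby('kind')

# Keyword aggregation on a single selected column, with a lambda.
height = grouped['height'].agg(height_min=lambda x: np.min(x**2),
                               height_max='max')

# A second column, aggregated separately (illustrative assumption).
weight = grouped['weight'].agg(weight_mean='mean')

# Both pieces are indexed by 'kind', so they line up column-wise.
result = pd.concat([height, weight], axis=1)
print(result)
```

Each piece is an ordinary DataFrame indexed by kind, so any other join would work as well as concat here.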

However, I do think this is a bug.
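
Whether the DataFrameGroupBy form from the question fails may depend on the pandas version you have installed, so it is worth checking directly. A minimal sketch of such a check follows; try_named_agg is a hypothetical helper name, and the broad except is deliberate because I am not assuming which exception is raised.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
                   'height': [9.1, 6.0, 9.5, 34.0],
                   'weight': [7.9, 7.5, 9.9, 198.0]})


def try_named_agg(frame):
    """Hypothetical helper: run the pd.NamedAgg form from the question.

    Returns (result, None) on success or (None, exception) on failure.
    """
    try:
        out = frame.groupby('kind').agg(
            height_min=pd.NamedAgg(column='height',
                                   aggfunc=lambda x: np.min(x**2)),
            height_max=pd.NamedAgg(column='height', aggfunc='max'),
        )
        return out, None
    except Exception as exc:  # broad on purpose: error type not assumed
        return None, exc


result, error = try_named_agg(df)
print('pandas', pd.__version__)
if error is None:
    print(result)
else:
    print('named aggregation with a lambda failed: {!r}'.format(error))
```

If it fails on your version, the printed exception together with the version number is what you would want to include in a bug report.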

Scott Boston answered Oct 21 '25 03:10