Pandas DataFrame custom agg function strange behavior

Question

I'd like to aggregate a Pandas DataFrame along an axis using a custom function, and I'm having trouble figuring out what the function should return.

df = pd.DataFrame(np.arange(50).reshape(10,5))

You can pass numpy functions to DataFrame.agg:

# Case 1
df.agg([np.mean], axis=1)

And you get what you expect: a DataFrame indexed just like df, but with one column: 'mean'. But for some reason, the following behave completely differently:

# Case 2
df.agg([lambda x:np.mean(x)], axis=1)

or even

# Case 3
def f(x, **kwargs):
    return np.mean(x, **kwargs)

df.agg([f], axis=1)

Why should the latter two work any differently than the first case?

Derek O · Accepted Answer

If I am not mistaken, what is happening in Case 2 is that the lambda is applied to each element individually rather than to each row as a whole. np.mean() of a single value is just that value, so instead of one mean per row you get every entry of the DataFrame back (as floats) when you run df.agg([lambda x:np.mean(x)], axis=1), which returns:

               0     1     2     3     4
0 <lambda>   0.0   1.0   2.0   3.0   4.0
1 <lambda>   5.0   6.0   7.0   8.0   9.0
2 <lambda>  10.0  11.0  12.0  13.0  14.0
3 <lambda>  15.0  16.0  17.0  18.0  19.0
4 <lambda>  20.0  21.0  22.0  23.0  24.0
5 <lambda>  25.0  26.0  27.0  28.0  29.0
6 <lambda>  30.0  31.0  32.0  33.0  34.0
7 <lambda>  35.0  36.0  37.0  38.0  39.0
8 <lambda>  40.0  41.0  42.0  43.0  44.0
9 <lambda>  45.0  46.0  47.0  48.0  49.0

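The pandas documentation for DataFrame.agg makes a specific point about how numpy aggregation functions differ from pandas aggregation operations: pandas always aggregates over an axis, whereas numpy functions such as np.mean() aggregate over the flattened array unless an axis is given.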
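To make the numpy side of that difference concrete, here is a minimal sketch using a small array (arr_2d is an illustrative name, not something from the question). It only shows numpy's own defaults and makes no claim about what pandas does internally.

```python
import numpy as np

# A small 2-D array standing in for the values of a DataFrame.
arr_2d = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

overall = np.mean(arr_2d)             # no axis: mean of the flattened array
per_column = np.mean(arr_2d, axis=0)  # collapse the rows: one mean per column
per_row = np.mean(arr_2d, axis=1)     # collapse the columns: one mean per row

print(overall)      # 2.5
print(per_column)   # [1.5 2.5 3.5]
print(per_row)      # [1. 4.]
```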

To make Case 2 behave as Case 1 does, you can specify the axis in the np.mean() function itself: df.agg([lambda x:np.mean(x,axis=0)],axis=1), which returns the following:

   <lambda>
0       2.0
1       7.0
2      12.0
3      17.0
4      22.0
5      27.0
6      32.0
7      37.0
8      42.0
9      47.0
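
A possible explanation for why adding axis=0 changes the result (this is an assumption about pandas internals in the version used here, not something confirmed above): if pandas first tries the function on individual values and only falls back to whole rows when that attempt raises, then np.mean(x) would succeed on every scalar, while np.mean(x, axis=0) would fail on scalars because a scalar has no axis 0. The numpy half of that reasoning can be checked directly. describe_mean is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def describe_mean(value, **kwargs):
    """Return np.mean(value, **kwargs) as a float, or the name of the exception raised."""
    try:
        return float(np.mean(value, **kwargs))
    except (ValueError, IndexError) as exc:
        # numpy reports an out-of-bounds axis with an error derived from these types
        return type(exc).__name__

scalar_plain = describe_mean(5.0)          # a bare scalar is accepted
scalar_axis0 = describe_mean(5.0, axis=0)  # a scalar has no axis 0 to reduce over
row_axis0 = describe_mean(np.array([5.0, 6.0, 7.0, 8.0, 9.0]), axis=0)  # a whole row

print(scalar_plain)
print(scalar_axis0)
print(row_axis0)
```

Whether pandas actually falls back in this way depends on the pandas version, so treat this as a plausible reading rather than a guarantee.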

Similarly, you can make Case 3 behave as Case 1 does by specifying axis=0 in the np.mean() function:

def f(x, **kwargs):
    return np.mean(x, axis=0, **kwargs)

df.agg([f], axis=1)

And this returns the same row means as above, under a single column named f.
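
If the goal is simply one value per row from a custom function, a possibly less surprising route is DataFrame.apply with axis=1, which passes each row to the function as a Series. This is a sketch of an alternative, not what the answer above does; row_mean and the column name "row_mean" are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(50).reshape(10, 5))

def row_mean(row):
    # With DataFrame.apply(..., axis=1), `row` is a Series holding one row.
    return np.mean(row)

# One value per row, wrapped in a one-column DataFrame.
result = df.apply(row_mean, axis=1).to_frame("row_mean")

# Cross-check against the built-in row-wise mean.
matches_builtin = np.allclose(result["row_mean"], df.mean(axis=1))

print(result)
print(matches_builtin)
```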

Tags: pandas, aggregation

nvd81
