
Row containing minimum value of difference between two pandas columns - without groupby

Tags:

pandas

In the sample data generated by...

import numpy as np
import pandas as pd

np.random.seed(2020)
sz = 20
df = pd.DataFrame(np.random.randn(sz, 2), index=range(sz), columns=list('AB'))
df.insert(0, 'Item', 'X')

...to get the row containing the minimum value of difference this works correctly:

df.iloc[df.groupby('Item').apply(lambda x: abs(x['A'] - x['B']).idxmin())]

However, when I remove the groupby and call apply directly on df to get the index for df.iloc:

  1. df.apply(lambda x: abs(x.A - x.B).idxmin()) throws the error AttributeError: 'Series' object has no attribute 'A'

  2. df.apply(lambda x: abs(x['A'] - x['B']).idxmin()) throws a different error, KeyError: 'A' !!!

Why is this happening?

What is the correct code to get the minimum value of difference, without using groupby?

reservoirinvest asked Oct 28 '25 04:10



1 Answer

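When you call apply on a DataFrame, you have to tell it which axis to loop over. By default (axis=0) it goes column by column, so the x passed to your lambda is a single column (a Series indexed by the row labels 0 to 19), not a row. That Series has no attribute or key 'A', hence the AttributeError and KeyError you are getting.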
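To make the axis behaviour concrete, here is a small sketch that rebuilds the sample frame from the question and inspects what the lambda receives in each case. The variable names (col_names, row_names, and so on) are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
np.random.seed(2020)
sz = 20
df = pd.DataFrame(np.random.randn(sz, 2), index=range(sz), columns=list('AB'))
df.insert(0, 'Item', 'X')

# Default axis=0: the function is called once per column,
# so x.name is the column label and x.index holds the row labels.
col_names = df.apply(lambda x: x.name)
has_A_by_column = df.apply(lambda x: 'A' in x.index)

# axis=1: the function is called once per row,
# so x.name is the row label and x.index holds the column labels.
row_names = df.apply(lambda x: x.name, axis=1)
has_A_by_row = df.apply(lambda x: 'A' in x.index, axis=1)

print(list(col_names))        # column labels seen with axis=0
print(has_A_by_column.any())  # is 'A' ever a valid key with axis=0?
print(has_A_by_row.all())     # is 'A' always a valid key with axis=1?
```

With the default axis, 'A' is never one of the labels inside x, which is why both x.A and x['A'] fail there.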

The following will work: df.apply(lambda x: abs(x['A']- x['B']), axis=1).idxmin()

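However, there is no need to use the lambda at all, you can do: abs(df['A'] - df['B']).idxmin() or (df['A'] - df['B']).abs().idxmin()

Both of these are also much faster than using apply, because they operate on whole columns at once instead of calling a Python function for every row.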
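Since the question asks for the row itself, a minimal sketch of the full lookup might look like the following. Note that idxmin returns an index label, so df.loc is the matching accessor; df.iloc only gives the same row here because the sample index happens to be 0..19. The names diff, label and row are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
np.random.seed(2020)
sz = 20
df = pd.DataFrame(np.random.randn(sz, 2), index=range(sz), columns=list('AB'))
df.insert(0, 'Item', 'X')

# Vectorized: absolute difference of the two columns, then the label of its minimum.
diff = (df['A'] - df['B']).abs()
label = diff.idxmin()

# idxmin returns a label, so select the row with .loc.
row = df.loc[label]

# The row-wise apply version should give the same label, just more slowly.
label_via_apply = df.apply(lambda x: abs(x['A'] - x['B']), axis=1).idxmin()

print(label, label_via_apply)
print(row)
```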

John Sloper answered Oct 31 '25 11:10



