Row containing minimum value of difference between two pandas columns - without groupby

Question

In the sample data generated by...

import numpy as np
import pandas as pd

np.random.seed(2020)
sz = 20
df = pd.DataFrame(np.random.randn(sz, 2), index=range(sz), columns=list('AB'))
df.insert(0, 'Item', 'X')

...this correctly returns the row with the smallest absolute difference between columns A and B:

df.iloc[df.groupby('Item').apply(lambda x: abs(x['A'] - x['B']).idxmin())]

However, when I remove the groupby and try to get the index for df.iloc directly, both attempts fail:

df.apply(lambda x: abs(x.A - x.B).idxmin()) raises AttributeError: 'Series' object has no attribute 'A'
df.apply(lambda x: abs(x['A'] - x['B']).idxmin()) raises a different error, KeyError: 'A'

Why is this happening?

What is the correct code to get the minimum value of difference, without using groupby?

John Sloper · Accepted Answer

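When calling apply you have to specify the axis it should iterate over. By default (axis=0) it goes column by column, so the lambda receives each column as a Series indexed by the row labels. That Series has no 'A' or 'B' entry, hence the errors you are getting.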
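To make the column-by-column default concrete, here is a minimal sketch that reuses the question's sample data and records what apply hands to the function under each axis setting. The names passed_by_default and passed_by_row are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(2020)
sz = 20
df = pd.DataFrame(np.random.randn(sz, 2), index=range(sz), columns=list('AB'))
df.insert(0, 'Item', 'X')

# Default axis=0: the function is called once per column, and x.name is
# the column label. Each x is indexed by the row labels 0..19.
passed_by_default = df.apply(lambda x: x.name)

# axis=1: the function is called once per row, and x.name is the row label.
# Each x is indexed by the column labels, so x['A'] and x['B'] exist.
passed_by_row = df.apply(lambda x: x.name, axis=1)

print(passed_by_default.tolist())
print(passed_by_row.tolist())

# 'A' is not a row label, which is why x['A'] fails on a column...
print('A' in df['A'].index)
# ...but it is a column label, so x['A'] works on a row.
print('A' in df.iloc[0].index)
```

With the default axis, the lambda in the question is looking up 'A' among the row labels of a single column, which is where the KeyError and AttributeError come from.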

The following will work: df.apply(lambda x: abs(x['A']- x['B']), axis=1).idxmin()

However, there is no need to use the lambda at all. You can do: abs(df['A'] - df['B']).idxmin() or (df['A'] - df['B']).abs().idxmin()

Both of these are vectorized and also much faster than using apply.
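Putting it together, the following is a small sketch of how the vectorized version could be used to fetch the full row, again on the question's sample data. The names diff, label, row and label_via_apply are illustrative. Since idxmin returns an index label rather than a position, the sketch uses df.loc; with the default range index used here, df.iloc would select the same row.

```python
import numpy as np
import pandas as pd

np.random.seed(2020)
sz = 20
df = pd.DataFrame(np.random.randn(sz, 2), index=range(sz), columns=list('AB'))
df.insert(0, 'Item', 'X')

# Vectorized: absolute difference for every row, computed in one pass.
diff = (df['A'] - df['B']).abs()

# idxmin returns the index label of the smallest value.
label = diff.idxmin()

# Select the complete row by label.
row = df.loc[label]
print(row)

# The row-wise apply gives the same label, only more slowly.
label_via_apply = df.apply(lambda x: abs(x['A'] - x['B']), axis=1).idxmin()
print(label == label_via_apply)
```

If the DataFrame had a non-default index (for example, after filtering or sorting), df.loc[label] would still pick the right row, whereas passing the label to df.iloc might not.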
