In the sample data generated by...
import numpy as np
import pandas as pd

np.random.seed(2020)
sz = 20
df = pd.DataFrame(np.random.randn(sz, 2), index=range(sz), columns=list('AB'))
df.insert(0, 'Item', 'X')
...to get the row containing the minimum value of the difference, this works correctly:
df.iloc[df.groupby('Item').apply(lambda x: abs(x['A'] - x['B']).idxmin())]
However, removing the groupby to get the corresponding row:
df.apply(lambda x: abs(x.A - x.B).idxmin()) throws AttributeError: 'Series' object has no attribute 'A'
df.apply(lambda x: abs(x['A'] - x['B']).idxmin()) throws a different error, KeyError: 'A'!
Why is this happening?
What is the correct code to get the minimum value of difference, without using groupby?
When using apply you have to specify the axis it should loop over. By default it goes column by column, hence the errors you are getting.
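To see why, here is a minimal sketch (assuming the df defined above) of what apply hands to the function with the default axis=0 versus axis=1:

# With the default axis=0, each call receives a whole *column* as a Series
# indexed by the row labels 0..19, so x['A'] raises KeyError and x.A fails:
df.apply(lambda x: x.name)                     # yields 'Item', 'A', 'B' -- one call per column
# With axis=1, each call receives a *row* as a Series indexed by the column
# names, so x['A'] and x['B'] are valid lookups:
df.apply(lambda x: x['A'] - x['B'], axis=1)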
The following will work:
df.apply(lambda x: abs(x['A'] - x['B']), axis=1).idxmin()
However, there is no need to use a lambda; you can do:
abs(df['A'] - df['B']).idxmin()
or
(df['A'] - df['B']).abs().idxmin()
Both of these are also much faster than using apply.
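As a rough check, you can time the two approaches yourself (a sketch assuming the df defined above; results vary by machine and frame size, and the gap widens as the frame grows):

import timeit
# Row-wise apply calls the Python lambda once per row:
timeit.timeit(lambda: df.apply(lambda x: abs(x['A'] - x['B']), axis=1).idxmin(), number=100)
# The vectorised version operates on whole columns at once:
timeit.timeit(lambda: (df['A'] - df['B']).abs().idxmin(), number=100)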