I'm comparing two columns in a dataframe (A & B). I have a method that works (C5). It came from this question: Compare two columns using pandas
I wondered why I couldn't get the other methods (C1 - C4) to give the correct answer:
df = pd.DataFrame({'A': [1,1,1,1,1,2,2,2,2,2],
'B': [1,1,1,1,1,1,0,0,0,0]})
#df['C1'] = 1 [df['A'] == df['B']]
df['C2'] = df['A'].equals(df['B'])
df['C3'] = np.where((df['A'] == df['B']),0,1)
def fun(row):
if ['A'] == ['B']:
return 1
else:
return 0
df['C4'] = df.apply(fun, axis=1)
df['C5'] = df.apply(lambda x : 1 if x['A'] == x['B'] else 0, axis=1)

Use:
df = pd.DataFrame({'A': [1,1,1,1,1,2,2,2,2,2],
'B': [1,1,1,1,1,1,0,0,0,0]})
So for C1 and C2 need compare columns by == or eq for boolean mask and then convert it to integers - True, False to 1,0:
df['C1'] = (df['A'] == df['B']).astype(int)
df['C2'] = df['A'].eq(df['B']).astype(int)
Here is necessary change order 1,0 - for match condition need 1:
df['C3'] = np.where((df['A'] == df['B']),1,0)
In function is not selected values of Series, missing row:
def fun(row):
if row['A'] == row['B']:
return 1
else:
return 0
df['C4'] = df.apply(fun, axis=1)
Solution is correct:
df['C5'] = df.apply(lambda x : 1 if x['A'] == x['B'] else 0, axis=1)
print (df)
A B C1 C2 C3 C4 C5
0 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1
5 2 1 0 0 0 0 0
6 2 0 0 0 0 0 0
7 2 0 0 0 0 0 0
8 2 0 0 0 0 0 0
9 2 0 0 0 0 0 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With