I have a dataframe like this:
right_answer rater1 rater2 rater3 item
1 1 1 2 S01
1 1 2 2 S02
2 1 2 1 S03
2 2 1 2 S04
and I need to get the rows (or the values in 'item') where at least two of the three raters gave the wrong answer. I can already check whether all the raters agree with the right answer using this code:
df.where(df[['rater1', 'rater2', 'rater3']].eq(df.iloc[:, 0], axis=0).all(1) == True)
I don't want to compute a majority-vote column, because I may need to adjust how many raters have to agree or disagree with the right answer.
Thanks for your help.
Use DataFrame.filter to select the columns whose names contain rater, then DataFrame.ne with axis=0 to compare each of those columns against the right_answer column. DataFrame.sum with axis=1 then counts how many raters gave a wrong answer in each row, Series.ge turns that count into a boolean mask, and finally the mask selects the matching rows. To require a different number of wrong raters, change the argument passed to ge:
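mask = (
    df.filter(like='rater')            # columns rater1, rater2, rater3
    .ne(df['right_answer'], axis=0)    # True where a rater's answer is wrong
    .sum(axis=1)                       # number of wrong raters per row
    .ge(2)                             # at least two wrong answers
)
df = df[mask]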
Result:
# print(df)
   right_answer  rater1  rater2  rater3 item
1             1       1       2       2  S02
2             2       1       2       1  S03
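Since the question mentions needing to adjust how many raters must be wrong, the same chain can be wrapped in a small function with the threshold as a parameter. This is a minimal sketch that rebuilds the sample data from the question; the function name rows_with_min_wrong and its pattern parameter are hypothetical, and it assumes the rater columns are exactly those whose names contain "rater". Selecting the 'item' column of the result gives just the item values.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "right_answer": [1, 1, 2, 2],
    "rater1": [1, 1, 1, 2],
    "rater2": [1, 2, 2, 1],
    "rater3": [2, 2, 1, 2],
    "item": ["S01", "S02", "S03", "S04"],
})


def rows_with_min_wrong(frame, min_wrong, pattern="rater"):
    """Hypothetical helper: return rows where at least `min_wrong` raters are wrong.

    Assumes the rater columns are exactly those whose names contain `pattern`
    (DataFrame.filter(like=...) matches substrings, not prefixes).
    """
    raters = frame.filter(like=pattern)
    # True where a rater's answer differs from the right answer in the same row
    wrong = raters.ne(frame["right_answer"], axis=0)
    # Count wrong raters per row and keep rows that reach the threshold
    return frame[wrong.sum(axis=1).ge(min_wrong)]


print(rows_with_min_wrong(df, 2)["item"].tolist())  # ['S02', 'S03']
print(rows_with_min_wrong(df, 1)["item"].tolist())  # ['S01', 'S02', 'S03', 'S04']
```

Using ne(...).sum() keeps the threshold independent of how many rater columns exist, so adding a rater4 column would not require changing the function.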