
For every row in Pandas dataframe determine if a column value exists in another column

Tags:

python

pandas

I have a pandas data frame like this:

import pandas as pd

df = pd.DataFrame({'category': ['A', 'B', 'C', 'A'], 'category_pred': [['A'], ['B', 'D'], ['A', 'B', 'C'], ['D']]})
print(df)

  category category_pred
0        A           [A]
1        B        [B, D]
2        C     [A, B, C]
3        A           [D]

I would like to have an output like this:

  category category_pred  count
0        A           [A]      1
1        B        [B, D]      1
2        C     [A, B, C]      1
3        A           [D]      0

That is, for every row, determine whether the value in 'category' appears in the list stored in 'category_pred' (1 if it does, 0 otherwise). Note that 'category_pred' can contain multiple values.

I can do a for-loop like this one, but it is really slow.

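df['count'] = 0  # create the column first; every row starts at 0
for i in df.index:
    if df.category[i] in df.category_pred[i]:
        df.loc[i, 'count'] = 1  # .loc avoids chained assignment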

I am looking for an efficient way to do this operation. Thanks!

user42361 asked Jan 31 '26 07:01



1 Answer

You can use the DataFrame's apply method with axis=1, which passes each row to the function so the membership test runs once per row:

df['count'] = df.apply(lambda x: 1 if x.category in x.category_pred else 0, axis = 1)

This adds the new 'count' column, set to 1 where 'category' is found in 'category_pred' and 0 otherwise.
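
As a possible alternative (not part of the original answer), the same row-wise membership test can be sketched with a plain list comprehension over the two columns. It avoids building a Series object for every row, which is usually where apply with axis=1 spends most of its time, so it tends to be faster on large frames. The example frame below is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A'],
    'category_pred': [['A'], ['B', 'D'], ['A', 'B', 'C'], ['D']],
})

# Pair each category with its list of predictions and test membership.
# int(True) is 1 and int(False) is 0, matching the desired output.
df['count'] = [
    int(cat in preds)
    for cat, preds in zip(df['category'], df['category_pred'])
]

print(df)
```

Whether this actually beats apply depends on the size of the frame and the length of the lists, so it is worth timing both on real data.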

Haleemur Ali answered Feb 02 '26 22:02



