I'm trying the following:
import pandas as pd
df = pd.DataFrame({'col1': [1,2,3], 'col2': [[1,2], [3,4], [3,2]]})
df
   col1    col2
0     1  [1, 2]
1     2  [3, 4]
2     3  [3, 2]
and I would like to select the rows where the col1 value is in the col2 list:
df[df.col1.isin(df.col2)]
Empty DataFrame
Columns: [col1, col2]
Index: []
But I get an empty DataFrame. Why doesn't the isin function work?
"Why doesn't the isin function work?"
Series.isin takes a set or list. When you call:
df.col1.isin(some_set_or_list)
it only tests whether each value of col1 is in the entire some_set_or_list, which in this case is [[1,2],[3,4],[3,2]]. It does not test whether col1[0] is in some_set_or_list[0], col1[1] is in some_set_or_list[1], etc. In fact, some_set_or_list can be a totally different length than col1.
For example, if col1's first value were [3,4] instead:
df = pd.DataFrame({'col1': [[3,4],2,3], 'col2': [[1,2], [3,4], [3,2]]})
# col1 col2
# 0 [3, 4] [1, 2]
# 1 2 [3, 4]
# 2 3 [3, 2]
isin would give you True for the first row since [3,4] is somewhere in col2 as a whole (not element-wise):
df.col1.isin(df.col2)
# 0 True
# 1 False
# 2 False
# Name: col1, dtype: bool
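To make the "as a whole" behaviour concrete, here is a minimal sketch that passes a flat list of a different length than col1. The candidates name and its values are made up for illustration and are not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [[1, 2], [3, 4], [3, 2]]})

# Hypothetical lookup values; note the list is longer than col1.
candidates = [3, 1, 7, 9, 11]

# Each col1 value is checked against the whole list, regardless of position.
mask = df.col1.isin(candidates)
print(mask.tolist())  # [True, False, True]
```

Rows 0 and 2 match because 1 and 3 appear somewhere in candidates, not because of where they appear.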
What you're actually trying to do is a row-wise test, which is what Rob's answer does:
df.apply(lambda row: row.col1 in row.col2, axis=1)
This is a simple case of apply(axis=1): the lambda returns a boolean for each row, and the resulting Series can then be used as a mask in loc[]:
import pandas as pd
df = pd.DataFrame({'col1': [1,2,3], 'col2': [[1,2], [3,4], [3,2]]})
df.loc[df.apply(lambda r: r["col1"] in r["col2"], axis=1)]
| | col1 | col2 |
|---|---|---|
| 0 | 1 | [1, 2] |
| 2 | 3 | [3, 2] |
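As an alternative to apply, the same row-wise test can be sketched with a plain list comprehension over the two columns. This is one possible variant, not the method from either answer. It may be faster on large frames because it avoids building a Series for every row, though that is worth measuring on your own data:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [[1, 2], [3, 4], [3, 2]]})

# Pair each col1 value with the list in the same row and test membership.
mask = [value in items for value, items in zip(df['col1'], df['col2'])]

# A list of booleans works as a row mask in loc[].
result = df.loc[mask]
print(result)
```

The result keeps the original index labels, so the selected rows are still labelled 0 and 2.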