
Remove DataFrame rows whose list values appear in another list

I'm trying to remove the rows of df whose list in Column1 contains any value present in lst.

I know I can use df[df[x].isin(y)] for single strings, but I'm not sure how to adapt that method to lists stored in a dataframe column.

lst = ['f','a']

df:

           Column1            Out1
0          ['x', 'y']         a
1          ['a', 'b']         i
2          ['c', 'd']         o
3          ['e', 'f']         u
etc.

I've tried a list comprehension, but it doesn't seem to work the same way with Pandas:

df = df[[i for x in lst for i in df['Column1']]]

Error:

TypeError: unhashable type: 'list'

My expected output is as follows, with the rows removed whose lists contain any of the values in lst:

           Column1            Out1
0          ['x', 'y']         a
1          ['c', 'd']         o
etc.
ThatOneNoob asked Nov 18 '25 12:11


1 Answer

You can convert the values to sets and use & with set(lst) to build a mask of the matching rows; to invert the mask, use ~:

import pandas as pd

df = pd.DataFrame({'Column1':[['x','y'], ['a','b'], ['c','d'],['e','f']],
                   'Out1':list('aiou')})

lst = ['f','a']
df1 = df[~(df['Column1'].apply(set) & set(lst))]
print (df1)
  Column1 Out1
0  [x, y]    a
2  [c, d]    o
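
How & behaves between a Series of sets and a plain set may depend on the pandas version, so here is a minimal sketch of the same idea with the mask built explicitly by set.isdisjoint. The names keep and df1 are only illustrative, and the data is the sample from the question:

```python
import pandas as pd

df = pd.DataFrame({'Column1': [['x', 'y'], ['a', 'b'], ['c', 'd'], ['e', 'f']],
                   'Out1': list('aiou')})
lst = ['f', 'a']

# True for rows whose list shares no element with lst
keep = df['Column1'].apply(lambda values: set(values).isdisjoint(lst))

# Boolean indexing keeps only the rows marked True
df1 = df[keep]
print(df1)
```

Because the mask already marks the rows to keep, no inversion with ~ is needed here.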

A solution with a nested list comprehension: the inner comprehension gives a list of booleans for each row, so all is needed to check that every value is True:

df1 = df[[all([x not in lst for x in i]) for i in df['Column1']]]
print (df1)
  Column1 Out1
0  [x, y]    a
2  [c, d]    o

print ([[x not in lst for x in i] for i in df['Column1']])
[[True, True], [False, True], [True, True], [True, False]]
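
Since the question mentions isin, the check can also be sketched with that method by first flattening the lists. This assumes pandas 0.25 or later, where Series.explode is available; the names exploded, has_match and df2 are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Column1': [['x', 'y'], ['a', 'b'], ['c', 'd'], ['e', 'f']],
                   'Out1': list('aiou')})
lst = ['f', 'a']

# One row per list element; the original row label is repeated in the index
exploded = df['Column1'].explode()

# For each original row label: does any of its elements appear in lst?
has_match = exploded.isin(lst).groupby(level=0).any()

# Keep only the rows without a match
df2 = df[~has_match]
print(df2)
```

This relies on the index labels being unique, because the per-element results are grouped back by index label.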
jezrael answered Nov 21 '25 01:11