I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'Id':['1','2','3'],'List_Origin':[['A','B'],['B','C'],['A','B']]})
How can I get only the Ids whose List_Origin contains only certain values, for example 'A','B'? I would appreciate it if the solution avoided loops.
Desired end result:
end_df = pd.DataFrame({'Id':['1','3'],'List_Origin':[['A','B'],['A','B']]})
You can use apply and compare each list against the target:
>>> df[df['List_Origin'].apply(lambda x: x==['A', 'B'] or x==['A,B'])]
  Id List_Origin
0  1       [A,B]
2  3      [A, B]
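Unfortunately, a column that holds Python lists is stored with object dtype, so the comparison cannot be vectorized. You need some form of per-row loop; apply is one too under the hood.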
I am assuming first that you have ['A', 'B'] and not ['A,B'] in the first row:
end_df = df[[x==['A', 'B'] for x in df['List_Origin']]]
output:
  Id List_Origin
0  1      [A, B]
2  3      [A, B]
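If the order of the elements inside each list should not matter (an assumption, the question does not say so), a set comparison is one possible variant of the same comprehension. This is only a sketch: the name `target` is introduced here for illustration, and the third row is deliberately reversed to show the effect.

```python
import pandas as pd

# Same shape as the question's data, but row 3 is reversed on purpose
# (illustrative change, not from the original post).
df = pd.DataFrame({'Id': ['1', '2', '3'],
                   'List_Origin': [['A', 'B'], ['B', 'C'], ['B', 'A']]})

# Hypothetical name: the values a row must contain, no more and no less.
target = {'A', 'B'}

# Build a boolean mask row by row; set() ignores element order and duplicates.
mask = [set(x) == target for x in df['List_Origin']]
end_df = df[mask]
```

Note that a set also ignores duplicates, so ['A', 'A', 'B'] would match as well. If that is not wanted, comparing sorted(x) against a sorted list is a stricter alternative.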
If you really do have a mix of ['A', 'B'] and ['A,B'], then use:
end_df = df[[','.join(x)=='A,B' for x in df['List_Origin']]]
output:
  Id List_Origin
0  1       [A,B]
2  3      [A, B]
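Another way to handle the mixed case is to normalize each list before comparing, which may be easier to extend than joining strings. The following is a minimal sketch, assuming the first row really is ['A,B']; `split_items` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Sample data mixing both shapes: ['A,B'] (one string) and ['A', 'B'] (two strings).
df = pd.DataFrame({'Id': ['1', '2', '3'],
                   'List_Origin': [['A,B'], ['B', 'C'], ['A', 'B']]})

def split_items(items):
    """Hypothetical helper: split every string on commas and flatten the result."""
    return [part.strip() for item in items for part in item.split(',')]

# Still a per-row loop, just with the normalization made explicit.
mask = [split_items(x) == ['A', 'B'] for x in df['List_Origin']]
end_df = df[mask]
```

The filter keeps the original values in List_Origin; only the comparison uses the normalized form.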