 

Selecting Rows in Dataframe that have any column equal to any item in a list

Tags:

python

pandas

Let's say I have the following dataframe, and I want to select every row where any of its values equals an item in the list CodesOfInterest=['A','D']:

>>> import numpy as np
>>> import pandas as pd
>>> d1=pd.DataFrame([['A','B','C','D'],['D','Q','S', np.nan],['R',np.nan,np.nan,np.nan],[np.nan,'A',np.nan,np.nan]],columns=['Code1','Code2','Code3','Code4'])
>>> d1
  Code1 Code2 Code3 Code4
0     A     B     C     D
1     D     Q     S   NaN
2     R   NaN   NaN   NaN
3   NaN     A   NaN   NaN
>>> 

This can be done pretty easily with one line of code:

>>> CodesOfInterest=['A','D']
>>> d1[d1.isin(CodesOfInterest).any(axis=1)]
  Code1 Code2 Code3 Code4
0     A     B     C     D
1     D     Q     S   NaN
3   NaN     A   NaN   NaN
>>> 
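To see what that one-liner does, it helps to look at the intermediate mask: isin returns a boolean frame of the same shape as d1, and any(axis=1) collapses each row to True if at least one cell matched. A minimal sketch that rebuilds d1 from the question:

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame(
    [['A', 'B', 'C', 'D'],
     ['D', 'Q', 'S', np.nan],
     ['R', np.nan, np.nan, np.nan],
     [np.nan, 'A', np.nan, np.nan]],
    columns=['Code1', 'Code2', 'Code3', 'Code4'],
)
CodesOfInterest = ['A', 'D']

# Cell-wise membership test: True where the cell's value is in the list.
# NaN is not in the list, so missing cells come out False.
mask = d1.isin(CodesOfInterest)

# Collapse each row: True if at least one cell in that row matched.
row_has_code = mask.any(axis=1)

print(row_has_code.tolist())            # [True, True, False, True]
print(d1[row_has_code].index.tolist())  # [0, 1, 3]
```

Row 2 is excluded because 'R' is not in the list and its remaining cells are NaN.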

However, say I have a second dataframe, indexed the same as the first, that adds a condition to this subset:

>>> d2=pd.DataFrame([[1,0,1,0],[0,1,1, np.nan],[1,np.nan,np.nan,np.nan],[np.nan,1,np.nan,np.nan]],columns=['CodeStatus1','CodeStatus2','CodeStatus3','CodeStatus4'])
>>> d2
   CodeStatus1  CodeStatus2  CodeStatus3  CodeStatus4
0          1.0          0.0          1.0          0.0
1          0.0          1.0          1.0          NaN
2          1.0          NaN          NaN          NaN
3          NaN          1.0          NaN          NaN
>>> 

Now I want to select only the rows of d1 that have any value equal to an item in my list AND whose corresponding CodeStatus (from d2) equals 1. By corresponding CodeStatus I mean the pairs (Code1, CodeStatus1), (Code2, CodeStatus2), and so on.

I have a clunky way of doing this that requires looping through each of the 4 Codes and Code Statuses. See below:

>>> bs=[]
>>> for Num in range(1,5):
...     Code='Code'+str(Num)
...     CodeStatus='CodeStatus'+str(Num)
...     # code is of interest AND its paired status is 1
...     b=(d1[Code].isin(CodesOfInterest))&(d2[CodeStatus]==1)
...     bs.append(b)
... 
>>> Matches=pd.concat(bs,axis=1)
>>> 
>>> d1[Matches.any(axis=1)]
  Code1 Code2 Code3 Code4
0     A     B     C     D
3   NaN     A   NaN   NaN
>>> 

As you can see, row 1 is now dropped: although it contains code 'D' (in Code1), the corresponding status, CodeStatus1, is 0 rather than 1.

Is there a more elegant way to make this query that doesn't require looping through each column?

asked Jan 23 '26 19:01 by AJG519


1 Answer

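You can do it without a loop by combining the two boolean masks through their underlying NumPy arrays. Because .values drops the column labels, Code1 is paired with CodeStatus1, Code2 with CodeStatus2, and so on, by position: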

d1[(d1.isin(CodesOfInterest).values & (d2 == 1).values).any(axis=1)]

answered Jan 26 '26 09:01 by zuku
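A label-aligned variant, sketched here as an alternative and not taken from the answer: the answer pairs the columns by position through .values, whereas relabeling d2's columns to match d1's (with DataFrame.set_axis) lets pandas pair Code1 with CodeStatus1 by name and keeps the result aligned on d1's own index. It assumes, as in the question, that the k-th status column belongs to the k-th code column.

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame(
    [['A', 'B', 'C', 'D'],
     ['D', 'Q', 'S', np.nan],
     ['R', np.nan, np.nan, np.nan],
     [np.nan, 'A', np.nan, np.nan]],
    columns=['Code1', 'Code2', 'Code3', 'Code4'],
)
d2 = pd.DataFrame(
    [[1, 0, 1, 0],
     [0, 1, 1, np.nan],
     [1, np.nan, np.nan, np.nan],
     [np.nan, 1, np.nan, np.nan]],
    columns=['CodeStatus1', 'CodeStatus2', 'CodeStatus3', 'CodeStatus4'],
)
CodesOfInterest = ['A', 'D']

# Give d2 the same column labels as d1 so CodeN lines up with CodeStatusN.
# This relies on both frames listing their columns in the same order.
status = d2.set_axis(d1.columns, axis=1)

# A cell counts only if its code is of interest AND its paired status is 1.
# NaN compared with 1 is False, so missing statuses never match.
matches = d1.isin(CodesOfInterest) & status.eq(1)

result = d1[matches.any(axis=1)]
print(result.index.tolist())  # [0, 3]
```

Because the row mask carries d1's index, this also works when d1 and d2 share a non-default index, as long as the two indexes match.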


