Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas drop rows not in labels

I have two Pandas DataFrames (A and B) with overlapping indexes. I want to drop the rows in DataFrame B for which that index value doesn't exist in DataFrame A.

I have looked into Pandas drop method for DataFrames but it drops the columns with the given labels, I would like to drop the columns for which there are not given labels.

For now I managed to do this with:

B.drop(B.drop(A.index).index)

But this is clearly not the best way to do it (not efficient or readable). Is there a better way?

Example:

DataFrame A:

   index       col1  
     1       some_data    
     2       some_data    
     3       some_data   
     4       some_data   

DataFrame B:

   index       col2  
     1       other_data    
     2       other_data    
     3       other_data   
     4       other_data   
     5       other_data   
     6       other_data

I want to obtain DataFrame B':

   index       col2  
     1       other_data    
     2       other_data    
     3       other_data   
     4       other_data   
like image 887
João Almeida Avatar asked Oct 20 '25 04:10

João Almeida


2 Answers

I think you can use isin with boolean indexing:

print B.index.isin(A.index)
[ True  True  True  True False False]

print B[B.index.isin(A.index)]
             col2
index            
1      other_data
2      other_data
3      other_data
4      other_data
like image 166
jezrael Avatar answered Oct 22 '25 20:10

jezrael


You can use difference to find the row labels that are not in the other df index:

In [6]:
df2.drop(df2.index.difference(df1.index))

Out[6]:
             col2
index            
1      other_data
2      other_data
3      other_data
4      other_data
like image 24
EdChum Avatar answered Oct 22 '25 20:10

EdChum