I have a large data frame with Origin, Destination and other data points. How do I efficiently select rows that have the Origin, Destination pairs of interest, i.e. I have another data frame with the pairs of interest.
df = pd.DataFrame({'Origin' : ['A','A','A','B','B','A'],
'Destination' :['B','C','D','C','D','D'],
'X' : ['a','b', 'c', 'd', 'e', 'f']})
I = pd.DataFrame({'Origin' : ['A','A','B'], 'Destination' : ['B','D','C']})
And I would want to return back
Origin Destination X
0 A B a
1 A D c
2 B C d
3 A D f
Use merge with default inner join and if only same joined columns in both DataFrames parameter on can be omit to:
print (pd.merge(df,I))
Destination Origin X
0 B A a
1 D A c
2 D A f
3 C B d
If need reorder columns:
print (pd.merge(I,df)[['Origin','Destination','X']])
Origin Destination X
0 A B a
1 A D c
2 A D f
3 B C d
If more columns with same names add on:
print (pd.merge(I,df, on=['Origin','Destination'])[['Origin','Destination','X']])
Origin Destination X
0 A B a
1 A D c
2 A D f
3 B C d
And thank you Boud for suggestion:
print (df.merge(I))
Destination Origin X
0 B A a
1 D A c
2 D A f
3 C B d
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With