I have two dataframes:
df1:
A        B
45.5219  5.3179
0.9670   4.2212

df2:
A        B
1.0000   5.3178
0.1922   4.7881
0.0395   4.5975
0.0813   4.2215
Each B value carries an uncertainty of +/-0.03, so the 5.3179 in df1 is effectively the same value as the 5.3178 in df2, and the 4.2212 in df1 is effectively the same value as the 4.2215 in df2.
Therefore, I would like to delete the rows of df2 whose B value matches a B value in df1 to within +/-0.03, leaving a df2 that looks like this:
df2:
A        B
0.1922   4.7881
0.0395   4.5975
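You can combine NumPy broadcasting with numpy.isclose, passing 0.03 as the absolute tolerance (atol). The third positional argument of isclose is rtol, a relative tolerance, so the tolerance is named explicitly here:
import numpy as np
v1 = df1.values
v2 = df2.values
# Compares every df2 row with every df1 row, element-wise (column A as well as B)
df2[~np.isclose(v2, v1[:, None], rtol=0, atol=.03).any(0).any(1)]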
A B
1 0.1922 4.7881
2 0.0395 4.5975
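One detail of numpy.isclose matters here: its third positional argument is rtol, a relative tolerance, while a fixed window such as +/-0.03 is the keyword atol. A small sketch of the difference, using made-up numbers rather than the question's data:

```python
import numpy as np

# np.isclose(a, b, rtol, atol) tests |a - b| <= atol + rtol * |b|
a, b = 100.0, 102.0

# 0.03 passed positionally is a *relative* tolerance: 3% of |b| = 3.06
relative = np.isclose(a, b, .03)

# An absolute window of +/-0.03, with the relative term switched off
absolute = np.isclose(a, b, rtol=0, atol=.03)

print(relative, absolute)  # True False
```

For values around 4 to 5, as in the B column, a relative 0.03 works out to a window of roughly 0.12 to 0.16, which is wider than the stated uncertainty.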
Well I know NumPy broadcasting, so here's one abusing it -
a = df1.values
b = df2.values
df2_out = df2[~(np.abs(a[:,None,:] - b[None]) <= 0.03).any(0).any(1)]
Sample run -
In [119]: a = df1.values
In [120]: b = df2.values
In [121]: df2[~(np.abs(a[:,None,:] - b[None]) <= 0.03).any(0).any(1)]
Out[121]:
A B
1 0.1922 4.7881
2 0.0395 4.5975
If you only want to compare the B columns of the two dataframes -
In [136]: df2[~(np.abs(df1.B.values[:,None] - df2.B.values) <= 0.03).any(0)]
Out[136]:
A B
1 0.1922 4.7881
2 0.0395 4.5975
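For anyone who wants to reproduce this end to end, here is a self-contained sketch of the B-only comparison. The frames are rebuilt from the values in the question; the names tol, diff, near_df1 and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({"A": [45.5219, 0.9670], "B": [5.3179, 4.2212]})
df2 = pd.DataFrame({"A": [1.0000, 0.1922, 0.0395, 0.0813],
                    "B": [5.3178, 4.7881, 4.5975, 4.2215]})

tol = 0.03  # absolute uncertainty on B

# Pairwise |df1.B - df2.B| via broadcasting: shape (len(df1), len(df2))
diff = np.abs(df1["B"].to_numpy()[:, None] - df2["B"].to_numpy())

# A df2 row is flagged if its B is within tol of ANY df1 row's B
near_df1 = (diff <= tol).any(axis=0)

# Keep only the df2 rows that have no close match in df1
result = df2[~near_df1]
print(result)
```

Keep in mind that the intermediate array has len(df1) x len(df2) entries, so this approach suits small to moderately sized frames.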