Comparing a column from two dataframes and deleting rows in df2 that are within +/-0.03 of values in df1

Question

I have two dataframes:

                A            B
     df1<-   45.5219       5.3179   
              0.9670       4.2212

                A            B
     df2<-    1.0000       5.3178   
              0.1922       4.7881   
              0.0395       4.5975   
              0.0813       4.2215

The B column values have an uncertainty of +/-0.03, meaning that df1[5.3179] is roughly the same value as df2[5.3178], and df1[4.2212] is roughly the same value as df2[4.2215].

Therefore, I would like to delete the rows from df2 that have roughly the same B values as in df1 (within an error of +/-0.03), and produce a dataframe for df2 that looks like this:

                A            B
     df2<-    0.1922       4.7881   
              0.0395       4.5975

piRSquared · Accepted Answer

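Use NumPy broadcasting together with numpy.isclose and an absolute tolerance of 0.03. The third positional argument of isclose is rtol (a relative tolerance), so the tolerance is passed as atol by keyword, with rtol set to 0:

import numpy as np

v1 = df1.values
v2 = df2.values

df2[~np.isclose(v2, v1[:, None], rtol=0, atol=0.03).any(0).any(1)]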

        A       B
1  0.1922  4.7881
2  0.0395  4.5975
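
For a self-contained check, the frames from the question can be rebuilt and filtered as below. This is only a sketch: the names close, drop and result are illustrative, and rtol=0 with atol=0.03 assumes the +/-0.03 in the question is meant as an absolute tolerance. Like the snippet above, it compares both columns (A with A, B with B), not only B.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df1 = pd.DataFrame({"A": [45.5219, 0.9670], "B": [5.3179, 4.2212]})
df2 = pd.DataFrame({"A": [1.0000, 0.1922, 0.0395, 0.0813],
                    "B": [5.3178, 4.7881, 4.5975, 4.2215]})

v1 = df1.values  # shape (2, 2)
v2 = df2.values  # shape (4, 2)

# v1[:, None] has shape (2, 1, 2); broadcasting against v2 gives (2, 4, 2):
# one comparison per (df1 row, df2 row, column) triple.
close = np.isclose(v2, v1[:, None], rtol=0, atol=0.03)

# Collapse the df1 axis, then the column axis: True where a df2 row
# is within tolerance of some df1 row in at least one column.
drop = close.any(axis=0).any(axis=1)

result = df2[~drop]
print(result)
```

The intermediate boolean array holds len(df1) * len(df2) * n_columns entries, so this approach suits small to medium frames.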

Divakar · Answer

Well I know NumPy broadcasting, so here's one abusing it -

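import numpy as np

a = df1.values
b = df2.values
# (2,1,2) - (1,4,2) broadcasts to (2,4,2); reduce over df1 rows, then over columns
df2_out = df2[~(np.abs(a[:,None,:] - b[None]) <= 0.03).any(0).any(1)]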

Sample run -

In [119]: a = df1.values

In [120]: b = df2.values

In [121]: df2[~(np.abs(a[:,None,:] - b[None]) <= 0.03).any(0).any(1)]
Out[121]: 
        A       B
1  0.1922  4.7881
2  0.0395  4.5975

If you want to compare only the B column of the two dataframes -

In [136]: df2[~(np.abs(df1.B.values[:,None] - df2.B.values) <= 0.03).any(0)]
Out[136]: 
        A       B
1  0.1922  4.7881
2  0.0395  4.5975
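
The B-only comparison can also be wrapped in a small helper. The function name drop_close_rows and its parameters are hypothetical, introduced here for illustration; the logic is the same absolute-difference broadcast as above.

```python
import numpy as np
import pandas as pd

def drop_close_rows(target, reference, column, tol):
    """Return the rows of `target` whose `column` value is farther than
    `tol` from every value in `reference[column]`."""
    ref = reference[column].to_numpy()[:, None]    # shape (m, 1)
    tgt = target[column].to_numpy()                # shape (n,)
    # (m, 1) - (n,) broadcasts to (m, n); reduce over the reference axis.
    near = (np.abs(ref - tgt) <= tol).any(axis=0)  # shape (n,)
    return target[~near]

# Data copied from the question.
df1 = pd.DataFrame({"A": [45.5219, 0.9670], "B": [5.3179, 4.2212]})
df2 = pd.DataFrame({"A": [1.0000, 0.1922, 0.0395, 0.0813],
                    "B": [5.3178, 4.7881, 4.5975, 4.2215]})

out = drop_close_rows(df2, df1, "B", 0.03)
print(out)
```

The original row labels of df2 are preserved in the result; call reset_index(drop=True) on it if a fresh 0-based index is wanted.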

Tags: python-3.x, pandas, numpy

Neko
