Filter rows in numpy array based on second array

Tags: python, numpy

I have two 2D NumPy arrays, A and B. I want to remove all the rows of A that also appear in B.

I tried something like this:

A[~np.isin(A, B)]

but isin is applied element-wise and keeps the dimensions of A. I need one boolean value per row to filter it.

EDIT: Something like this:

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

.....

A = np.array([[3, 0, 4],
              [0, 5, 9]])
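
To make the problem concrete, here is a small sketch of what the attempt above does on the sample arrays. np.isin tests every element separately, so the mask has the same shape as A, and indexing with it flattens the result instead of dropping whole rows. The names mask and flat are only illustrative.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# Element-wise membership: True wherever a value of A occurs anywhere in B
mask = np.isin(A, B)
print(mask.shape)

# A 2D boolean mask selects individual elements, so the result is 1D
flat = A[~mask]
print(flat)
```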
user2505961 asked Nov 21 '25 12:11



2 Answers

This is probably not the most performant solution, but it does exactly what you want. You can view A and B through a structured dtype whose single element spans one full row, so that each row becomes one item. The arrays must be contiguous for this to work, which you can ensure with np.ascontiguousarray:

Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()
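
To see what the view does, the sketch below inspects its shape and item size for the sample A from the question. It only checks the layout and is not part of the filtering itself; writing the field shape as a tuple is a small stylistic assumption.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])

# One structured element per row: a single unnamed field holding a whole row
row_dtype = np.dtype([('', A.dtype, (A.shape[1],))])
Av = np.ascontiguousarray(A).view(row_dtype).ravel()

print(A.shape, Av.shape)   # the row axis survives, the column axis is absorbed
print(Av.dtype.itemsize, A.shape[1] * A.dtype.itemsize)
```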

Now you can apply np.isin directly:

>>> np.isin(Av, Bv)
array([False,  True, False])

According to the docs, passing invert=True is faster than negating the output of isin, so you can do:

A[np.isin(Av, Bv, invert=True)]
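
Putting the steps together, a minimal sketch of a reusable helper could look like this. The name remove_rows is hypothetical, and the shape and dtype checks are an added assumption rather than part of the snippet above: the view reinterprets raw bytes, so both arrays should share a dtype and a column count.

```python
import numpy as np

def remove_rows(A, B):
    """Return the rows of A that do not appear in B (hypothetical helper)."""
    A = np.ascontiguousarray(A)
    B = np.ascontiguousarray(B)
    # Assumption: the byte-level view is only meaningful when both arrays
    # are 2D with the same dtype and the same number of columns.
    if A.ndim != 2 or B.ndim != 2:
        raise ValueError("A and B must be 2D")
    if A.shape[1] != B.shape[1] or A.dtype != B.dtype:
        raise ValueError("A and B need matching dtype and column count")
    # View each row as one structured element, then compare rows with isin
    row_dtype = np.dtype([('', A.dtype, (A.shape[1],))])
    Av = A.view(row_dtype).ravel()
    Bv = B.view(row_dtype).ravel()
    return A[np.isin(Av, Bv, invert=True)]

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])
result = remove_rows(A, B)
print(result)
```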
Mad Physicist answered Nov 23 '25 02:11



Try the following. It uses matrix multiplication to reduce each row to a single number, so that np.isin can compare whole rows. It assumes the arrays hold non-negative integers:

import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

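# Per-column radix: one more than the largest value in either array
arr_max = np.maximum(A.max(0), B.max(0)) + 1
# Positional weights (1, r0, r0*r1, ...) give distinct rows distinct codes
weights = np.cumprod(np.append(1, arr_max[:-1]))
print(A[~np.isin(A.dot(weights), B.dot(weights))])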

Output:

[[3 0 4]
 [0 5 9]]
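
The reduction is only safe if distinct rows always map to distinct numbers. The sketch below, with a hypothetical helper encode_rows, shows that using the per-column maxima directly as weights can make two different rows collide, while positional (mixed-radix) weights keep them apart. It assumes non-negative integers whose combined range fits in the integer dtype.

```python
import numpy as np

def encode_rows(X, radix):
    """Map each row of non-negative integers to one scalar (hypothetical helper)."""
    # Weights 1, r0, r0*r1, ... form a positional (mixed-radix) encoding
    weights = np.cumprod(np.append(1, radix[:-1]))
    return X.dot(weights)

A = np.array([[3, 0],
              [0, 2],
              [1, 5]])
B = np.array([[3, 0]])
radix = np.maximum(A.max(0), B.max(0)) + 1

# Using the radix itself as weights sends [3, 0] and [0, 2] to the same value
naive = A.dot(radix)
# Positional weights keep every row distinct
codes = encode_rows(A, radix)
kept = A[~np.isin(codes, encode_rows(B, radix))]
print(naive, codes)
print(kept)
```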
peru_45 answered Nov 23 '25 03:11
