In this case, "mostly zero" means that fewer than 5 elements in a column are non-zero. The matrix is a 2-D ndarray.
Sample data:
a = np.array([[1,1,2,1,1],
[1,1,0,1,0],
[1,1,0,1,0],
[1,1,0,3,0],
[1,1,0,3,0],
[1,1,1,5,3],
[1,1,0,1,0],
[1,1,0,1,0],
[1,1,4,3,0],
[1,1,0,4,0],
[1,1,0,5,0],
[1,1,0,0,0]])
Desired output:
a = np.array([[1,1,1],
[1,1,1],
[1,1,1],
[1,1,3],
[1,1,3],
[1,1,5],
[1,1,1],
[1,1,1],
[1,1,3],
[1,1,4],
[1,1,5],
[1,1,0]])
How about:
>>> a[:, (a != 0).sum(axis=0) >= 5]
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 3],
[1, 1, 3],
[1, 1, 5],
[1, 1, 1],
[1, 1, 1],
[1, 1, 3],
[1, 1, 4],
[1, 1, 5],
[1, 1, 0]])
or
>>> a[:, np.apply_along_axis(np.count_nonzero, 0, a) >= 5]
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 3],
[1, 1, 3],
[1, 1, 5],
[1, 1, 1],
[1, 1, 1],
[1, 1, 3],
[1, 1, 4],
[1, 1, 5],
[1, 1, 0]])
In the past I've found np.count_nonzero to be much faster than the sum trick, but here -- probably because of the need to use np.apply_along_axis -- that version is instead much slower, at least for this a. Some other tests showed the same even for larger matrices, but YMMV.
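On newer NumPy releases (1.12 and later, as far as I know), np.count_nonzero accepts an axis argument, so np.apply_along_axis may not be needed at all. Here is a minimal sketch under that assumption; the helper name drop_sparse_columns and its min_nonzero parameter are made up for illustration and are not part of NumPy:

```python
import numpy as np

a = np.array([[1, 1, 2, 1, 1],
              [1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0],
              [1, 1, 0, 3, 0],
              [1, 1, 0, 3, 0],
              [1, 1, 1, 5, 3],
              [1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0],
              [1, 1, 4, 3, 0],
              [1, 1, 0, 4, 0],
              [1, 1, 0, 5, 0],
              [1, 1, 0, 0, 0]])


def drop_sparse_columns(m, min_nonzero=5):
    """Hypothetical helper: keep columns with at least min_nonzero non-zero entries."""
    # Count the non-zero entries of each column (axis=0 collapses the rows).
    # Assumes a NumPy version whose count_nonzero supports the axis argument.
    keep = np.count_nonzero(m, axis=0) >= min_nonzero
    # Boolean mask on the column axis selects only the columns to keep.
    return m[:, keep]


result = drop_sparse_columns(a)
```

For the sample a, this should keep columns 0, 1 and 3, the same as the two versions above. I haven't timed it against the sum trick, so it's worth running timeit on your own data before picking one.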