I am fairly new to numpy and scientific computing and I struggle with a problem for several days, so I decided to post it here.
I am trying to get a count for a specific occurence of a condition in a numpy array.
In [233]: import numpy as np
In [234]: a= np.random.random([5,5])
In [235]: a >.7
Out[235]: array([[False, True, True, False, False],
[ True, False, False, False, True],
[ True, False, True, True, False],
[False, False, False, False, False],
[False, False, True, False, False]], dtype=bool)
What I would like to count the number of occurence of True
in each row and keep the rows when this count reach a certain threshold:
ex :
results=[]
threshold = 2
for i,row in enumerate(a>.7):
if len([value for value in row if value==True]) > threshold:
results.append(i) # keep ids for each row that have more than 'threshold' times True
This is the non-optimized version of the code but I would love to achieve the same thing with numpy (I have a very large matrix to process).
I have been trying all sort of things with np.where
but I only can get flatten results. I need the row number
Thanks in advance !
To make results reproducible, use some seed:
>>> np.random.seed(100)
Then for a sample matrix
>>> a = np.random.random([5,5])
Count number of occurences along axis with sum:
>>> (a >.7).sum(axis=1)
array([1, 0, 3, 1, 2])
You can get row numbers with np.where
:
>>> np.where((a > .7).sum(axis=1) >= 2)
(array([2, 4]),)
To filter result, just use boolean indexing:
>>> a[(a > .7).sum(axis=1) >= 2]
array([[ 0.89041156, 0.98092086, 0.05994199, 0.89054594, 0.5769015 ],
[ 0.54468488, 0.76911517, 0.25069523, 0.28589569, 0.85239509]])
You can sum over axis with a.sum
.
Then you can use where on the resulting vector.
results = np.where(a.sum(axis=0) < threshold))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With