I'm trying to optimize the performance of a script that is full of calls to NumPy's where(), after which only the first element of the result is actually used. Example:
F = np.where(Y>p/100)[0]
For the huge data sets we are processing, it doesn't look like a good solution (in terms of both speed and memory consumption) to create a large array and then discard all but the first element. Is there any way to skip this overhead, maybe by tweaking the condition?
You can use argmax in cases where you want the first item. It returns the index of that item.
idx = np.argmax(Y > p/100)
if Y[idx] > p/100:
    F = idx
else:
    F = None
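A minimal, self-contained sketch of this pattern, using a made-up Y and p (the names match the question; the data is illustrative only). Note that argmax returns 0 when no element is True, which is why the follow-up check against the threshold is needed:

```python
import numpy as np

# Hypothetical sample data: Y is a 1-D array, p a percentage threshold.
Y = np.array([0.1, 0.2, 0.7, 0.9, 0.3])
p = 50  # threshold in percent

idx = np.argmax(Y > p / 100)  # index of the first True (0 if there is none)
F = idx if Y[idx] > p / 100 else None
# F == 2, since Y[2] == 0.7 is the first value above 0.5
```

Because argmax short-circuits nothing (it still scans the whole boolean mask), the win over where() here is that no index array is allocated, not that fewer comparisons are performed.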
First of all, np.where(Y>p/100)[0] does not return the index/coordinates of the first match, but rather the first-axis coordinates of all matches. For the coordinates of the first match, you would need next(zip(*np.where(Y>p/100))).
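The distinction is easiest to see on a small 2-D example (the array below is hypothetical, chosen so the matches are easy to follow):

```python
import numpy as np

# Hypothetical 2-D data with matches at (0, 1) and (1, 0).
Y = np.array([[0.1, 0.9],
              [0.8, 0.2]])
p = 50

# [0] picks the first element of where()'s tuple: the *row* indices of all matches.
rows = np.where(Y > p / 100)[0]            # array([0, 1])

# The coordinates of the first match, by contrast, are a single (row, col) pair.
first = next(zip(*np.where(Y > p / 100)))  # (0, 1)
```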
Assuming you really want only the first match, I don't think there is a way to stop checking the values after the first match, but you can avoid the tuple output with a vectorized operation and argmax (plus any if you're not sure there is a match):
m = Y>p/100
F = m.argmax() if m.any() else None
If Y is an N-D array, you will also need unravel_index to recover the coordinates in the original number of dimensions:
F = np.unravel_index(m.argmax(), Y.shape)
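Putting the pieces of this answer together on a small, made-up 2-D array (the values are illustrative; the first element above the threshold sits at row 1, column 0 in flat scan order):

```python
import numpy as np

# Hypothetical 2-D data; in C (row-major) order the first value
# above 0.5 is Y[1, 0] == 0.6.
Y = np.array([[0.1, 0.2],
              [0.6, 0.9]])
p = 50

m = Y > p / 100
# argmax scans the flattened mask, so unravel_index maps the flat
# index back to an (row, col) coordinate tuple.
F = np.unravel_index(m.argmax(), Y.shape) if m.any() else None
# F == (1, 0)
```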