I have a sorted NumPy array X and two constants k and delta, neither of which is in X. I would like to find the index of the largest value in X that is less than or equal to k, where that value must also be within delta of k, i.e. I want
max {i | k - delta <= X[i] <= k }    (1)
Note this set may be empty, in which case I will return None. My current approach feels suboptimal, as its first step doesn't take advantage of the fact that X is ordered:
# Get the max from the set of indices in X satisfying (1)
idx = np.where((k-delta <= X) * (X <= k))[0].max()
I'm not sure how clever NumPy can be here: it doesn't know that X is sorted, so I assume evaluating (k-delta <= X) * (X <= k) takes longer than necessary. Note we can use .max() because we know ourselves that the array is sorted.
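(For completeness, a minimal sketch of this baseline that also covers the empty case; the function name `largest_within_delta_naive` is my own, not from the original:)

```python
import numpy as np

def largest_within_delta_naive(X, k, delta):
    """Brute-force baseline: mask the whole array, then take the max index.
    Returns None when no element lies in [k - delta, k]."""
    idx = np.where((k - delta <= X) & (X <= k))[0]
    return idx.max() if idx.size else None
```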
What would be a more optimal way of doing this?
One efficient approach to leverage the sorted order would be with np.searchsorted -
def largest_within_delta(X, k, delta):
    # index of the rightmost element <= k; -1 means k precedes all of X,
    # which must be caught explicitly (X[-1] would silently wrap around)
    right_idx = X.searchsorted(k, 'right') - 1
    if right_idx >= 0 and (k - X[right_idx]) <= delta:
        return right_idx
    return None
Sample runs to cover various scenarios -
In [216]: X
Out[216]: array([ 8,  9, 33, 35, 36, 37, 44, 45, 71, 81])
In [217]: largest_within_delta(X, 36, 0) # this k is already in array
Out[217]: 4
In [218]: largest_within_delta(X, 36, 1) # shouldn't choose for next one 37
Out[218]: 4
In [220]: largest_within_delta(X, 40, 3) # Gets 37's index
Out[220]: 5
In [221]: largest_within_delta(X, 40, 2) # Out of 37's reach; returns None, so no Out line
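The same logic can also be sketched without NumPy using the stdlib bisect module, which performs the identical binary search on a plain Python list (this variant and the name `largest_within_delta_bisect` are my own addition, not part of the original answer):

```python
from bisect import bisect_right

def largest_within_delta_bisect(X, k, delta):
    # bisect_right returns the insertion point for k; one position left
    # of it is the rightmost element <= k (i == -1 means all elements
    # exceed k, so there is no candidate)
    i = bisect_right(X, k) - 1
    if i >= 0 and k - X[i] <= delta:
        return i
    return None
```

For a NumPy array, np.searchsorted is preferable; bisect is mainly useful when the data is already a Python list.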
Runtime test
In [212]: # Inputs
     ...: X = np.unique(np.random.randint(0,1000000,(10000)))
     ...: k = 50000
     ...: delta = 100
     ...: 
In [213]: %timeit np.where((k-delta <= X) * (X <= k))[0].max()
10000 loops, best of 3: 44.6 µs per loop
In [214]: %timeit largest_within_delta(X, k, delta)
100000 loops, best of 3: 3.22 µs per loop
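As a quick correctness check (my own addition, not part of the timings above), the searchsorted version can be compared against the brute-force definition (1) over a sweep of k values:

```python
import numpy as np

def largest_within_delta(X, k, delta):
    # rightmost index with X[right_idx] <= k; -1 means k precedes all of X
    right_idx = X.searchsorted(k, 'right') - 1
    if right_idx >= 0 and (k - X[right_idx]) <= delta:
        return right_idx
    return None

# compare against the brute-force definition for many values of k,
# including ones below the smallest and above the largest element
rng = np.random.default_rng(0)
X = np.unique(rng.integers(0, 1000, 200))
for k in range(-5, 1010, 7):
    idx = np.where((k - 3 <= X) & (X <= k))[0]
    expected = idx.max() if idx.size else None
    assert largest_within_delta(X, k, 3) == expected
```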