Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find nearest neighbors of a numpy array in list of numpy arrays using euclidian distance

I have a n-dimensional vector and I want to find its k nearest neighbors in a list of n-dimensional vectors using euclidian distance.

I wrote the following code (with k=10) which works but runs too slowly and I was wondering if there was a more optimal solution.

def nearest_neighbors(value, array, nbr_neighbors=1):
    return np.argsort(np.array([np.linalg.norm(value-x) for x in array]))[:nbr_neighbors]
like image 784
Bastien Beurier Avatar asked Sep 06 '25 02:09

Bastien Beurier


2 Answers

Use scipy's kd-tree.

A small example is available here.

Many people seem to complain about the performance and recommend sklearn's implementation though (links sklearn.neighbors, which is using this data-structure internally)!

like image 118
sascha Avatar answered Sep 08 '25 23:09

sascha


As sascha said, I ended up using the scipy library (but the NearestNeighbors method) which brought down the computation time from 50 hours to 36 minutes. It is the kind of computation I should not have tried to reimplement myself as dedicated libraries are much more optimized for this.

The NearestNeighbors method also allows you to pass in a list of values and returns the k nearest neighbors for each value.

Final code was:

def nearest_neighbors(values, all_values, nbr_neighbors=10):
    nn = NearestNeighbors(nbr_neighbors, metric='cosine', algorithm='brute').fit(all_values)
    dists, idxs = nn.kneighbors(values)
like image 38
Bastien Beurier Avatar answered Sep 08 '25 23:09

Bastien Beurier