There are lots of works for the nearest neighbor search problem, so I wonder if I want to do fixed radius range search, could I utilize those algorithms for nearest neighbor search?
perhaps I could do kth-nearest-neighbor search over and over again until I find a point beyond the radius range, but I suppose it might cause a lot of waste.
The way that the training dataset is used during prediction is different. Instead of locating the k-neighbors, the Radius Neighbors Classifier locates all examples in the training dataset that are within a given radius of the new example. The radius neighbors are then used to make a prediction for the new example.
KNN algorithm at the training phase just stores the dataset and when it gets new data, then it classifies that data into a category that is much similar to the new data. Example: Suppose, we have an image of a creature that looks similar to cat and dog, but we want to know either it is a cat or dog.
Not only "There are lots of works for the nearest neighbor search problem," but there are many questions to be asked for your problem. The most important is the number of dimensions.
Make sure you check my answer if you are not sure why the dimension is important.
High dimensional space
Assuming your points lie in a high dimensional space, you should go for Locality-Sensitive Hashing (LSH) . A nice implementation of this algorithm is E²LSH. They also provide slides, if you want to implement it yourself or get a better understanding of what happens. Note that E²LSH solves a randomized version of the R-near neighbor problem, which they call (R, 1 − δ)-near neighbor problem, where δ has to do with the approximation, as they mention in the manual.
You can also check my answer regarding LSH here. I have used it in C++ and I strongly recommend it for the fixed radius search you want to perform!
Low dimensional space
Use CGAL's spatial searching. I have used it many times for this case in C++. Again, if you want to implement yourselves, you can find plenty information at the nice documentation they have and do relative google queries.
Nice answer by the way, so you got my +1. :)
If you have only one query, then this problem is fixed O(n), where n is number of points no matter what.
If you have multiple queries, than this problem is well explored problem, however it's solution is more complex than just nearest neighbor search. Refer to this article: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.3191&rep=rep1&type=pdf
Taking d as the number of dimensions and r as the radius, I know at least two different approaches you can use:
Imagine the problem space divided in hypercubes by a grid of size s such that s = r / sqrt(d).
The interesting thing about s = r / sqrt(d) is that any two points inside a hypercube of that size are guaranteed to be at a distance equal or less than r.
You can numerate the grid divisions so that every hypercube can be identified by the tuple formed by the indexes of one of its corners. These index tuples can be used as the keys into a hash structure (the spatial hash).
Now, and this is the difficult part, you have to notice that for any hypercube of the grid A there is a set of neighbor hypercubes N = (N1, N2, ...) whose minimal distance to the given hypercube is equal or less than the given radius, geometrically they look like an hypersphere. You can represent the elements of the set N as the deltas of the grid indexes relatives to A. Note that these index deltas depend on d only.
For instance, in the bi-dimensional case, you have the following geometrical structure:
   NNN    index deltas:    (-1,  2) ( 0,  2) ( 1,  2)
  NNNNN           (-2,  1) (-1,  1) ( 0,  1) ( 1,  1) ( 2,  1)
  NNANN           (-2,  0) (-1,  0) ( 0,  0) ( 1,  0) ( 2,  0)
  NNNNN           (-2, -1) (-1, -1) ( 0, -1) ( 1, -1) ( 2, -1)
   NNN                     (-1, -2) ( 0, -2) ( 1, -2)
So, you already have all you need to implement an efficient search for the points inside a hypersphere:
Push the input points into the spatial hash using the index tuple of the hypercube containing them as the key.
Given a point p the way to search is to determine the hypercube where it lays A, and then using the set of index-deltas, get the set of neighbor hypercubes N that can potentially contain points nearer than r.
Retrieve from the spatial hash the points belonging to the hypercubes N, and check which ones are near enough to p.
There are some extra optimizations that can be carried out, as not checking the points in A as they are guaranteed to be all near enough. A pre-filtering of N can be performed based on the position of p relative to A.
Note that selecting s = r / sqrt(d) provides a nice compromise between having small hypercubes and a not too big set N.
Every time you go down one level on the tree, you check if the space it represents intersects with the query hyper-sphere. If it does, you keep going down on it, otherwise you discard it completelly from the search.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With