
Is k nearest neighbours regression inherently slow?

I am trying to use the k-nearest-neighbours implementation from scikit-learn on a fairly large dataset. The problem is that predictions take a very long time, almost as long as training, which doesn't make sense to me. Is this an issue with the algorithm itself, or is it because scikit-learn isn't made for large datasets (it has no GPU support)?

For further information, I am trying to predict lidar intensity based on x, y, z and object label. Each lidar scan has ~100,000 points, so I'm trying to predict the intensity for each point.

Ivan Novikov asked Feb 01 '26 18:02



1 Answer

Things to try to make scikit-learn's KNeighborsRegressor (or KNeighborsClassifier, which takes the same options) run faster:

  • Try a different algorithm parameter: kd_tree or ball_tree for low-dimensional data, brute for high-dimensional data.
  • Set the n_jobs parameter to run the neighbour search in parallel. A larger n_jobs doesn't necessarily make things faster; sometimes it does the opposite.
  • Make sure you are using the latest version: there have been performance improvements in v0.22, and some optimizations are not yet merged (scikit-learn#14543).
  • Use an external approximate nearest neighbours library (e.g. Annoy) together with pre-computed sparse distances, passed in with metric="precomputed".
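
To see which of the first two options helps on your data, it is probably easiest to time them directly. Below is a minimal sketch, not a benchmark: the data is random and only imitates the shape of the problem (x, y, z plus a numeric label column), and the sizes, n_neighbors=5 and the time_predict helper are all assumptions made up for illustration. Swap in a real scan to get meaningful numbers.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a lidar scan (assumed sizes, random values).
# Columns: x, y, z, object label. Treating the label as a plain numeric
# coordinate is a simplification for this sketch.
n_train, n_query = 20_000, 2_000
X_train = np.column_stack(
    [rng.uniform(-50, 50, (n_train, 3)), rng.integers(0, 10, n_train)]
)
y_train = rng.uniform(0, 1, n_train)  # fake intensity values
X_query = np.column_stack(
    [rng.uniform(-50, 50, (n_query, 3)), rng.integers(0, 10, n_query)]
)


def time_predict(algorithm, n_jobs):
    """Fit a KNN regressor and return (prediction time in seconds, predictions)."""
    model = KNeighborsRegressor(n_neighbors=5, algorithm=algorithm, n_jobs=n_jobs)
    model.fit(X_train, y_train)
    start = time.perf_counter()
    pred = model.predict(X_query)
    return time.perf_counter() - start, pred


results = {}
for algorithm in ("kd_tree", "ball_tree", "brute"):
    for n_jobs in (1, 2):
        elapsed, pred = time_predict(algorithm, n_jobs)
        results[(algorithm, n_jobs)] = pred
        print(f"algorithm={algorithm:9s} n_jobs={n_jobs}: {elapsed:.3f} s")
```

All three algorithm settings are exact searches, so the predictions should agree and only the timings differ. With just four input columns I would expect one of the tree-based options to do well, but that is worth measuring rather than assuming.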
rth answered Feb 03 '26 08:02
