I am trying to use the k-nearest neighbours implementation from scikit-learn on a fairly large dataset. The problem is that predictions take a very long time, almost as long as training, which doesn't make sense to me. Is it an issue with the algorithm, or with the fact that scikit-learn isn't made for large datasets (no GPU support)?
For further information, I am trying to predict lidar intensity based on x, y, z and object label. Each lidar scan has ~100,000 points, so I'm trying to predict the intensity for each point.
Things to try to make scikit-learn's KNeighborsClassifier run faster:
- algorithm parameter: kd_tree or ball_tree for low-dimensional data, brute for high-dimensional data.
- n_jobs parameter: using a larger n_jobs doesn't necessarily make things faster, and sometimes does the opposite.
- metric="precomputed"
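One way to see which setting helps is to time predict() for each algorithm option on data shaped like yours. With nearest-neighbour models, fit() mostly just builds the index and the actual neighbour search happens in predict(), so slow prediction is expected to some degree. The snippet below is only a rough sketch: the data is random, the array sizes are arbitrary, and the 8 binned intensity classes are an assumption made for illustration, not something from the question.

```python
import time

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the lidar data (assumption): 4 features per point
# (x, y, z, object label) and a target of 8 hypothetical intensity bins.
X_train = rng.random((20_000, 4))
y_train = rng.integers(0, 8, size=20_000)
X_test = rng.random((2_000, 4))

timings = {}
for algorithm in ("kd_tree", "ball_tree", "brute"):
    # n_jobs=1 keeps the comparison simple; try n_jobs=-1 separately,
    # since more workers do not always mean faster queries.
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm, n_jobs=1)
    clf.fit(X_train, y_train)

    start = time.perf_counter()
    pred = clf.predict(X_test)
    timings[algorithm] = time.perf_counter() - start

    print(f"{algorithm:>9}: predict took {timings[algorithm]:.3f} s")
```

The timings depend heavily on the machine, the number of points and the number of features, so it is worth rerunning this with a sample of the real scans before settling on a setting.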