 

Weighted distance in sklearn KNN

I'm writing a genetic algorithm to find weights to apply to the Euclidean distance in scikit-learn's KNN, trying to improve the classification rate while also removing some features from the dataset (I do this by setting their weight to 0). I'm using Python and scikit-learn's KNN. This is how I'm using it:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def w_dist(x, y, **kwargs):
    # weighted squared Euclidean distance
    return sum(kwargs["weights"] * ((x - y) * (x - y)))

KNN = KNeighborsClassifier(n_neighbors=1, metric=w_dist,
                           metric_params={"weights": w})
KNN.fit(X_train, Y_train)

# nearest neighbor of each training point (excluding the point itself)
neighbors = KNN.kneighbors(n_neighbors=1, return_distance=False)
Y_n = Y_train[neighbors.ravel()]
tot = 0
for (a, b) in zip(Y_train, Y_n):
    if a == b:
        tot += 1

n_features = X_train.shape[1]
reduc_rate = (n_features - np.count_nonzero(w)) / n_features
class_rate = tot / X_train.shape[0]

It's working really well, but it's very slow. I have been profiling my code and the slowest part is the evaluation of the distance.

I want to ask if there is a different way to tell KNN to use weights in the distance (I must use the Euclidean distance, though I drop the square root).
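For reference, weighted squared Euclidean distance is equivalent to plain (unweighted) Euclidean distance computed on features pre-scaled by sqrt(w), so one way to avoid the slow Python callback is to rescale the data once and use the fast built-in metric. A minimal sketch with hypothetical random data and weights (the names `X_train`, `Y_train`, `w` stand in for your own):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data and weights for illustration
rng = np.random.default_rng(0)
X_train = rng.random((100, 4))
Y_train = rng.integers(0, 2, 100)
w = rng.random(4)

# Scaling each feature by sqrt(w) makes the built-in Euclidean metric
# produce the same neighbor ranking as the custom weighted metric,
# since sum(w * (x - y)**2) == sum((sqrt(w)*x - sqrt(w)*y)**2).
X_scaled = X_train * np.sqrt(w)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_scaled, Y_train)
```

Queries must be scaled by the same `np.sqrt(w)` before calling `predict` or `kneighbors`.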

Thanks!

Antonio Manuel asked Oct 20 '25 13:10

1 Answer

There is indeed another way, and it's built into scikit-learn (so it should be quicker). You can use the wminkowski metric with weights. Below is an example with random weights for the features in your training set.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(metric='wminkowski', p=2,
                           metric_params={'w': np.random.random(X_train.shape[1])})
piman314 answered Oct 22 '25 04:10