I'm writing a genetic algorithm to find feature weights to apply to the Euclidean distance in scikit-learn's KNN, with two goals: improving the classification rate and removing some features from the dataset (a feature is removed by setting its weight to 0). I'm using Python and sklearn's KNeighborsClassifier. This is how I'm using it:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def w_dist(x, y, **kwargs):
    # weighted squared Euclidean distance
    return sum(kwargs["weights"] * ((x - y) * (x - y)))

KNN = KNeighborsClassifier(n_neighbors=1, metric=w_dist,
                           metric_params={"weights": w})
KNN.fit(X_train, Y_train)

# nearest neighbor of each training point, excluding the point itself
neighbors = KNN.kneighbors(n_neighbors=1, return_distance=False)
Y_n = Y_train[neighbors.ravel()]

# leave-one-out classification rate
tot = 0
for (a, b) in zip(Y_train, Y_n):
    if a == b:
        tot += 1

# fraction of features removed (weight forced to 0)
reduc_rate = (X_train.shape[1] - np.count_nonzero(w)) / X_train.shape[1]
class_rate = tot / X_train.shape[0]
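As an aside, the counting loop can be written as a one-line NumPy reduction (assuming Y_train and Y_n are NumPy arrays, as above):

# vectorized equivalent of the counting loop
class_rate = np.mean(Y_train == Y_n)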
It's working really well, but it's very slow. I have been profiling my code, and the slowest part is the evaluation of the custom distance.
I want to ask if there is a different way to tell KNN to use weights in the distance (I must use the Euclidean distance, but I drop the square root, so it's really the squared Euclidean distance).
Thanks!
There is indeed another way, and it's built into scikit-learn (so it should be quicker, since the built-in metrics run in compiled code rather than calling back into Python for every pair of points). You can use the wminkowski metric with weights. Below is an example with random weights for the features in your training set.
knn = KNeighborsClassifier(metric='wminkowski', p=2,
                           metric_params={'w': np.random.random(X_train.shape[1])})
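One caveat if you are on a recent scikit-learn: to my knowledge, the wminkowski metric has since been deprecated and removed, and the plain minkowski metric now accepts the weight vector via metric_params={'w': ...}. The two conventions also differ slightly: wminkowski applies the weights before raising to the power p (sum(abs(w*(x-y))**p)**(1/p)), while minkowski weights the p-th powers (sum(w*abs(x-y)**p)**(1/p)). With p=2 the latter is exactly the square root of your sum(w*(x-y)**2), and since the square root is monotonic, it finds the same nearest neighbors; to reproduce your weights with the old wminkowski you would pass np.sqrt(w) instead. A minimal sketch, using the iris data and random weights purely for illustration:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# illustrative data and weights; your GA would supply the real w
X_train, Y_train = load_iris(return_X_y=True)
w = np.random.random(X_train.shape[1])

# on recent scikit-learn versions, pass the weights to 'minkowski' directly
knn = KNeighborsClassifier(n_neighbors=1, metric='minkowski', p=2,
                           metric_params={'w': w})
knn.fit(X_train, Y_train)

# leave-one-out check as in the question; the square root in the metric
# does not change which neighbor is nearest
neighbors = knn.kneighbors(n_neighbors=1, return_distance=False)
class_rate = np.mean(Y_train == Y_train[neighbors.ravel()])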