
Distance between clusters kmeans sklearn python

I am using sklearn's k-means clustering to cluster my data. Now I want the distances between my clusters, but I can't find a way to get them. I could calculate the distance between each pair of centroids myself, but I wanted to know whether there is a built-in function for it, and whether there is a way to get the minimum/maximum/average linkage distance between each pair of clusters. My code is very simple:

from sklearn.cluster import KMeans

km = KMeans(n_clusters=5, random_state=1)
km.fit(X_tfidf)

clusterkm = km.cluster_centers_

clusters = km.labels_.tolist()

Thank you!

LN_P asked Oct 16 '25 11:10


1 Answer

Unfortunately, you'll have to compute those distances from the cluster centers yourself; scikit-learn doesn't provide a method for that out of the box. Here's a comparable problem setup:

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

X, y = load_iris(return_X_y=True)
km = KMeans(n_clusters=5, random_state=1).fit(X)

And how you'd compute the distances:

dists = euclidean_distances(km.cluster_centers_)
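
As a quick sanity check on that matrix, here is a minimal sketch (assuming the same iris setup as above; `n_init=10`, `masked` and `nearest` are choices and names made up for this illustration, not part of the original answer). It confirms that the result is a symmetric k-by-k matrix with a zero diagonal, and looks up the closest other centroid for each cluster:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import euclidean_distances

X, y = load_iris(return_X_y=True)
# n_init is set explicitly only to keep behaviour the same across sklearn versions
km = KMeans(n_clusters=5, random_state=1, n_init=10).fit(X)

# dists[i, j] is the Euclidean distance between centroid i and centroid j
dists = euclidean_distances(km.cluster_centers_)

is_symmetric = np.allclose(dists, dists.T)
zero_diagonal = np.allclose(np.diag(dists), 0.0)

# Hide the zero self-distances, then find each cluster's closest neighbour
masked = dists.copy()
np.fill_diagonal(masked, np.inf)
nearest = masked.argmin(axis=1)

for i, j in enumerate(nearest):
    print(f"cluster {i}: nearest centroid is {j} at distance {dists[i, j]:.3f}")
```

Because the matrix is symmetric and its diagonal is zero, every pair of clusters appears twice and the self-distances carry no information.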

And then, to get the stats you're interested in, compute them on only the upper (or lower) triangle of the distance matrix, excluding the diagonal, so that each pair of clusters is counted once and the zero self-distances are left out:

import numpy as np

# Strict upper triangle (k=1 skips the diagonal): one entry per pair of clusters
tri_dists = dists[np.triu_indices(km.n_clusters, k=1)]
max_dist, avg_dist, min_dist = tri_dists.max(), tri_dists.mean(), tri_dists.min()
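
The numbers above are distances between centroids. The question also asks about minimum/maximum/average linkage distances between clusters, which are defined on the points themselves rather than on the centers. A rough sketch of one way to get them, assuming the same iris setup (the `linkage` dictionary and `n_init=10` are choices made for this illustration, not something scikit-learn provides as a ready-made method):

```python
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import euclidean_distances

X, y = load_iris(return_X_y=True)
km = KMeans(n_clusters=5, random_state=1, n_init=10).fit(X)

# linkage[(i, j)] = (min, max, mean) of all point-to-point distances
# between the members of cluster i and the members of cluster j
linkage = {}
for i, j in combinations(range(km.n_clusters), 2):
    d = euclidean_distances(X[km.labels_ == i], X[km.labels_ == j])
    linkage[(i, j)] = (d.min(), d.max(), d.mean())

for (i, j), (single, complete, average) in linkage.items():
    print(f"clusters {i}-{j}: min={single:.3f} max={complete:.3f} avg={average:.3f}")
```

Each pairwise block has as many entries as the product of the two cluster sizes, so for a large dataset this can get expensive in memory and time.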
TayTay answered Oct 18 '25 23:10


