I'm using sklearn.cluster.AgglomerativeClustering. It begins with one cluster per data point and iteratively merges together the two "closest" clusters, thus forming a binary tree. What constitutes distance between clusters depends on a linkage parameter.
It would be useful to know the distance between the merged clusters at each step. We could then stop when the next clusters to be merged are too far apart. Alas, that does not seem to be available in AgglomerativeClustering.
Am I missing something? Is there a way to recover the distances?
You might want to take a look at scipy.cluster.hierarchy, which offers somewhat more options than sklearn.cluster.AgglomerativeClustering.
The clustering is done with the linkage function, which returns a linkage matrix whose third column holds the distance between the two clusters merged at each step. The merges can be visualised with a dendrogram:
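from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Toy data: 20 two-dimensional points drawn around 3 centres
X, cl = make_blobs(n_samples=20, n_features=2, centers=3, cluster_std=0.5, random_state=0)

# Ward linkage; each row of Z records one merge and its distance
Z = linkage(X, method='ward')

# Plot the merge tree; the height of each join is the merge distance
plt.figure()
dendrogram(Z)
plt.show()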
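To read the merge distances directly rather than off the plot, the linkage matrix can be inspected row by row. The following is a minimal sketch on the same toy data; the variable names (merge_distances, step and so on) are only illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=20, n_features=2, centers=3, cluster_std=0.5, random_state=0)
Z = linkage(X, method='ward')

# Each row of Z describes one merge:
# [index of cluster a, index of cluster b, distance between them, size of the new cluster]
merge_distances = Z[:, 2]

for step, (a, b, dist, size) in enumerate(Z):
    print(f"step {step}: merged {int(a)} and {int(b)} "
          f"at distance {dist:.3f} (new size {int(size)})")
```

With n observations there are n - 1 merges, so Z has n - 1 rows. Indices below n refer to original points; an index n + i refers to the cluster created in row i.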

One can form flat clusters from the linkage matrix based on various criteria, e.g. by cutting the tree at a distance threshold (here 5), so that merges above that distance are not applied:
clusters = fcluster(Z, 5, criterion='distance')
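This is effectively the "stop when the clusters get too far apart" rule from the question. As a rough sketch, assuming the same toy data and the same cutoff of 5 (an arbitrary value, not a recommendation), the number of flat clusters should equal one plus the number of merges whose distance exceeds the cutoff:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=20, n_features=2, centers=3, cluster_std=0.5, random_state=0)
Z = linkage(X, method='ward')

threshold = 5  # illustrative cutoff, same value as above

# Flat cluster label for every observation
labels = fcluster(Z, t=threshold, criterion='distance')
n_clusters = len(np.unique(labels))

# Merges above the threshold are the ones that were "refused"
n_skipped_merges = int(np.sum(Z[:, 2] > threshold))

print(n_clusters, n_skipped_merges + 1)
```

The two printed numbers agree for linkages whose merge distances never decrease, such as Ward; for non-monotone methods like centroid linkage the relationship may not hold.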
Scipy's hierarchical clustering is discussed in much more detail here.
When this question was originally asked, and when the other answer was posted, sklearn did not expose the distances. It now does, however, as demonstrated in this example and this answer to a similar question.
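For reference, a minimal sketch of the sklearn route, assuming a reasonably recent scikit-learn in which AgglomerativeClustering accepts distance_threshold and exposes the fitted distances_ attribute. The threshold of 5 and the toy data are carried over from the other answer purely for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=20, n_features=2, centers=3, cluster_std=0.5, random_state=0)

# n_clusters must be None when distance_threshold is given;
# setting a threshold makes the estimator build the full tree and store distances_
model = AgglomerativeClustering(n_clusters=None, distance_threshold=5, linkage='ward')
model.fit(X)

# distances_[i] is the distance between the two clusters merged in children_[i]
for (a, b), dist in zip(model.children_, model.distances_):
    print(f"merged {a} and {b} at distance {dist:.3f}")

print("clusters kept:", model.n_clusters_)
```

If a fixed n_clusters is preferred, passing compute_distances=True instead of a threshold should also populate distances_, at some extra cost in time and memory.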