
sklearn: Get Distance from Point to Nearest Cluster

I'm using clustering algorithms like DBSCAN.

It returns a "cluster" labeled -1, which holds the points that are not part of any cluster. For each of these points I want to determine the distance to the nearest cluster, as a kind of metric for how abnormal the point is. Is this possible? Or are there alternatives for this kind of metric?

ScientiaEtVeritas asked Oct 24 '25 03:10


2 Answers

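The answer depends on the linkage strategy you choose, that is, on how you define the distance between a point and a cluster. I'll give the example of single linkage, where that distance is the distance to the cluster's closest member.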

First, you can construct the distance matrix of your data.

from sklearn.metrics.pairwise import pairwise_distances
dist_matrix = pairwise_distances(X)

Then, you'll extract the nearest cluster:

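import numpy as np

# unclustered_points: indices of the noise points (label -1)
# clusters: one array of point indices per cluster
for point in unclustered_points:
    # Single linkage: distance to the closest member of each cluster
    distances = [dist_matrix[point, cluster].min() for cluster in clusters]
    nearest = int(np.argmin(distances))  # position of the nearest cluster in `clusters`
    print("The nearest cluster for {} is {} (distance {})".format(point, nearest, distances[nearest]))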
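The loop above takes `unclustered_points` and `clusters` as given. Here is a minimal sketch of how they might be derived from DBSCAN's `labels_`, assuming Euclidean distance and a NumPy array `X`. The helper name `single_linkage_to_nearest_cluster`, the toy data, and the `eps`/`min_samples` values are made up for illustration. It only computes distances from the noise points to all points, not the full matrix.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import pairwise_distances


def single_linkage_to_nearest_cluster(X, labels):
    """For each noise point, return (index, nearest cluster label, distance)."""
    labels = np.asarray(labels)
    noise_idx = np.flatnonzero(labels == -1)
    cluster_labels = [c for c in np.unique(labels) if c != -1]
    results = []
    if len(noise_idx) == 0 or not cluster_labels:
        return results
    # Distances from noise points to every point (rows follow noise_idx)
    dist = pairwise_distances(X[noise_idx], X)
    for row, i in enumerate(noise_idx):
        # Single linkage: closest member of each cluster
        per_cluster = [dist[row, labels == c].min() for c in cluster_labels]
        best = int(np.argmin(per_cluster))
        results.append((int(i), int(cluster_labels[best]), float(per_cluster[best])))
    return results


# Hypothetical toy data: two tight groups plus one isolated point
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [1.0, 0.0]])
labels = DBSCAN(eps=0.3, min_samples=3).fit(X).labels_
nearest = single_linkage_to_nearest_cluster(X, labels)
```

Each tuple in `nearest` pairs a noise point's index with the label of its nearest cluster and the single-linkage distance, which can be used directly as the abnormality score.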

EDIT: This works, but building the full distance matrix is O(n^2), as noted by Anony-Mousse. Considering only core points is a better idea because it cuts down on the work. In addition, it is somewhat similar to centroid linkage.

Arya McCarthy answered Oct 26 '25 17:10


To be closer to the intuition of DBSCAN you probably should only consider core points.

Put the core points into a nearest-neighbor index. Then query it with each noise point and use the cluster label of the nearest core point; the distance to that core point is the distance you are asking about.
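A minimal sketch of this idea, assuming scikit-learn's `DBSCAN` (whose fitted `core_sample_indices_` attribute lists the core points) and `NearestNeighbors` with the default Euclidean metric. The helper name `noise_to_nearest_core` and the toy data are made up for illustration. If DBSCAN was fitted with a different metric, the same metric should probably be passed to `NearestNeighbors`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def noise_to_nearest_core(X, db):
    """Return (noise indices, nearest cluster labels, distances), using core points only."""
    noise_idx = np.flatnonzero(db.labels_ == -1)
    core_idx = db.core_sample_indices_
    if len(noise_idx) == 0 or len(core_idx) == 0:
        return noise_idx, np.array([], dtype=int), np.array([])
    # Index only the core points, then query with the noise points
    nn = NearestNeighbors(n_neighbors=1).fit(X[core_idx])
    dist, pos = nn.kneighbors(X[noise_idx])
    # Map positions in the core subset back to cluster labels
    nearest_labels = db.labels_[core_idx[pos[:, 0]]]
    return noise_idx, nearest_labels, dist[:, 0]


# Hypothetical toy data: two tight groups plus one isolated point
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [1.0, 0.0]])
db = DBSCAN(eps=0.3, min_samples=3).fit(X)
noise_idx, nearest_labels, distances = noise_to_nearest_core(X, db)
```

The returned `distances` array gives one score per noise point. Because only core points are indexed, the search avoids the full pairwise matrix.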

Has QUIT--Anony-Mousse answered Oct 26 '25 16:10