 

Drawing boundary lines based on kmeans cluster centres

I'm quite new to scikit-learn, but wanted to try an interesting project.

I have longitudes and latitudes for points in the UK, which I used to create cluster centers with scikit-learn's KMeans class. To visualise this data, rather than showing each cluster as a single point, I want to draw boundaries around each cluster. For example, if one cluster were London and another Oxford, I currently have a point at the center of each city. Is there a way to use this data to draw a boundary line between the clusters instead?

Here is my code so far to create the clusters:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

location1 = "XXX"
df = pd.read_csv(location1, encoding="ISO-8859-1")

# Run k-means clustering on ~2k UK locations
X = df[['long', 'lat']].values
y = df['label'].values  # label is 0 or 1; KMeans is unsupervised and ignores it
kmeans = KMeans(n_clusters=30, random_state=0).fit(X)
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], marker='s', s=100)
plt.show()

So I would like to convert the centers in the above example into lines that demarcate each region. Is this possible?

Thanks,

Anant

Anant asked Sep 05 '25 03:09


2 Answers

I guess you're talking about spatial boundaries. In that case, you should follow Bunyk's recommendation and use a Voronoi diagram [1]. Here is a practical demonstration of what you could achieve: http://nbviewer.jupyter.org/gist/pv/8037100.
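The reason a Voronoi diagram gives the right boundaries: k-means assigns every location to its nearest center (Euclidean distance), and the Voronoi cell of a center is exactly the set of locations closer to it than to any other center. A minimal sketch to check this, assuming uniformly random synthetic (long, lat) points in place of the elided CSV and using SciPy's `cKDTree` for the nearest-center lookup:

```python
import numpy as np
from scipy.spatial import cKDTree
from sklearn.cluster import KMeans

# Synthetic stand-in for the ~2k UK (long, lat) points; the real CSV path is elided.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-6.0, 2.0, 2000),   # longitude
                     rng.uniform(50.0, 58.0, 2000)])  # latitude

kmeans = KMeans(n_clusters=30, n_init=10, random_state=0).fit(X)
centers = kmeans.cluster_centers_

# Arbitrary query locations inside the same bounding box.
queries = np.column_stack([rng.uniform(-6.0, 2.0, 500),
                           rng.uniform(50.0, 58.0, 500)])

# Cluster that k-means assigns to each query location.
km_labels = kmeans.predict(queries)

# Index of the nearest center, i.e. the Voronoi cell each query falls in.
_, nearest = cKDTree(centers).query(queries)

print((km_labels == nearest).all())
```

Both sides use plain Euclidean distance on the raw degree values, so the boundaries inherit k-means' treatment of longitude and latitude as flat coordinates.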

gmacro answered Sep 08 '25 00:09


You can use SciPy to generate a Voronoi diagram (see the docs).
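If you need the boundaries as line data (for example to overlay on another map) rather than just a plot, the `Voronoi` object exposes them: `vor.vertices` holds the corner coordinates, and `vor.ridge_vertices` lists each boundary edge as a pair of indices into it, with -1 marking an end at infinity. A minimal sketch, assuming a handful of hypothetical city-like centers in place of `kmeans.cluster_centers_`, that keeps only the finite edges:

```python
import numpy as np
from scipy.spatial import Voronoi

# Hypothetical centers standing in for kmeans.cluster_centers_ (long, lat).
centers = np.array([[-0.13, 51.51],   # roughly London
                    [-1.26, 51.75],   # roughly Oxford
                    [-1.90, 52.49],   # roughly Birmingham
                    [-2.24, 53.48],   # roughly Manchester
                    [-1.55, 53.80]])  # roughly Leeds

vor = Voronoi(centers)


def finite_segments(vor):
    """Return bounded Voronoi edges as ((x0, y0), (x1, y1)) pairs.

    Ridges containing index -1 extend to infinity and are skipped.
    """
    segments = []
    for i, j in vor.ridge_vertices:
        if i >= 0 and j >= 0:
            segments.append((tuple(vor.vertices[i]), tuple(vor.vertices[j])))
    return segments


segments = finite_segments(vor)
for start, end in segments:
    print(start, "->", end)
```

Edges that run off to infinity are dropped here; turning the cells into closed polygons would additionally require clipping those edges to a bounding box, which this sketch does not attempt.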

Applied to your code, it would be:

from scipy.spatial import Voronoi, voronoi_plot_2d
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

location1 = "XXX"
df = pd.read_csv(location1, encoding="ISO-8859-1")

# Run k-means clustering on ~2k UK locations
X = df[['long', 'lat']].values
y = df['label'].values  # label is 0 or 1; KMeans is unsupervised and ignores it
kmeans = KMeans(n_clusters=30, random_state=0).fit(X)
centers = kmeans.cluster_centers_

plt.scatter(centers[:, 0], centers[:, 1], marker='s', s=100)

# Each Voronoi cell of the centers is exactly one k-means cluster region
vor = Voronoi(centers)
fig = voronoi_plot_2d(vor, ax=plt.gca())

plt.show()
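A different technique, not taken from either answer, that sidesteps the infinite Voronoi edges: evaluate `kmeans.predict` on a regular grid and mark the grid nodes where the predicted cluster changes. Since k-means assignment is nearest-center, these marks follow the Voronoi edges (up to grid resolution) and are automatically limited to the plotted area. A minimal sketch, assuming synthetic points in place of the CSV, a 400 x 400 grid, and a non-interactive Matplotlib backend; the output filename is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display plots interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic (long, lat) points standing in for the CSV data.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-6.0, 2.0, 2000),
                     rng.uniform(50.0, 58.0, 2000)])

kmeans = KMeans(n_clusters=30, n_init=10, random_state=0).fit(X)
centers = kmeans.cluster_centers_

# Regular grid covering the data, with a small margin.
xs = np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 400)
ys = np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 400)
xx, yy = np.meshgrid(xs, ys)

# Cluster label at every grid node, reshaped back onto the grid.
grid = np.column_stack([xx.ravel(), yy.ravel()])
labels = kmeans.predict(grid).reshape(xx.shape)

# A node is on a boundary if its label differs from the next node north or east.
boundary = np.zeros(labels.shape, dtype=bool)
boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]
boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]

fig, ax = plt.subplots()
# Draw boundary nodes in black; all other nodes are masked out (transparent).
ax.imshow(np.ma.masked_where(~boundary, boundary.astype(float)),
          origin="lower", extent=(xs[0], xs[-1], ys[0], ys[-1]),
          cmap="gray_r", vmin=0.0, vmax=1.0, aspect="auto",
          interpolation="nearest")
ax.scatter(X[:, 0], X[:, 1], s=2, alpha=0.3)
ax.scatter(centers[:, 0], centers[:, 1], marker="s", s=100)
fig.savefig("kmeans_boundaries.png")
```

A finer grid gives smoother lines at the cost of more `predict` calls; the Voronoi approach gives exact edges but needs clipping for the outer cells.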
Jesse McDonald answered Sep 07 '25 22:09