I have a data frame with two columns and 450 rows. First I have to run the k-means algorithm with different values of k (the number of clusters), and for each value of k I have to calculate the SSE. I only have the mathematical definition: the SSE is calculated by squaring each point's distance to its cluster's centroid and summing over all points. At the end I should have one SSE for each value of k.
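The definition above translates almost word for word into code. The sketch below is a minimal illustration, assuming two-column points stored as (x, y) tuples, cluster labels given as indices into a list of centroids, and plain Euclidean distance; the function name `sse` is hypothetical.

```python
def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster's centroid."""
    total = 0.0
    for (x, y), label in zip(points, labels):
        cx, cy = centroids[label]
        total += (x - cx) ** 2 + (y - cy) ** 2  # squared distance to own centroid
    return total


# Tiny two-column example: labels index into the centroids list.
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 2.0), (6.0, 5.0)]
print(sse(points, labels, centroids))  # 4.0
```

Each of the four points is at distance 1 from its centroid, so the SSE is 4.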
I have gotten as far as running the k-means algorithm:
Data.kmeans <- kmeans(data, centers = 3)
How can I get the SSE (sum of squared errors) from Data.kmeans?
If you are using scikit-learn in Python instead, the fitted KMeans model already exposes the SSE as the inertia_ attribute:
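from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)  # k, the number of clusters (the default is 8)
kmeans.fit(your_data)  # your_data: array-like of shape (n_samples, 2)
kmeans.inertia_  # sum of squared distances of samples to their closest cluster center, i.e. the SSE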
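To get one SSE per value of k, as the question asks, fit once per k and record the SSE. With scikit-learn the loop body would be `KMeans(n_clusters=k, n_init=10).fit(X).inertia_`, and in R `kmeans(data, centers = k, nstart = 10)$tot.withinss`. The sketch below avoids any library dependency by implementing plain Lloyd's iterations directly; the helper names (`lloyd_kmeans`, `sse`) and the synthetic three-blob data are hypothetical, for illustration only. Because k-means depends on its random starting centroids, the SSE for a given k can vary between runs, which is why `n_init`/`nstart` restart it several times.

```python
import random


def squared_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its points (kept if empty).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


def sse(points, labels, centroids):
    return sum(squared_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical two-column data: 150 points around three centers.
data_rng = random.Random(42)
data = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(50)]

sse_by_k = {}
for k in range(1, 7):
    labels, centroids = lloyd_kmeans(data, k)
    sse_by_k[k] = sse(data, labels, centroids)

for k, value in sse_by_k.items():
    print(k, round(value, 2))
```

Plotting `sse_by_k` against k gives the usual "elbow" curve; for k = 1 the SSE equals the total sum of squares around the overall mean.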
In R, this is already returned by kmeans. Its documentation says:
Value
kmeans returns an object of class "kmeans" which has a print and a fitted method. It is a list with at least the following components:
(...)
totss
The total sum of squares.
withinss
Vector of within-cluster sum of squares, one component per cluster.
tot.withinss
Total within-cluster sum of squares, i.e. sum(withinss).
betweenss
The between-cluster sum of squares, i.e. totss-tot.withinss.
Hence, Data.kmeans$tot.withinss is the SSE you are looking for; Data.kmeans$withinss gives the per-cluster contributions that sum to it.
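The components quoted from the documentation fit together as totss = tot.withinss + betweenss. A minimal Python sketch of that bookkeeping, assuming a fixed two-cluster labelling of four hypothetical points, computes each quantity by hand; the between-cluster term is written here as the cluster-size-weighted squared distance of each centroid to the overall mean, which is the standard identity behind R's betweenss = totss - tot.withinss.

```python
def squared_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))


# Hypothetical data with a fixed labelling into k = 2 clusters.
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
labels = [0, 0, 1, 1]
k = 2

clusters = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
centroids = [mean(c) for c in clusters]
grand_mean = mean(points)

totss = sum(squared_dist(p, grand_mean) for p in points)  # like R's totss
withinss = [sum(squared_dist(p, centroids[j]) for p in clusters[j]) for j in range(k)]
tot_withinss = sum(withinss)  # like R's tot.withinss: the SSE
betweenss = sum(len(clusters[j]) * squared_dist(centroids[j], grand_mean) for j in range(k))

print(withinss, tot_withinss, totss, betweenss)  # [2.0, 2.0] 4.0 38.0 34.0
```

Here 4.0 + 34.0 = 38.0, so the within-cluster part (the SSE) and the between-cluster part add up to the total sum of squares.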