
Clustering with limited maximum size

I want to cluster some data points, but the number of points per cluster is limited, so each cluster has a maximum size. Is there a clustering algorithm for that? Also, can I define my own size function? For example, instead of taking the number of points in a cluster as its size, I want to sum a column over all the points in the cluster.

Masood_mj asked Oct 17 '25 10:10



2 Answers

The problem of k-means clustering with minimum size constraints is addressed in this paper:

Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering." Microsoft Research, Redmond (2000): 1-8.

However, the approach proposed in this paper can easily be extended to maximum size constraints.

Here is an implementation of this algorithm and an extension to it which addresses both minimum size and maximum size constraints.

As for your question about a custom size function, that is a more difficult problem, for which I would guess local search approaches are more appropriate.
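To make the custom-size idea concrete, here is a minimal sketch of a greedy, capacity-aware variant of k-means. It is neither the min-cost-flow formulation from the paper nor a full local search, just a simple heuristic for experimenting. The function name `capacitated_kmeans` and its parameters are hypothetical; `weights` stands in for the column you want to sum, and passing all ones gives a plain limit on the number of points. Because placing weighted points under a capacity is a bin-packing-style problem, the greedy step can fail even when a feasible assignment exists.

```python
import math
import random


def capacitated_kmeans(points, weights, k, capacity, iters=50, seed=0):
    """Greedy heuristic: k-means whose assignment step respects a
    per-cluster capacity on the summed weights (hypothetical helper)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    # Heaviest points choose first, since they are the hardest to place.
    order = sorted(range(len(points)), key=lambda i: -weights[i])
    for _ in range(iters):
        load = [0.0] * k
        new_labels = [None] * len(points)
        for i in order:
            # Try the clusters from nearest to farthest centre.
            by_distance = sorted(
                range(k), key=lambda c: math.dist(points[i], centers[c])
            )
            for c in by_distance:
                if load[c] + weights[i] <= capacity:
                    new_labels[i] = c
                    load[c] += weights[i]
                    break
            else:
                raise ValueError("greedy assignment found no cluster with room")
        if new_labels == labels:
            break  # assignment is stable, so the centres will not move
        labels = new_labels
        # Move each centre to the mean of its members (keep it if empty).
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers
```

Calling `capacitated_kmeans(points, [1] * len(points), k, max_points)` limits the point count per cluster, while passing the values of a column as `weights` limits the sum of that column instead.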

Behrouz Babaki answered Oct 19 '25 05:10



A quick, though not optimal, solution is to split the data into two parts repeatedly until the number of points in each part is under the limit.
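A minimal sketch of that idea, assuming the points are tuples of numbers. How to split is left open above, so the rule used here (cut at the median along the axis with the largest spread) is only one possible choice, and the name `split_until_small` is hypothetical.

```python
def split_until_small(points, max_points):
    """Recursively halve `points` until every part has at most
    `max_points` members (hypothetical helper, median split)."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    if len(points) <= max_points:
        return [points]
    dims = range(len(points[0]))
    # Pick the axis along which the points are spread out the most.
    axis = max(
        dims,
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # both halves are non-empty since len >= 2
    return (split_until_small(ordered[:mid], max_points)
            + split_until_small(ordered[mid:], max_points))
```

Each returned part respects the limit, but the parts can be smaller than necessary and the cut ignores any natural cluster structure, which is why this is quick rather than optimal.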

emeth answered Oct 19 '25 04:10



