I want to cluster my data with KL-divergence as my metric.
In K-means:
Choose the number of clusters.
Initialize each cluster's mean at random.
Assign each data point to a cluster c with minimal distance value.
Update each cluster's mean to that of the data points assigned to it.
In the Euclidean case it's easy to update the mean, just by averaging each vector.
However, if I'd like to use KL-divergence as my metric, how do I update my mean?
Clustering with KL-divergence may not be the best idea, because KLD is missing an important property of metrics: symmetry. Obtained clusters could then be quite hard to interpret. If you want to go ahead with KLD, you could use as distance the average of KLD's i.e.
d(x,y) = KLD(x,y)/2 + KLD(y,x)/2
It is not a good idea to use KLD for two reasons:-
Adding a small number may affect the accuracy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With