Clustering
K-means



Optimization of K-means


How to choose k?
The number of clusters, k, is a hyperparameter that the data analyst has to tune. Several techniques exist for selecting k, but none of them is proven optimal. Most require the analyst to make an “educated guess” by looking at some metrics or by examining cluster assignments visually.
Methods
Prediction strength method (see Section 9.2.3 of [1])
Gap statistic method
Elbow method
Average silhouette method
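
As a rough illustration of two of the methods listed above, the sketch below runs k-means for several values of k on synthetic data and reports the within-cluster sum of squares (the quantity plotted in the elbow method) and the average silhouette score. Everything here is an assumption made for the example: the function names (`kmeans`, `inertia`, `mean_silhouette`), the three synthetic blobs, and the deterministic farthest-first seeding, which is swapped in for the more common random or k-means++ initialization so that the run is reproducible. It uses only the Python standard library and is not a tuned implementation.

```python
import math
import random


def farthest_first_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from every centroid chosen so far. This is an assumption
    for reproducibility; random or k-means++ seeding is more common."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the members; an empty cluster keeps its old centroid.
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                           if members else centroids[j])
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, assign(points, centroids)


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances (the elbow-method metric)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def mean_silhouette(points, labels):
    """Average silhouette score; requires at least two non-empty clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data (hypothetical): three well-separated blobs of 20 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
          for cx, cy in centers for _ in range(20)]

inertias, silhouettes = {}, {}
for k in range(1, 7):
    centroids, labels = kmeans(points, k)
    inertias[k] = inertia(points, centroids, labels)
    if k >= 2:  # the silhouette score is undefined for a single cluster
        silhouettes[k] = mean_silhouette(points, labels)

for k in inertias:
    sil = f"{silhouettes[k]:.3f}" if k in silhouettes else "n/a"
    print(f"k={k}  inertia={inertias[k]:10.2f}  silhouette={sil}")

best_k = max(silhouettes, key=silhouettes.get)
print("k with the highest average silhouette:", best_k)
```

With the elbow method, the analyst looks for the value of k after which the inertia stops dropping sharply; with the average silhouette method, the analyst picks the k with the highest score. On clean, well-separated data such as this the two should agree, but on real data they often disagree, which is why an educated guess is still needed.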
References
[1] The Hundred-Page Machine Learning Book, Andriy Burkov, http://themlbook.com/wiki/doku.php
[2] Machine Learning, Huang, VTech