Clustering

K-means

Optimization of K-Means

How to choose k?

The number of clusters, k, is a hyperparameter that the data analyst has to tune. Several techniques exist for selecting k, but none of them is proven optimal. Most of them require the analyst to make an “educated guess” by looking at some metrics or by examining the cluster assignments visually.

Methods

Prediction strength (see Section 9.2.3 of [1])
Gap statistic method
Elbow method
Average silhouette method
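
As a rough illustration of two of the methods listed above (the elbow method and the average silhouette method), here is a minimal Python sketch that uses only the standard library. Everything in it is an assumption made for the example, not something taken from the references: the synthetic three-blob data set, the helper names `kmeans` and `average_silhouette`, and the farthest-point initialisation, which is used only to keep the run deterministic. The gap statistic and prediction strength are not shown.

```python
import math
import random


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, inertia).

    Assumption: seeds are chosen by farthest-point initialisation so the
    result is deterministic. Real libraries usually use random restarts.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    def assign(cs):
        # Index of the closest centroid for every point.
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    for _ in range(n_iter):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated

    labels = assign(centroids)
    # Inertia: sum of squared distances to the assigned centroid.
    inertia = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return labels, inertia


def average_silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points (requires k >= 2)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # convention: s = 0 for a singleton cluster
        # a: mean distance to the other points of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != l
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data (assumption): three well-separated 2-D blobs of 30 points.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in centers for _ in range(30)
]

inertias, silhouettes = {}, {}
for k in range(2, 7):
    labels, inertia = kmeans(points, k)
    inertias[k] = inertia
    silhouettes[k] = average_silhouette(points, labels)
    print(f"k={k}  inertia={inertia:8.2f}  silhouette={silhouettes[k]:.3f}")

best_k = max(silhouettes, key=silhouettes.get)
print("k with the highest average silhouette:", best_k)
```

For the elbow method, one would look at the printed inertia values and pick the k after which the decrease flattens out. For the average silhouette method, one would pick the k with the highest score. On real data the two can disagree, which is where the analyst's judgement comes in.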

References

[1] The Hundred-Page Machine Learning Book, Andriy Burkov, http://themlbook.com/wiki/doku.php

[2] Machine Learning, Huang, VTech
