Clustering

K-means

100 pages machine learning

Optimization of K-Means

How to choose k?

The number of clusters, k, is a hyperparameter that the data analyst has to tune. Several techniques exist for selecting k, but none of them is proven optimal. Most of them require the analyst to make an “educated guess” by looking at some metrics or by examining cluster assignments visually.

Methods

  • Prediction strength (see 9.2.3 in [1])

  • Gap statistic method

  • Elbow method

  • Average silhouette method
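
As a rough illustration of two of the methods listed above, the sketch below computes the within-cluster sum of squares (the quantity inspected in the elbow method) and the average silhouette for several values of k. It is a minimal, standard-library-only sketch and not a reference implementation. The toy data (three well-separated 2-D blobs), the farthest-point initialization, and all function names are assumptions made for this example; in practice a library implementation of k-means would normally be used instead.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k, rng):
    """Farthest-point initialization (an assumption of this sketch, not part of k-means itself)."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = init_centroids(points, k, rng)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

def wcss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity plotted in the elbow method."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

def mean_silhouette(points, labels, k):
    """Average silhouette over all points; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}  # cluster index -> mean distance from p to that cluster (p excluded)
        for j in range(k):
            d = [sq_dist(p, q) ** 0.5
                 for m, q in enumerate(points) if labels[m] == j and m != i]
            if d:
                mean_dist[j] = sum(d) / len(d)
        own = labels[i]
        if own not in mean_dist or len(mean_dist) < 2:
            scores.append(0.0)
            continue
        a = mean_dist[own]                                      # cohesion
        b = min(v for j, v in mean_dist.items() if j != own)    # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (hypothetical): three well-separated Gaussian blobs, 30 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(30)]

wcss_by_k = {}
silhouette_by_k = {}
for k in range(1, 7):
    labels, centroids = kmeans(points, k, rng)
    wcss_by_k[k] = wcss(points, labels, centroids)
    if k >= 2:  # the silhouette needs at least two clusters
        silhouette_by_k[k] = mean_silhouette(points, labels, k)

# Average silhouette method: pick the k with the highest mean silhouette.
best_k = max(silhouette_by_k, key=silhouette_by_k.get)

for k in sorted(wcss_by_k):
    print(k, round(wcss_by_k[k], 1), round(silhouette_by_k.get(k, float("nan")), 3))
print("k chosen by average silhouette:", best_k)
```

For the elbow method, the analyst would plot the printed within-cluster sum of squares against k and look for the value after which the curve flattens. That judgment is visual, which is why this sketch automates only the silhouette-based choice.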

References

[1] Andriy Burkov, The Hundred-Page Machine Learning Book, http://themlbook.com/wiki/doku.php

[2] Machine Learning, Huang, VTech
