6 K-means clustering

Humanity has never had more data about more facets of society than it does today. Computers are great for storing data sets, but those data sets have little value to society until they are analyzed by human beings. Computational techniques can guide humans on the road to deriving meaning from a data set.

Clustering is a computational technique that divides the points in a data set into groups. A successful clustering results in groups that contain points that are related to one another. Whether those relationships are meaningful generally requires human verification.

In clustering, the group (a.k.a. cluster) that a data point belongs to is not predetermined. Instead, it is decided while the clustering algorithm runs. In fact, no presupposed information guides the algorithm to place any particular data point in any particular cluster. For this reason, clustering is considered an unsupervised method within the realm of machine learning. You can think of "unsupervised" as meaning "not guided by foreknowledge."
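To make "not predetermined" concrete, here is a minimal sketch of a k-means-style clustering of one-dimensional numbers. The function name `kmeans_1d`, its parameters, and the sample data are hypothetical and chosen only for illustration. It is not necessarily how this chapter's own implementation is structured. The key point is that the input contains only the points and the number of clusters `k`. No labels are supplied, and each point's cluster index emerges from repeatedly assigning points to their nearest centroid and moving each centroid to the mean of its members.

```python
from random import Random
from statistics import mean
from typing import List


def kmeans_1d(points: List[float], k: int,
              max_iterations: int = 100, seed: int = 0) -> List[int]:
    """Hypothetical helper: return a cluster index for each point.

    Only the raw points and k are given; no point is told in advance
    which cluster it belongs to.
    """
    rng = Random(seed)
    # Start with k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = [-1] * len(points)
    for _ in range(max_iterations):
        # Assign each point to the centroid closest to it.
        new_assignments = [
            min(range(k), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the clustering has settled
        assignments = new_assignments
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignments


labels = kmeans_1d([1.0, 1.5, 2.0, 10.0, 10.5, 11.0], k=2)
print(labels)
```

The cluster numbers themselves are arbitrary (which group is called 0 and which is called 1 depends on the random starting centroids). What the algorithm discovers is that the first three values belong together and the last three belong together. Whether such a grouping is meaningful is still for a human to judge.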

6.1 Preliminaries

6.2 The k-means clustering algorithm

6.3 Clustering governors by age and longitude

6.4 Clustering Michael Jackson albums by length

6.5 K-means clustering problems and extensions

6.6 Real-world applications

6.7 Exercises