Chapter 6. K-means clustering

Humanity has never had more data about more facets of society than it does today. Computers are great for storing data sets, but it takes human analysis to draw meaning from them. Clustering is a computational technique that divides the points in a data set into groups. A successful clustering produces groups whose points are related to one another. Whether those relationships are meaningful generally requires human verification.

In clustering, the group (a.k.a. cluster) that a data point belongs to is not predetermined; it is decided during the run of the clustering algorithm. In fact, no presupposed information guides the algorithm to place any particular data point in any particular cluster. For this reason, clustering is sometimes considered an unsupervised method within the realm of machine learning. You can think of “unsupervised” as meaning “not guided by foreknowledge.”
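
To make “not guided by foreknowledge” concrete, here is a minimal sketch in Python. It is an illustration only, not this chapter's implementation: the function name group_points, the one-dimensional data, and the evenly spaced starting centers are all assumptions chosen to keep the example short. Notice that the caller supplies only the points and the number of groups, never which group a point belongs to.

```python
from statistics import mean


def group_points(points: list[float], k: int, rounds: int = 10) -> list[int]:
    """Return a cluster index for each point; no labels are supplied up front."""
    low, high = min(points), max(points)
    # Assumed starting guess: k centers spread evenly across the data range.
    centers = [low + (high - low) * i / max(k - 1, 1) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(rounds):
        # Each point joins the cluster whose center is currently nearest.
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        # Each center then moves to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave a center where it is if no point chose it
                centers[c] = mean(members)
    return labels


print(group_points([1.0, 1.5, 2.0, 10.0, 10.5, 11.0], k=2))
```

For this input the three small values end up in one group and the three large values in the other, even though nothing in the input said which point goes where.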

6.1. Preliminaries

6.2. The k-means clustering algorithm

6.3. Clustering governors by age and longitude

6.4. K-means clustering problems and extensions

6.5. Real-world applications

6.6. Exercises