chapter sixteen

16 Cluster analysis

This chapter covers

Identifying cohesive subgroups (clusters) of observations
Determining the number of clusters present
Obtaining a nested hierarchy of clusters
Obtaining discrete clusters

Cluster analysis is a data-reduction technique designed to uncover subgroups of observations within a dataset. It allows you to reduce a large number of observations to a much smaller number of clusters or types. A cluster is defined as a group of observations that are more similar to each other than they are to the observations in other groups. This isn’t a precise definition, and that fact has given rise to an enormous variety of clustering methods.
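The informal definition above can be made concrete with a small sketch. It assumes Euclidean distance between observations and a hypothetical, hand-assigned grouping of invented toy points. It compares the average distance between observations in the same cluster with the average distance between observations in different clusters. This is only one of many possible ways to formalize "more similar to each other", not a method prescribed by the chapter.

```python
import math
from itertools import combinations

# Hypothetical toy data: six 2-D observations, invented purely for illustration.
points = {
    "a1": (1.0, 1.0), "a2": (1.5, 2.0), "a3": (2.0, 1.5),
    "b1": (8.0, 8.0), "b2": (8.5, 9.0), "b3": (9.0, 8.5),
}

# Hypothetical cluster assignment for each observation (not computed, just given).
labels = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}


def mean_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Every unordered pair of observations is visited once; its Euclidean
    distance goes into the 'within' list if both observations share a
    label, and into the 'between' list otherwise.
    """
    within, between = [], []
    for i, j in combinations(points, 2):
        d = math.dist(points[i], points[j])  # Euclidean distance (Python 3.8+)
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return sum(within) / len(within), sum(between) / len(between)


w, b = mean_distances(points, labels)
print(f"mean within-cluster distance:  {w:.3f}")
print(f"mean between-cluster distance: {b:.3f}")
# Under this loose criterion, the grouping looks like "clusters" when w < b.
print("within < between:", w < b)
```

Comparing averages is deliberately loose. One could instead require every observation to be closer to its own group than to any other, or use a different distance measure altogether. Each such choice is a different formalization, which is one reason so many clustering methods exist.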

16.1 Common steps in cluster analysis

16.2 Calculating distances
