16 Cluster analysis

This chapter covers

  • Identifying cohesive subgroups (clusters) of observations
  • Determining the number of clusters present
  • Obtaining a nested hierarchy of clusters
  • Obtaining discrete clusters

Cluster analysis is a data-reduction technique designed to uncover subgroups of observations within a dataset. It allows you to reduce a large number of observations to a much smaller number of clusters or types. A cluster is defined as a group of observations that are more similar to each other than they are to the observations in other groups. This isn’t a precise definition, and that fact has given rise to an enormous variety of clustering methods.
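To make the working definition concrete, here is a minimal sketch that compares the average distance between observations in the same group with the average distance between observations in different groups. The six points, the group labels, and the helper names `mean_within_distance` and `mean_between_distance` are hypothetical, invented for illustration. Euclidean distance is assumed as the measure of dissimilarity; it is only one of many possible choices.

```python
from itertools import combinations
from math import dist
from statistics import mean

# Hypothetical 2-D observations, made up for illustration only.
groups = {
    "a": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "b": [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)],
}


def mean_within_distance(groups):
    """Average Euclidean distance between pairs of points in the same group."""
    return mean(
        dist(p, q)
        for points in groups.values()
        for p, q in combinations(points, 2)
    )


def mean_between_distance(groups):
    """Average Euclidean distance between pairs of points in different groups."""
    return mean(
        dist(p, q)
        for g1, g2 in combinations(groups, 2)
        for p in groups[g1]
        for q in groups[g2]
    )


within = mean_within_distance(groups)
between = mean_between_distance(groups)
print(f"mean within-group distance:  {within:.2f}")
print(f"mean between-group distance: {between:.2f}")
```

If a proposed grouping fits the definition, the within-group average should be clearly smaller than the between-group average. How much smaller counts as "clearly" is exactly the part the definition leaves open.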

16.1 Common steps in cluster analysis

16.2 Calculating distances

16.3 Hierarchical cluster analysis

16.4 Partitioning-cluster analysis

16.4.1 K-means clustering

16.4.2 Partitioning around medoids

16.5 Avoiding nonexistent clusters

16.6 Going further

Summary