Chapter 16. Cluster analysis

This chapter covers

  • Identifying cohesive subgroups (clusters) of observations
  • Determining the number of clusters present
  • Obtaining a nested hierarchy of clusters
  • Obtaining discrete clusters

Cluster analysis is a data-reduction technique designed to uncover subgroups of observations within a dataset. It allows you to reduce a large number of observations to a much smaller number of clusters or types. A cluster is defined as a group of observations that are more similar to each other than they are to the observations in other groups. This isn’t a precise definition (it doesn’t say how similarity should be measured or how much more similar the members of a group must be), and that imprecision has led to an enormous variety of clustering methods.
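To make the definition concrete, here is a minimal sketch in Python that compares the average distance between observations in the same group with the average distance between observations in different groups. The toy data, the choice of Euclidean distance, and the helper names `mean_within` and `mean_between` are all assumptions made for illustration; they aren't taken from this chapter, and other similarity measures would be equally valid.

```python
from itertools import combinations
from math import dist          # Euclidean distance, Python 3.8+
from statistics import mean

# Hypothetical toy data: two well-separated groups of 2-D observations.
groups = {
    "a": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "b": [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)],
}

def mean_within(groups):
    """Average Euclidean distance over pairs that share a group."""
    return mean(
        dist(p, q)
        for members in groups.values()
        for p, q in combinations(members, 2)
    )

def mean_between(groups):
    """Average Euclidean distance over pairs drawn from different groups."""
    return mean(
        dist(p, q)
        for (_, g1), (_, g2) in combinations(groups.items(), 2)
        for p in g1
        for q in g2
    )

within = mean_within(groups)
between = mean_between(groups)
print(f"within: {within:.2f}, between: {between:.2f}")
```

If the grouping is a reasonable clustering in the sense described above, `within` should come out noticeably smaller than `between`. How much smaller it needs to be is exactly the part the definition leaves open.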

16.1. Common steps in cluster analysis

16.2. Calculating distances

16.3. Hierarchical cluster analysis

16.4. Partitioning cluster analysis

16.5. Avoiding nonexistent clusters

16.6. Summary