chapter two

2 Clustering techniques

This chapter covers

Clustering techniques and salient use cases in the industry
Simple k-means, hierarchical, and density-based spatial clustering algorithms
Implementation of algorithms in Python
A case study on cluster analysis

Simplicity is the ultimate sophistication.
—Leonardo da Vinci

Nature loves simplicity and teaches us to follow the same path. Most of the time, our decisions are simple choices. Simple solutions are easier to comprehend, less time-consuming, and easier to maintain and reason about. The machine learning world is no different. An elegant machine learning solution is not the one that uses the most complicated algorithm available but the one that solves the business problem. A robust machine learning solution is simple enough to interpret readily and pragmatic enough to implement. Clustering solutions generally fit this description: they are among the easier machine learning approaches to understand.

In the previous chapter, we defined unsupervised learning and discussed the various unsupervised algorithms available. We will cover each of those algorithms as we work through this book; in this chapter, we focus on the first of them: clustering algorithms.

2.1 Technical toolkit

2.2 Clustering

2.3 Centroid-based clustering

2.3.1 K-means clustering

2.3.2 Measuring the accuracy of clustering

2.3.3 Finding the optimum value of k

2.3.4 Pros and cons of k-means clustering

2.3.5 K-means clustering implementation using Python

2.4 Connectivity-based clustering

2.4.1 Types of hierarchical clustering

2.4.2 Linkage criterion for distance measurement

2.4.3 Optimal number of clusters

2.4.4 Pros and cons of hierarchical clustering

2.4.5 Hierarchical clustering case study using Python

2.5 Density-based clustering

2.5.1 Neighborhood and density

2.5.2 DBSCAN clustering

2.6 Case study using clustering

2.6.1 Business context

2.6.2 Dataset for the analysis