chapter two

2 Clustering techniques

In this second chapter, we are going to cover the following topics:

Clustering techniques and salient use cases in the industry
Various clustering algorithms available
K-means, hierarchical clustering, and DBSCAN clustering
Implementation of algorithms in Python
Case study on cluster analysis

“Simplicity is the ultimate sophistication” – Leonardo da Vinci

Nature loves simplicity and teaches us to follow the same path. Most of the time, our decisions are simple choices. Simple solutions are easier to comprehend, less time-consuming, and painless to maintain and reason about. The machine learning world is no different. An elegant machine learning solution is not the most complicated algorithm available, but the one that solves the business problem. A robust machine learning solution is easy enough to decipher readily and pragmatic enough to implement. Clustering solutions are generally easy to understand.

In the previous chapter, we defined unsupervised learning and discussed the various unsupervised algorithms available. We will cover each of those algorithms as we work through this book; in this second chapter, we focus on the first of these: clustering algorithms.

2.1 Technical toolkit

2.2 Clustering

2.2.1 Clustering techniques

2.3 Centroid based clustering

2.3.1 K-means clustering

2.3.2 Measure the accuracy of clustering

2.3.3 Finding the optimum value of “k”

2.3.4 Pros and cons of k-means clustering

2.3.5 k-means clustering implementation using Python

2.4 Connectivity based clustering

2.4.1 Types of hierarchical clustering

2.4.2 Linkage criterion for distance measurement

2.4.3 Optimal number of clusters

2.4.4 Pros and cons of hierarchical clustering

2.4.5 Hierarchical clustering case study using Python

2.5 Density based clustering

2.5.1 Neighborhood and density

2.5.2 DBSCAN Clustering

2.6 Case study using clustering

2.7 Common challenges faced in clustering

2.8 Concluding thoughts

2.9 Summary