12 Clustering
This chapter covers
- Classification of different types of clustering
- Partitioning clustering
- Understanding and implementing k-means
- Density-based clustering
- Understanding and implementing DBSCAN
- OPTICS: refining DBSCAN as a hierarchical clustering
- Evaluating clustering results
In the previous chapters we described, implemented, and applied three data structures designed to solve nearest neighbor search efficiently. When we moved on to their applications, we mentioned that clustering is one of the main areas where efficient nearest neighbor search can make a difference. So far we have had to postpone that discussion, but now it’s finally time to put the icing on the cake and get the most out of our hard work. In this chapter, we will first briefly introduce clustering, explaining what it is and where it stands with respect to machine learning and AI. We’ll see that there are different types of clustering, with radically different approaches. Then we will present and discuss in detail three algorithms, each taking a different approach. By going through the whole chapter, readers will be exposed to the theoretical foundations of this topic. They will also learn about algorithms that can be implemented, or simply applied, to break down datasets into smaller, homogeneous groups. In the process, they will gain a deeper understanding of nearest neighbor search and multi-dimensional indexing.