Chapter 7. Introduction to clustering

This chapter covers

  • A hands-on look at clustering
  • Understanding the notion of similarity
  • Running a simple clustering example in Mahout
  • The various distance measures used for clustering

As human beings, we tend to associate with like-minded people: “birds of a feather flock together.” We have a great mental ability for finding repeating patterns, and we continually associate what we see, hear, smell, and taste with things that are already in our memory. For example, the taste of honey reminds us more of the taste of sugar than of salt. So we group together the things that taste like sugar and honey and call them sweet. Even without being able to define exactly what sweet tastes like, we know that all the sugary things in the world are similar and belong to the same group. We also know how different they are from all the things belonging to the salty group. Unconsciously, we group tastes into such clusters. We have a cluster of sugary things and a cluster of salty things, each containing hundreds of items.

7.1. Clustering basics

7.2. Measuring the similarity of items

7.3. Hello World: running a simple clustering example

7.4. Exploring distance measures

7.5. Hello World again! Trying out various distance measures

7.6. Summary