Suppose that you have a collection of not-pirated, totally legal MP3s on your hard drive. All your songs are crowded into one massive folder. Perhaps automatically grouping similar songs into categories such as Country, Rap, and Rock would help organize them. This act of assigning an item to a group (such as an MP3 to a playlist) in an unsupervised fashion is called clustering.
Chapter 6 assumes that you’re given a training dataset of correctly labeled examples. Unfortunately, you don’t always have that luxury when you collect data in the real world. Suppose that you want to divide a large music collection into interesting playlists. How could you possibly group the songs if you don’t have direct access to their metadata?