Chapter 5. Automatically clustering data

This chapter covers

- Basic clustering with k-means
- Representing audio
- Audio segmentation
- Clustering with a self-organizing map
Suppose you have a collection of not-pirated, totally legal MP3s on your hard drive. All your songs are crowded into one massive folder. Automatically grouping similar songs into categories such as Country, Rap, and Rock might help organize them. Assigning an item to a group (such as an MP3 to a playlist) in an unsupervised fashion, that is, without labeled examples to learn from, is called clustering.
The previous chapter on classification assumes you’re given a training dataset of correctly labeled data. Unfortunately, you don’t always have that luxury when you collect data in the real world. For example, suppose you want to divide a large amount of music into interesting playlists. How could you possibly group songs if you don’t have direct access to their metadata?
Spotify, SoundCloud, Google Music, Pandora, and many other music-streaming services try to solve this problem so they can recommend similar songs to customers. Their approach combines various machine-learning techniques, but clustering is often at the heart of the solution.