Chapter 5. Automatically clustering data

This chapter covers

Basic clustering with k-means
Representing audio
Audio segmentation
Clustering with a self-organizing map

Suppose you have a collection of not-pirated, totally legal MP3s on your hard drive, all crowded into one massive folder. Automatically grouping similar songs into categories such as Country, Rap, and Rock would help organize them. Assigning an item to a group (such as an MP3 to a playlist) in an unsupervised fashion, that is, without any labeled examples to learn from, is called clustering.

The previous chapter on classification assumes you’re given a training dataset of correctly labeled data. Unfortunately, you don’t always have that luxury when you collect data in the real world. For example, suppose you want to divide a large amount of music into interesting playlists. How could you possibly group songs if you don’t have direct access to their metadata?

Spotify, SoundCloud, Google Music, Pandora, and many other music-streaming services try to solve this problem in order to recommend similar songs to customers. Their approach includes a mixture of various machine-learning techniques, but clustering is often at the heart of the solution.
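
As a rough illustration of what grouping without labels looks like, the sketch below runs a bare-bones k-means loop in NumPy on six made-up songs. The two features (tempo and loudness), their values, and the `kmeans` helper are assumptions chosen for illustration, not code from this chapter; section 5.3 treats the algorithm properly.

```python
import numpy as np

def kmeans(points, k, iterations=10, seed=0):
    """Minimal k-means sketch: returns (labels, centroids) for the rows of `points`."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iterations):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroid positions.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1), centroids

# Hypothetical features per song: [tempo in BPM, loudness on a 0-1 scale].
songs = np.array([
    [60.0, 0.20], [65.0, 0.25], [62.0, 0.22],     # slow, quiet tracks
    [170.0, 0.90], [165.0, 0.85], [175.0, 0.95],  # fast, loud tracks
])

labels, centroids = kmeans(songs, k=2)
print(labels)  # the first three songs share one label, the last three the other
```

No genre labels are supplied anywhere; the grouping emerges from the distances between feature vectors alone. Which group ends up as cluster 0 and which as cluster 1 is arbitrary and depends on the random starting points.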

5.1. Traversing files in TensorFlow

5.2. Extracting features from audio

5.3. K-means clustering

5.4. Audio segmentation

5.5. Clustering using a self-organizing map

5.6. Application of clustering

5.7. Summary