chapter seven

7 Automatically clustering data

This chapter covers

Performing basic clustering with k-means
Representing audio
Segmenting audio
Clustering with a self-organizing map

Suppose that you have a collection of not-pirated, totally legal MP3s on your hard drive. All your songs are crowded into one massive folder. Perhaps automatically grouping similar songs into categories such as Country, Rap, and Rock would help organize them. This act of assigning an item to a group (such as an MP3 to a playlist) in an unsupervised fashion is called clustering.

Chapter 6 assumes that you’re given a training dataset of correctly labeled data. Unfortunately, you don’t always have that luxury when you collect data in the real world. Suppose that you want to divide a large amount of music into interesting playlists. How could you possibly group songs if you don’t have direct access to their metadata?

Spotify, SoundCloud, Google Music, Pandora, and many other music-streaming services try to solve this problem to recommend similar songs to customers. Their approach includes a mixture of various machine-learning techniques, but clustering is often at the heart of the solution.
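To make the idea of grouping without labels concrete, here's a minimal sketch of k-means, the first technique this chapter covers, written in plain NumPy. It isn't the implementation developed later in the chapter. The two-number feature vectors per song, the function name kmeans, and the choice to start from the first k points are all assumptions made for illustration.

```python
import numpy as np

def kmeans(points, k, iterations=10):
    """Group the rows of `points` into k clusters without using any labels."""
    # Assumption: start from the first k points (deterministic, but not robust).
    centroids = points[:k].copy()
    for _ in range(iterations):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its nearest centroid.
        assignments = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[assignments == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assignments, centroids

# Hypothetical features per song (say, tempo and loudness rescaled to 0..1).
songs = np.array([
    [0.10, 0.20],
    [0.90, 0.80],
    [0.15, 0.25],
    [0.85, 0.90],
    [0.12, 0.18],
    [0.95, 0.85],
])

assignments, centroids = kmeans(songs, k=2)
print(assignments)  # cluster index for each song
print(centroids)    # one centroid per cluster
```

Notice that nothing in the code says what the two groups mean. The algorithm only discovers that some songs sit close together in feature space; naming a cluster Country or Rock is still up to you.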

7.1 Traversing files in TensorFlow

7.2 Extracting features from audio

7.3 Using k-means clustering

7.4 Segmenting audio

7.5 Clustering with a self-organizing map

7.6 Applying clustering

Summary