chapter fifteen

15 Dimension reduction with networks and local structure: self-organizing maps and locally-linear embedding

This chapter covers:

Creating self-organizing maps to reduce dimensionality
Creating locally-linear embeddings of high-dimensional data

In this chapter, we’re continuing with dimension reduction: the class of machine learning tasks focused on representing the information contained in a large number of variables in a smaller number of variables. As we learned in the last two chapters, there are multiple ways to reduce the dimensions of a dataset. Which dimension reduction algorithm works best for you depends on the structure of your data and what you’re trying to achieve. Therefore, in this chapter I’m going to add two more non-linear dimension reduction algorithms to your ever-growing machine learning toolbox:

self-organizing maps (SOMs)
locally-linear embedding (LLE)

Both the SOM and LLE algorithms reduce a large dataset into a smaller, more manageable number of variables, but they work in very different ways. The SOM algorithm creates a two-dimensional grid of nodes, like grid references on a map. Each case in the data is placed into a node and then shuffled around the nodes, so that cases that are more similar to each other in the original data end up close together on the map.
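To make the idea of a grid of nodes more concrete, here is a minimal sketch of a SOM in plain Python. It is an illustration under stated assumptions, not the implementation we'll use later in the chapter: the function names (`distance`, `best_node`, `train_som`), the 3 x 3 grid, the Gaussian neighborhood, the linearly decaying learning rate and radius, and the four toy cases are all hypothetical choices made for this example.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_node(weights, case):
    """Grid position of the node whose weights are closest to the case."""
    return min(weights, key=lambda node: distance(weights[node], case))


def train_som(cases, rows=3, cols=3, epochs=50, seed=0):
    """Train a tiny SOM; returns a dict mapping (row, col) to a weight list."""
    rng = random.Random(seed)
    n_vars = len(cases[0])
    # One randomly initialised weight per variable for every node in the grid
    # (assumption: uniform random values in [0, 1)).
    weights = {(r, c): [rng.random() for _ in range(n_vars)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        remaining = 1 - epoch / epochs
        rate = 0.5 * remaining          # learning rate shrinks over time
        radius = 0.5 + 1.5 * remaining  # neighborhood shrinks over time
        # Present the cases in a random order each epoch.
        for case in rng.sample(cases, len(cases)):
            winner = best_node(weights, case)
            for node, w in weights.items():
                grid_dist = distance(node, winner)
                if grid_dist <= radius:
                    # Nodes nearer the winner on the grid are pulled harder.
                    pull = rate * math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                    for i in range(n_vars):
                        w[i] += pull * (case[i] - w[i])
    return weights


# Toy data (hypothetical): two cases near (0.1, 0.1), two near (0.9, 0.9).
cases = [[0.10, 0.20], [0.15, 0.10], [0.90, 0.80], [0.85, 0.95]]
som = train_som(cases)

# Place each case in the node that matches it best.
placement = {tuple(case): best_node(som, case) for case in cases}
print(placement)
```

Each case ends up in the node whose weights it matches best. Because neighboring nodes are updated together, similar cases tend to land in the same or nearby nodes, which is the "shuffling" described above.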

15.1 What are self-organizing maps?

15.1.1 Creating the grid of nodes

15.1.2 Randomly assigning weights, and placing cases in nodes

15.1.3 Updating the weights of the nodes to better match the cases inside them

15.2 Building our first SOM

15.2.1 Loading and exploring the flea dataset

15.2.2 Training the SOM

15.2.3 Plotting the SOM result

15.2.4 Mapping new data onto the SOM

15.3 What is locally-linear embedding?

15.4 Building our first LLE

15.4.1 Loading and exploring the S-curve dataset

15.4.2 Training the LLE

15.4.3 Plotting the LLE result

15.5 Building an LLE of our flea data

15.6 Strengths and weaknesses of SOMs and LLE

15.7 Summary

15.8 Solutions to exercises