This chapter covers
- Creating self-organizing maps to reduce dimensionality
- Creating locally linear embeddings of high-dimensional data
In this chapter, we’re continuing with dimension reduction: the class of machine learning tasks focused on representing the information contained in a large number of variables in a smaller number of variables. As you learned in chapters 13 and 14, there are multiple ways to reduce the dimensions of a dataset. Which dimension-reduction algorithm works best for you depends on the structure of your data and what you’re trying to achieve. Therefore, in this chapter, I’m going to add two more nonlinear dimension-reduction algorithms to your ever-growing machine learning toolbox: self-organizing maps (SOMs) and locally linear embedding (LLE).
Both the SOM and LLE algorithms reduce a large dataset to a smaller, more manageable number of variables, but they work in very different ways. The SOM algorithm creates a two-dimensional grid of nodes, like grid references on a map. Each case in the data is assigned to a node, and during training the cases are gradually rearranged across the nodes so that cases that are more similar to each other in the original data end up close together on the map.
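To make this training process concrete, here is a minimal from-scratch sketch of a SOM in Python with NumPy. It isn't the implementation we'll use in this chapter; the grid size, decay schedules, Gaussian neighborhood, and helper names (`train_som`, `map_cases`) are illustrative assumptions, but the loop shows the core idea: each randomly drawn case pulls its best-matching node (and that node's grid neighbors) toward itself, so nearby nodes come to represent similar cases.

```python
import numpy as np

def train_som(data, grid_w=10, grid_h=10, n_iters=1000, lr0=0.5, sigma0=None):
    """Train a minimal SOM on `data` (shape: n_cases x n_features)."""
    rng = np.random.default_rng(42)
    n_features = data.shape[1]
    sigma0 = sigma0 or max(grid_w, grid_h) / 2.0
    # One weight vector per grid node, initialized randomly
    weights = rng.random((grid_w, grid_h, n_features))
    # Each node's (row, col) position on the two-dimensional map
    coords = np.stack(
        np.meshgrid(np.arange(grid_w), np.arange(grid_h), indexing="ij"), axis=-1
    )
    for t in range(n_iters):
        frac = t / n_iters
        lr = lr0 * np.exp(-frac)        # decaying learning rate
        sigma = sigma0 * np.exp(-frac)  # shrinking neighborhood radius
        x = data[rng.integers(len(data))]  # draw a random case
        # Best-matching unit: the node whose weights are closest to the case
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(dists.argmin(), dists.shape)
        # Gaussian neighborhood: nodes near the BMU on the grid move most
        grid_dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-grid_dist2 / (2 * sigma**2))[..., None]
        weights += lr * h * (x - weights)
    return weights

def map_cases(data, weights):
    """Assign each case to its best-matching node on the trained grid."""
    d = np.linalg.norm(weights[None] - data[:, None, None, :], axis=-1)
    return [np.unravel_index(d[i].argmin(), d[i].shape) for i in range(len(data))]

# Example: 100 random four-dimensional cases mapped onto a 10x10 grid
data = np.random.default_rng(0).random((100, 4))
weights = train_som(data)
positions = map_cases(data, weights)  # one (row, col) grid cell per case
```

After training, `positions` gives each case's grid cell: cases that were similar in the original four variables land in the same or neighboring cells, which is exactly the "map" the SOM reduces the data to.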