14 Maximizing similarity: t-SNE and UMAP

 

This chapter covers:

  • What is non-linear dimension reduction and why is it important?
  • What is t-SNE?
  • What is UMAP?

In the last chapter, I introduced you to PCA as our first dimension reduction technique. PCA is a linear dimension reduction algorithm (it finds linear combinations of the original variables), but sometimes the information in a set of variables can’t be extracted as a linear combination of those variables. In such situations, there are a number of non-linear dimension reduction algorithms we can turn to, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP).

t-SNE is one of the most popular non-linear dimension reduction algorithms. It measures the distance between each observation in the dataset and every other observation, then places the observations at random positions along (usually) two new axes. The observations are then iteratively shuffled around these new axes until their distances to each other in this two-dimensional space are as similar as possible to their distances in the original high-dimensional space.
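To make the "measure distances, place randomly, then shuffle" idea concrete, here is a minimal Python sketch. It is a deliberate simplification and not real t-SNE: it matches raw Euclidean distances by gradient descent (closer in spirit to multidimensional scaling), whereas t-SNE itself first converts distances into probabilities. The function names (`pairwise_distances`, `stress`, `embed`), the toy data, and the step size are all assumptions made for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance between every pair of observations."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(high, low):
    """Sum of squared differences between original and embedded distances."""
    n = len(high)
    return sum(
        (low[i][j] - high[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def embed(points, n_iter=500, learning_rate=0.01, seed=0):
    """Hypothetical helper: distance-matching embedding onto two axes."""
    rng = random.Random(seed)
    n = len(points)
    high = pairwise_distances(points)

    # Step 1: place every observation at a random position on two new axes.
    coords = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    initial = stress(high, pairwise_distances(coords))

    # Step 2: repeatedly nudge each point so that its 2-D distances move
    # closer to the distances measured in the original space.
    for _ in range(n_iter):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(coords[i], coords[j]) or 1e-12  # avoid 0 division
                coeff = 2 * (d - high[i][j]) / d
                for k in range(2):
                    grads[i][k] += coeff * (coords[i][k] - coords[j][k])
        for i in range(n):
            for k in range(2):
                coords[i][k] -= learning_rate * grads[i][k]

    final = stress(high, pairwise_distances(coords))
    return coords, initial, final


# Toy data (made up): two tight groups of observations in four dimensions.
observations = [
    [0.0, 0.1, 0.0, 0.2],
    [0.2, 0.0, 0.1, 0.0],
    [0.1, 0.2, 0.2, 0.1],
    [5.0, 5.1, 4.9, 5.2],
    [5.2, 4.9, 5.0, 5.1],
    [4.9, 5.0, 5.2, 5.0],
]

coords, initial_stress, final_stress = embed(observations)
print(f"stress before: {initial_stress:.2f}, after: {final_stress:.2f}")
```

The mismatch between the two sets of distances should shrink as the points are moved, which is the sense in which the embedding "maximizes similarity" to the original space.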

UMAP is another non-linear dimension reduction algorithm, one that overcomes some of the limitations of t-SNE. It works in a similar way to t-SNE (it finds distances in high-dimensional space, then tries to reproduce them in low-dimensional space) but differs in the way it measures those distances.

14.1  What is t-SNE?

 
 

14.2  Building our first t-SNE embedding

 
 

14.2.1  Performing t-SNE

 
 
 
 

14.2.2  Plotting the result of t-SNE

 

14.3  What is UMAP?

 

14.4  Building our first UMAP model

 
 
 

14.4.1  Performing UMAP

 
 
 

14.4.2  Plotting the result of UMAP

 
 
 

14.4.3  Computing the UMAP embeddings of new data

 
 

14.5  Strengths and weaknesses of t-SNE and UMAP

 

14.6  Summary

 
 

14.7  Solutions to exercises

 
 
 
 