14 Maximizing similarity: t-SNE and UMAP

This chapter covers:

What is non-linear dimension reduction and why is it important?
What is t-SNE?
What is UMAP?

In the last chapter, I introduced you to PCA as our first dimension reduction technique. PCA is a linear dimension reduction algorithm (it finds linear combinations of the original variables), but sometimes the information in a set of variables can’t be extracted as a linear combination of those variables. In such situations, we can turn to one of several non-linear dimension reduction algorithms, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP).

t-SNE is one of the most popular non-linear dimension reduction algorithms. It measures the distance between each observation in the dataset and every other observation, then places the observations at random along (usually) two new axes. The observations are then iteratively moved around these new axes until their distances to each other in this two-dimensional space are as similar as possible to their distances in the original high-dimensional space.
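To make that loop concrete, here is a minimal sketch in plain Python, offered as an illustration and not as a reference implementation. It makes several assumptions that are not part of the description above: the toy dataset is invented, every function name is hypothetical, the Gaussian bandwidth `sigma` is fixed (real t-SNE tunes it per observation from a perplexity setting), and the update is plain gradient descent with no momentum or early exaggeration. It follows the standard formulation, in which distances are first converted to similarities and the mismatch between the two sets of similarities is measured with the Kullback-Leibler divergence.

```python
import math
import random


def squared_distances(points):
    """Squared Euclidean distance between every pair of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def high_dim_similarities(points, sigma=1.0):
    """Symmetric similarities in the original space (fixed sigma is an assumption)."""
    n = len(points)
    d2 = squared_distances(points)
    cond = []
    for i in range(n):
        row = [0.0 if i == j else math.exp(-d2[i][j] / (2 * sigma ** 2))
               for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])
    # Average the two conditional similarities so the matrix is symmetric.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]


def low_dim_similarities(embedding):
    """Similarities on the new axes, using a Student's t kernel (1 d.o.f.)."""
    n = len(embedding)
    d2 = squared_distances(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + d2[i][j]) for j in range(n)]
         for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w], w


def kl_divergence(p, q):
    """Mismatch between the high- and low-dimensional similarities."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)


def tsne_sketch(points, n_iter=200, learning_rate=1.0, seed=0):
    rng = random.Random(seed)
    n = len(points)
    p = high_dim_similarities(points)
    # Start from small random positions on two new axes.
    emb = [[rng.gauss(0.0, 0.01), rng.gauss(0.0, 0.01)] for _ in range(n)]
    kl_start = kl_divergence(p, low_dim_similarities(emb)[0])
    for _ in range(n_iter):
        q, w = low_dim_similarities(emb)
        new_emb = []
        for i in range(n):
            grad = [0.0, 0.0]
            for j in range(n):
                coeff = 4.0 * (p[i][j] - q[i][j]) * w[i][j]
                grad[0] += coeff * (emb[i][0] - emb[j][0])
                grad[1] += coeff * (emb[i][1] - emb[j][1])
            # Step each observation against the gradient of the mismatch.
            new_emb.append([emb[i][0] - learning_rate * grad[0],
                            emb[i][1] - learning_rate * grad[1]])
        emb = new_emb
    kl_end = kl_divergence(p, low_dim_similarities(emb)[0])
    return emb, kl_start, kl_end


# Invented toy data: two loose groups of three observations in 3-D.
toy_data = [
    (0.0, 0.0, 0.0), (0.5, 0.2, 0.0), (0.1, 0.6, 0.3),
    (5.0, 5.0, 5.0), (5.4, 5.1, 4.8), (4.9, 5.5, 5.2),
]
embedding, kl_start, kl_end = tsne_sketch(toy_data)
print("KL divergence before and after:", kl_start, kl_end)
for original, (x, y) in zip(toy_data, embedding):
    print(original, "->", (round(x, 3), round(y, 3)))
```

In this sketch, observations that start close together in the three original variables should end up close together on the two new axes, and the divergence printed at the end should be lower than the one at the start. In practice you would use an established implementation, which adds the perplexity search and optimization refinements omitted here.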

UMAP is another non-linear dimension reduction algorithm that overcomes some of the limitations of t-SNE. It works in a similar way to t-SNE (it finds distances in the high-dimensional space, then tries to reproduce them in a low-dimensional space), but differs in the way it measures distances.

14.1 What is t-SNE?

14.2 Building our first t-SNE embedding

14.2.1 Performing t-SNE

14.2.2 Plotting the result of t-SNE

14.3 What is UMAP?

14.4 Building our first UMAP model

14.4.1 Performing UMAP

14.4.2 Plotting the result of UMAP

14.4.3 Computing the UMAP embeddings of new data

14.5 Strengths and weaknesses of t-SNE and UMAP

14.6 Summary

14.7 Solutions to exercises
