Concept: dimensionality (category: machine learning)

Machine Learning with R, the tidyverse, and mlr

This is an excerpt from Manning's book Machine Learning with R, the tidyverse, and mlr.

  • Dimension-reduction algorithms take unlabeled, high-dimensional data (data with many variables; unlabeled because these are unsupervised learning methods) and learn a way of representing it in a smaller number of dimensions. Dimension-reduction algorithms may be used as an exploratory technique (because it’s very difficult for humans to visually interpret data in more than two or three dimensions at once) or as a preprocessing step in the machine learning pipeline (dimension reduction can help mitigate problems such as collinearity and the curse of dimensionality, terms I’ll define in later chapters). Dimension-reduction algorithms can also be used to help us visually confirm the performance of classification and clustering algorithms (by allowing us to plot the data in two or three dimensions).
  • Mitigating the curse of dimensionality
  • 13.1.2. Consequences of the curse of dimensionality

    In chapter 5, I discussed the curse of dimensionality. This slightly dramatic-sounding phenomenon describes a set of challenges we encounter when trying to identify patterns in a dataset with many variables. One aspect of the curse of dimensionality is that for a fixed number of cases, as we increase the number of dimensions in the dataset (increase the feature space), the cases get further and further apart. To reiterate this point in figure 13.1, I’ve reproduced figure 5.2 from chapter 5. In this situation, the data is said to become sparse. Many machine learning algorithms struggle to learn patterns from sparse data and may start to learn from the noise in the dataset instead.

    Figure 13.1. Data becomes more sparse as the number of dimensions increases. Two classes are shown in one-, two-, and three-dimensional feature spaces. The dotted lines in the three-dimensional representation are to clarify the position of the points along the z-axis. Note the increasing empty space with increased dimensions.
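
    To see this sparsity numerically, here is a minimal R sketch (my own illustration, not code from the book): it holds the number of cases fixed at 50 and measures the average distance from each case to its nearest neighbour as the number of uniformly distributed dimensions grows.

        # Average nearest-neighbour distance for a fixed number of cases,
        # as the number of dimensions increases
        set.seed(123)

        nearest_neighbour_dist <- function(n_cases, n_dims) {
          data <- matrix(runif(n_cases * n_dims), nrow = n_cases)
          d <- as.matrix(dist(data))   # pairwise Euclidean distances
          diag(d) <- Inf               # ignore each case's zero distance to itself
          mean(apply(d, 1, min))       # average distance to the nearest neighbour
        }

        sapply(c(1, 2, 3, 10, 100), function(p) nearest_neighbour_dist(50, p))
        # The average nearest-neighbour distance grows with the number of
        # dimensions: the 50 cases become increasingly sparse.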

    Another aspect of the curse of dimensionality is that as the number of dimensions increases, the distances between the cases begin to converge to a single value. Put another way, for a particular case, the ratio between the distance to its nearest neighbor and its furthest neighbor tends toward 1 in high dimensions. This presents a challenge to algorithms that rely on measuring distances (particularly Euclidean distance), such as k-nearest neighbors, because distance starts to become meaningless.
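
    The following minimal R sketch (again my own, not from the book) illustrates this convergence: for random uniform data with a fixed number of cases, it computes the average ratio of each case's nearest-neighbour distance to its furthest-neighbour distance.

        # Ratio of nearest- to furthest-neighbour distance as dimensions increase
        set.seed(123)

        near_far_ratio <- function(n_cases, n_dims) {
          data <- matrix(runif(n_cases * n_dims), nrow = n_cases)
          d <- as.matrix(dist(data))   # pairwise Euclidean distances
          diag(d) <- NA                # exclude each case's distance to itself
          mean(apply(d, 1, min, na.rm = TRUE) / apply(d, 1, max, na.rm = TRUE))
        }

        sapply(c(2, 10, 100, 1000), function(p) near_far_ratio(50, p))
        # The ratio creeps towards 1: in high dimensions, "near" and "far"
        # neighbours are almost the same distance away.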

    13.1.4. Mitigating the curse of dimensionality and collinearity by using dimension reduction

    How can you mitigate the impacts of the curse of dimensionality and/or collinearity on the predictive performance of your models? Why, with dimension reduction, of course! If you can compress most of the information from 100 variables into just 2 or 3, then the problems of data sparsity and near-equal distances disappear. If you turn two collinear variables into one new variable that captures all the information of both, then the problem of dependence between the variables disappears.
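
    As a concrete illustration (a sketch of my own, not code from the book), principal component analysis is one such dimension-reduction technique. Here, base R's prcomp() compresses a simulated dataset of 100 correlated variables into its first two principal components.

        # Compress 100 correlated variables into 2 principal components
        set.seed(123)

        signal    <- matrix(rnorm(200 * 2), ncol = 2)     # 2 underlying dimensions
        loadings  <- matrix(rnorm(2 * 100), nrow = 2)
        wide_data <- signal %*% loadings + rnorm(200 * 100, sd = 0.1)  # 100 noisy variables

        pca <- prcomp(wide_data, center = TRUE, scale. = TRUE)
        summary(pca)$importance[, 1:3]   # the first 2 components capture most of the variance
        compressed <- pca$x[, 1:2]       # the 200 cases, now described by just 2 variables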

    But we’ve already encountered another set of techniques that can mitigate the curse of dimensionality and collinearity: regularization. As we saw in chapter 11, regularization can be used to shrink the parameter estimates and even completely remove weakly contributing predictors. Regularization can therefore reduce sparsity resulting from the curse of dimensionality, and remove variables that are collinear with others.
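
    As a short sketch of this idea in R (my own illustration, not code from the book), the glmnet package fits LASSO-regularized regression: weakly contributing and collinear predictors are shrunk towards, and often exactly to, zero.

        # LASSO regularization shrinks and removes weak, collinear predictors
        # install.packages("glmnet")    # if not already installed
        library(glmnet)
        set.seed(123)

        x <- matrix(rnorm(200 * 50), ncol = 50)    # 50 predictors, most irrelevant
        x[, 2] <- x[, 1] + rnorm(200, sd = 0.01)   # predictor 2 is collinear with predictor 1
        y <- 3 * x[, 1] - 2 * x[, 3] + rnorm(200)  # only predictors 1 and 3 matter

        cv_fit <- cv.glmnet(x, y, alpha = 1)       # alpha = 1 gives the LASSO penalty
        coef(cv_fit, s = "lambda.1se")             # most coefficients are exactly zero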

    Algorithms of the Intelligent Web, Second Edition

    This is an excerpt from Manning's book Algorithms of the Intelligent Web, Second Edition.

    Second, subject to the curse of dimensionality (covered in chapter 2), throwing more data at a simple algorithm often yields results that are far superior to those obtained by making your classifier more and more complicated. If you look at large corporations such as Google, which use massive amounts of data, as much of their achievement should be attributed to how they deal with large volumes of training data as to the complexity and sophistication of their classification solutions.

    We’ll embark on this subject by laying down some fundamental terms and defining the meaning of structure as it relates to data. We’ll also discuss the concepts of bias and noise, which can color your collected data in unexpected ways. You’ll also find a discussion about the curse of dimensionality and the feature space. Simply put, this helps us reason about the relationship between the number of data features, the number of data points, and the phenomenon that we’re trying to capture in an intelligent algorithm.

    The second fundamental problem has a frightening name. It’s called the curse of dimensionality. In simple terms, it means that if you have any set of points in high dimensions and you use any metric to measure the distance between these points, they’ll all come out to be roughly the same distance apart! To illustrate this important effect of dimensionality, let’s consider the simple case illustrated in figure 2.2.

    Figure 2.2. The curse of dimensionality: every point tends to have the same distance from any other point.

    If you look at figure 2.2 from left to right, you’ll see that the dimensionality increases by 1 for each drawing. We start with eight points in one dimension (x-axis) distributed in a uniform fashion: say, between 0 and 1. It follows that the minimum distance we need to traverse from any given point until we meet another point is min(D) = 0.125, whereas the maximum distance is max(D) = 1. Thus, the ratio of min(D) over max(D) is equal to 0.125. In two dimensions, the eight data points are again distributed uniformly, but now we have min(D) = 0.5 and max(D) = 1.414 (along the main diagonal); thus, the ratio of min(D) over max(D) is equal to 0.354. In three dimensions, we have min(D) = 1 and max(D) = 1.732; thus, the ratio of min(D) over max(D) is equal to 0.577. As the dimensionality continues to increase, the ratio of the minimum distance over the maximum distance approaches the value of 1. This means that no matter which direction you look in and which distance you measure, it all looks the same!
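
    The ratios quoted above can be reproduced with a few lines of R (a sketch of my own, not the book's code). I assume the eight two-dimensional points sit on a 0.5-spaced 3 x 3 grid with the centre left empty, and the eight three-dimensional points sit at the corners of the unit cube; both layouts give the distances stated in the text.

        # min(D) / max(D) for the 2-D and 3-D configurations described above
        grid_2d <- expand.grid(x = c(0, 0.5, 1), y = c(0, 0.5, 1))
        grid_2d <- grid_2d[-5, ]          # drop the centre point, leaving 8 points
        d2 <- dist(grid_2d)               # pairwise Euclidean distances
        min(d2) / max(d2)                 # 0.5 / 1.414 = 0.354

        cube_3d <- expand.grid(x = 0:1, y = 0:1, z = 0:1)  # the 8 corners of the unit cube
        d3 <- dist(cube_3d)
        min(d3) / max(d3)                 # 1 / 1.732 = 0.577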
