You’re now on your way to becoming a supervised machine learning virtuoso! Your toolbox of machine learning algorithms already gives you the skills to tackle many real-world classification and regression problems. We’re now going to move into the realm of unsupervised learning, where we no longer rely on labeled data to learn patterns. Because we no longer have a ground truth to compare against, validating the performance of unsupervised learners can be challenging, but I’ll show you practical ways to get the best performance you can.
Recall from chapter 1 that unsupervised learning can be divided into two goals: dimension reduction and clustering. In chapters 13, 14, and 15, I’ll introduce you to several dimension-reduction algorithms you can use to turn a large number of variables into a smaller, more manageable number. We might do this to make it easier to visualize patterns in data with many dimensions, or as a preprocessing step before passing our data into a supervised algorithm, to mitigate the curse of dimensionality.