Chapter 14. Simplifying data with the singular value decomposition
This chapter covers
Restaurants get rolled into a handful of categories: American, Chinese, Japanese, steak house, vegan, and so on. Have you ever thought that these categories weren’t enough? Perhaps you like a hybrid of these categories or a subcategory like Chinese vegetarian. How can we find out how many categories there are? Maybe we could ask some human experts? What if one expert tells us to divide restaurants by sauce, and another tells us to divide them by ingredient? Instead of asking an expert, let’s ask the data. We can take data that records people’s opinions of restaurants and distill it into a small number of underlying factors.
These factors may line up with our restaurant categories, with a specific ingredient used in cooking, or with something else entirely. We can then use these factors to estimate what people will think of a restaurant they haven’t yet visited.
The method for distilling this information is known as the singular value decomposition (SVD). It’s a powerful tool for extracting the essential structure of data, and it is used in applications ranging from bioinformatics to finance.
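To make the idea of "distilling data into factors" concrete, here is a minimal sketch using NumPy’s `numpy.linalg.svd`. The ratings matrix is made-up data (rows are diners, columns are restaurants), and the helper names `energy_fraction` and `low_rank` are hypothetical, introduced only for this illustration. The SVD factors the matrix as U · diag(sigma) · Vt; keeping only the largest singular values gives a smaller approximation that still captures most of the structure. As a simplification, unrated entries are stored as 0 here; this sketch does not handle missing ratings specially.

```python
import numpy as np

# Hypothetical ratings: rows are diners, columns are restaurants (0 = not rated).
# Diners 0-2 favor restaurants 0-2; diners 3-5 favor restaurants 3-4.
ratings = np.array([
    [5, 5, 4, 0, 0],
    [4, 5, 5, 0, 1],
    [5, 4, 5, 1, 0],
    [0, 1, 0, 5, 4],
    [1, 0, 0, 4, 5],
    [0, 0, 1, 5, 5],
], dtype=float)

# Factor the matrix: ratings == U @ diag(sigma) @ Vt.
# sigma holds the singular values, sorted from largest to smallest.
U, sigma, Vt = np.linalg.svd(ratings, full_matrices=False)


def energy_fraction(sigma, k):
    """Hypothetical helper: share of the total squared singular values kept by the first k."""
    return float(np.sum(sigma[:k] ** 2) / np.sum(sigma ** 2))


def low_rank(U, sigma, Vt, k):
    """Hypothetical helper: rebuild the matrix using only the k largest singular values."""
    return U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]


approx = low_rank(U, sigma, Vt, 2)
print("singular values:", np.round(sigma, 2))
print(f"energy kept by 2 factors: {energy_fraction(sigma, 2):.1%}")
print(np.round(approx, 1))
```

With this toy matrix, the first two singular values account for nearly all of the energy, matching the two groups of diners and restaurants built into the data. Those two factors play the role of the "underlying factors" described above.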