Chapter 2. Extracting structure from data: clustering and transforming your data

This chapter covers

  • Features and the feature space
  • Expectation maximization—a way of training algorithms
  • Transforming your data axes to better represent your data

In the previous chapter, you got your feet wet with the concept of intelligent algorithms. From this chapter onward, we’re going to concentrate on the specifics of machine-learning and predictive-analytics algorithms. If you’ve ever wondered what types of algorithms are out there and how they work, then these chapters are for you!

This chapter is specifically about the structure of data: given a dataset, are there patterns and rules that describe it? For example, given a dataset of a population with job titles, ages, and salaries, are there general rules or patterns that simplify the data? Do higher ages correlate with higher salaries? Is a large percentage of the wealth held by a small percentage of the population? If found, such generalizations can be extracted either to provide evidence of a pattern or to represent the dataset in a smaller, more compact form. These two use cases are the focus of this chapter, and figure 2.1 provides a visual representation.
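As a minimal sketch of the first use case, the age–salary question can be posed as a correlation check. The data here is synthetic and purely illustrative (the chapter's dataset is hypothetical), but the workflow is the same: gather paired observations and compute a correlation coefficient.

```python
import numpy as np

# Synthetic population sample for illustration only: 200 people with
# ages drawn uniformly between 20 and 65.
rng = np.random.default_rng(42)
ages = rng.uniform(20, 65, size=200)

# Assume (purely for this sketch) that salary rises roughly linearly
# with age, plus random noise.
salaries = 20_000 + 1_000 * ages + rng.normal(0, 5_000, size=200)

# Pearson correlation coefficient between age and salary:
# values near +1 indicate a strong positive linear relationship.
r = np.corrcoef(ages, salaries)[0, 1]
print(f"age-salary correlation: {r:.2f}")
```

A strong positive coefficient would be evidence of the kind of pattern the chapter describes; a coefficient near zero would suggest no simple linear rule links the two attributes.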

2.1. Data, structure, bias, and noise

2.2. The curse of dimensionality

2.3. K-means

2.4. The Gaussian mixture model

2.5. The relationship between k-means and GMM

2.6. Transforming the data axis

2.7. Summary