Chapter 2. Extracting structure from data: clustering and transforming your data
This chapter covers
- Features and the feature space
- Expectation maximization—a way of training algorithms
- Transforming your data axes to better represent your data
In the previous chapter, you got your feet wet with the concept of intelligent algorithms. From this chapter onward, we’re going to concentrate on the specifics of machine-learning and predictive-analytics algorithms. If you’ve ever wondered what types of algorithms are out there and how they work, then these chapters are for you!
This chapter is specifically about the structure of data. That is, given a dataset, are there patterns and rules that describe it? Suppose we have a dataset describing a population, recording each person's job title, age, and salary. Are there general rules or patterns that could simplify this data? Do higher ages correlate with higher salaries? Is a large percentage of the wealth held by a small percentage of the population? If such generalizations exist, we can extract them directly, either to provide evidence of a pattern or to represent the dataset in a smaller, more compact form. These two use cases are the focus of this chapter, and figure 2.1 provides a visual representation.