Chapter 2: Generating features

 

Chapter 4 from Machine Learning Systems by Jeff Smith.

This chapter covers

  • Extracting features from raw data
  • Transforming features to make them more useful
  • Selecting among the features you’ve created
  • How to organize feature-generation code

This chapter is the next step on our journey through the components, or phases, of a machine learning system, shown in chapter 1.

Figure 4.1. Phases of machine learning

In this chapter, I’ll guide you through the three main types of operations in a feature pipeline: extraction, transformation, and selection. Not every system performs all three types of operations, but every feature engineering technique can be thought of as falling into one of these three buckets. I’ll use type signatures to assign techniques to groups and give our exploration some structure, as shown in table 4.1.

Table 4.1. Phases of feature generation

Phase      Input         Output

Extract    RawData       Feature
Transform  Feature       Feature
Select     Set[Feature]  Set[Feature]
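
To make the type signatures in table 4.1 concrete, here is a minimal sketch in Scala of one function per phase. The `RawData` and `Feature` case classes, the `FeaturePhases` object, and the three function names are hypothetical stand-ins chosen for illustration; they are not taken from Spark ML or from the code developed later in the chapter.

```scala
// Hypothetical types for illustration only; not from any library.
final case class RawData(text: String)
final case class Feature(name: String, value: Double)

object FeaturePhases {
  // Extract: RawData => Feature
  // Derives a numeric feature (the character count) from raw input.
  val extractLength: RawData => Feature =
    raw => Feature("length", raw.text.length.toDouble)

  // Transform: Feature => Feature
  // Rescales an existing feature with log(1 + x) and renames it.
  val logTransform: Feature => Feature =
    f => Feature(s"log1p(${f.name})", math.log1p(f.value))

  // Select: Set[Feature] => Set[Feature]
  // Keeps only the features whose value is nonzero (an arbitrary rule
  // assumed for this sketch, not a recommended selection criterion).
  val selectNonZero: Set[Feature] => Set[Feature] =
    features => features.filter(_.value != 0.0)
}
```

Because each phase is an ordinary function, phases compose with standard function composition: `FeaturePhases.extractLength andThen FeaturePhases.logTransform` has type `RawData => Feature`, and the resulting features can be collected into a `Set[Feature]` before being passed to `selectNonZero`.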

4.1. Spark ML

4.2. Extracting features

4.3. Transforming features

4.3.1. Common feature transforms

4.3.2. Transforming concepts

4.4. Selecting features

4.5. Structuring feature code

4.5.1. Feature generators

4.5.2. Feature set composition

4.6. Applications

4.7. Reactivities

Summary