Chapter 5. Basic feature engineering


This chapter covers

  • Understanding the importance of feature engineering for your machine-learning project
  • Using basic feature-engineering processes, including processing dates and times and simple text
  • Selecting optimal features and reducing the statistical and computational complexity of the model
  • Using feature engineering at model-building and prediction time

The first four chapters have shown you how to fit, evaluate, and optimize a supervised machine-learning algorithm, given a set of input features and a target of interest. But where do those input features come from? How do you go about defining and calculating features? And how do practitioners know whether they’re using the right set of features for their problem?

5.1. Motivation: why is feature engineering useful?

In this chapter, we explore how to create features from raw input data (a process referred to as feature engineering) and walk through a few examples of simple feature-engineering processes. This sets the groundwork for the more sophisticated feature-engineering algorithms covered in chapter 7.

5.1.1. What is feature engineering?

Feature engineering is the practice of using mathematical transformations of raw input data to create new features to be used in an ML model. The following are examples of such transformations:
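
One simple illustration of such transformations, sketched in plain Python under assumed inputs: a raw customer record (a dict whose field names `total_spent`, `num_visits`, and `income` are hypothetical) is turned into a ratio feature and a log-transformed feature.

```python
import math

def engineer_numeric_features(record):
    """Derive new features from one raw record.

    The field names 'total_spent', 'num_visits', and 'income' are hypothetical.
    """
    features = dict(record)  # keep the raw inputs alongside the derived ones
    # Ratio feature: average spend per visit (guard against division by zero)
    visits = record["num_visits"]
    features["spend_per_visit"] = record["total_spent"] / visits if visits > 0 else 0.0
    # Log transform compresses a long-tailed quantity such as income
    features["log_income"] = math.log1p(record["income"])
    return features

raw = {"total_spent": 250.0, "num_visits": 5, "income": 52000.0}
print(engineer_numeric_features(raw))
```

A ratio such as spend per visit often captures behavior that neither raw column expresses on its own, and a log transform reduces the influence of extreme values in skewed quantities.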

5.2. Basic feature-engineering processes
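
A minimal sketch of one basic process this chapter mentions, extracting features from dates and times, using only Python's standard `datetime` module. It assumes timestamps arrive as ISO-8601 strings; the chosen output fields (`year`, `month`, `day_of_week`, `hour`, `is_weekend`) are illustrative, not a fixed recipe.

```python
from datetime import datetime

def datetime_features(timestamp):
    """Split one ISO-8601 timestamp string into simple numeric features."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": dt.weekday(),          # Monday = 0, Sunday = 6
        "hour": dt.hour,
        "is_weekend": int(dt.weekday() >= 5),  # 1 for Saturday or Sunday
    }

print(datetime_features("2015-06-13T14:30:00"))
```

A raw timestamp is a single, hard-to-use value for most models; splitting it this way lets a model pick up weekly or daily patterns directly.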

5.3. Feature selection
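
A minimal sketch of two greedy feature-selection strategies, forward selection and backward elimination. It assumes a caller-supplied `score(subset)` function that returns model accuracy for a list of features (higher is better); in practice this would be something like the cross-validated accuracy of a model refit on that subset. The `USEFULNESS` table and `toy_score` function are hypothetical stand-ins so the example runs without a real model.

```python
def forward_selection(features, score, max_features=None):
    """Greedy forward selection.

    At each step, add the feature that raises score(subset) the most,
    given the features already selected; stop when no addition helps.
    """
    selected, remaining = [], list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score the current active set plus each remaining candidate
        trial = {f: score(selected + [f]) for f in remaining}
        best_feature = max(trial, key=trial.get)
        if trial[best_feature] <= best_score:
            break  # no candidate improves the score
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_score = trial[best_feature]
    return selected, best_score


def backward_elimination(features, score, min_features=1):
    """Greedy backward elimination.

    Start from all features; at each step, drop the feature whose removal
    lowers score(subset) the least (or raises it); stop when every removal hurts.
    """
    selected = list(features)
    best_score = score(selected)
    while len(selected) > min_features:
        # Score the active set with each feature left out in turn
        trial = {f: score([g for g in selected if g != f]) for f in selected}
        drop = max(trial, key=trial.get)
        if trial[drop] < best_score:
            break  # every removal lowers the score
        selected.remove(drop)
        best_score = trial[drop]
    return selected, best_score


# Hypothetical stand-in for model accuracy: each feature contributes a fixed
# amount, and every included feature costs a small complexity penalty.
USEFULNESS = {"age": 0.30, "income": 0.25, "zip_code": 0.01, "noise": -0.05}

def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset) - 0.02 * len(subset)


print(forward_selection(USEFULNESS, toy_score))
print(backward_elimination(USEFULNESS, toy_score))
```

With this toy score, both strategies end up keeping `age` and `income`. With a real model, every call to `score` means refitting, so these searches can be expensive; scikit-learn's `SequentialFeatureSelector` provides a ready-made version of both directions.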

5.4. Summary

5.5. Terms from this chapter



feature engineering: Transforming input data to extract more value and improve the predictive accuracy of ML models
feature selection: Process of choosing the most predictive subset of features out of a larger set
forward selection: A version of feature selection that iteratively adds the feature that increases the accuracy of the model the most, conditional on the current active feature set
backward elimination: A version of feature selection that iteratively removes the feature whose removal decreases the accuracy of the model the least, conditional on the current active feature set
bag of words: A method for turning arbitrary text into numerical features for use by the ML algorithm, typically by counting how often each vocabulary word appears in the text
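
The bag-of-words entry above can be made concrete with a short pure-Python sketch. It assumes a deliberately simple tokenizer (lowercase, keep runs of letters, digits, and apostrophes); real pipelines often also remove stop words or use a library class such as scikit-learn's `CountVectorizer`.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(documents):
    """Assign each distinct word in the corpus a column index, in sorted order."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def bag_of_words(documents, vocabulary):
    """Represent each document as a vector of word counts over the vocabulary."""
    vectors = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for word, count in Counter(tokenize(doc)).items():
            if word in vocabulary:  # words unseen when building the vocabulary are ignored
                row[vocabulary[word]] = count
        vectors.append(row)
    return vectors

docs = ["The movie was great", "The plot was not great"]
vocab = build_vocabulary(docs)
print(vocab)
print(bag_of_words(docs, vocab))
```

Each document becomes a fixed-length numeric vector, one column per vocabulary word, which any of the earlier ML algorithms can consume. Word order is discarded, which is why the method is called a "bag" of words.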