1 Introduction to feature engineering


This chapter covers

  • Understanding the feature engineering and machine learning pipeline
  • Examining why feature engineering is important to the machine learning process
  • Taking a look at the types of feature engineering
  • Understanding how this book is structured and the types of case studies we will focus on

Much of the current discourse around artificial intelligence (AI) and machine learning (ML) is inherently model-centric, focusing on the latest advancements in ML and deep learning. This model-first approach often comes with, at best, little regard for, and, at worst, total disregard of, the data used to train those models. Fields like MLOps are exploding with ways to systematically train and utilize ML models with as little human interference as possible to “free up” the engineer’s time.

Many prominent AI figures are urging data scientists to adopt a more data-centric view of ML, one that focuses less on the model selection and hyperparameter-tuning process and more on techniques that enhance the data being ingested and used to train our models. Andrew Ng is on record saying that “machine learning is basically feature engineering” and that we need to move toward a data-centric approach. Adopting a data-centric approach is especially useful when the following are true:

1.1 What is feature engineering, and why does it matter?

1.1.1 Who needs feature engineering?

1.1.2 What feature engineering cannot do

1.1.3 Great data, great models

1.2 The feature engineering pipeline

1.2.1 The machine learning pipeline

1.3 How this book is organized

1.3.1 The five types of feature engineering

1.3.2 A brief overview of this book’s case studies