1 Introduction to feature engineering
This chapter covers
- Understanding the feature engineering and machine learning pipelines
- Examining why feature engineering is important to the machine learning process
- Taking a look at the types of feature engineering
- Understanding how this book is structured and the types of case studies we will focus on
Much of the current discourse around artificial intelligence (AI) and machine learning (ML) is inherently model-centric, focusing on the latest advancements in ML and deep learning. This model-first approach often comes with, at best, little regard for the data used to train those models and, at worst, total disregard for it. Fields like MLOps are exploding with ways to systematically train and utilize ML models with as little human interference as possible in order to “free up” the engineer’s time.
Many prominent AI figures are urging that more focus be placed on a data-centric view of ML, one that concentrates less on model selection and hyperparameter tuning and more on techniques that enhance the data being ingested and used to train our models. Andrew Ng is on record saying that “machine learning is basically feature engineering” [1] and that we need to move toward a more data-centric approach [2]. Adopting a data-centric approach is especially useful when: