1 Introduction to feature engineering
This chapter covers
- Understanding the feature engineering and machine learning pipelines
- Examining why feature engineering is important to the machine learning process
- Taking a look at the types of feature engineering
- Understanding how this book is structured and the types of case studies we will focus on
Much of the current discourse around artificial intelligence (AI) and machine learning (ML) is inherently model-centric, focusing on the latest advancements in ML and deep learning. This model-first approach often comes with, at best, little regard for the data used to train those models and, at worst, total disregard for it. Fields like MLOps are exploding with ways to systematically train and utilize ML models with as little human interference as possible in order to “free up” the engineer’s time.
Many prominent AI figures are urging that more focus be placed on a data-centric view of ML, one that concentrates less on model selection and hyperparameter tuning and more on techniques that enhance the data being ingested and used to train our models. Andrew Ng is on record saying that “machine learning is basically feature engineering” [1] and that we need to move toward a more data-centric approach [2]. Adopting a data-centric approach is especially useful when: