2 The basics of feature engineering

This chapter covers

Understanding the differences between structured and unstructured data
Discovering the four levels of data and how they describe the data’s properties
Looking at the five types of feature engineering and when we want to apply each one
Differentiating between the ways to evaluate feature engineering pipelines

This chapter introduces the basic concepts of feature engineering. We will explore the types of data we will encounter and the types of feature engineering techniques we will see throughout this book. Rather than jumping right into case studies, we will first set up the necessary underpinnings of feature engineering and data understanding. Before we import a single package in Python, we need to know what we are looking for and what the data are trying to convey to us.

Getting started with data is often difficult. Data can be messy, unorganized, large, or in an odd format. The terms, definitions, and examples in this chapter will set us up to hit the ground running with our first case study.

2.1 Types of data

2.1.1 Structured data

2.1.2 Unstructured data

2.2 The four levels of data

2.2.1 Qualitative data vs. quantitative data

2.2.2 The nominal level

2.2.3 The ordinal level

2.2.4 The interval level

2.2.5 The ratio level

2.3 The types of feature engineering

2.3.1 Feature improvement

2.3.2 Feature construction

Summary