9 Feature selection


This chapter covers

  • Understanding principles for feature selection and feature engineering
  • Applying feature selection principles to case studies
  • Sharpening feature selection skills based on case study analysis

Thus far, you have been using the original (raw) data values from the DC taxi data set as the features for your machine learning models. A feature is a value, or a collection of values, used as an input to a machine learning model during both the training and inference phases of machine learning (see appendix A). Feature engineering, the process of selecting, designing, and implementing synthetic (made-up) features from raw data values, can significantly improve the machine learning performance of your models. Some examples of feature engineering are simple, formulaic transformations of the original data values, for instance rescaling arbitrary numeric values to the range from -1 to 1. Feature selection (also known as feature design), the initial phase of feature engineering, is the more creative part of the effort: it involves specifying features that capture human knowledge or intuition about the data set, such as a feature that measures the distance between the pickup and drop-off locations of each ride in the taxi trips data set.
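To make the two examples above concrete, here is a minimal sketch of both kinds of engineered feature: a formulaic rescaling of numeric values to the range from -1 to 1, and a haversine-based distance between pickup and drop-off coordinates. The function names and the Earth-radius constant are illustrative choices for this sketch, not code from the DC taxi project.

```python
import math

def rescale(values, lo=-1.0, hi=1.0):
    # Linearly map arbitrary numeric values onto the range [lo, hi].
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # All values identical: map everything to the lower bound.
        return [lo for _ in values]
    return [lo + (hi - lo) * (v - vmin) / span for v in values]

def trip_distance_miles(pickup_lat, pickup_lon, dropoff_lat, dropoff_lon):
    # Great-circle (haversine) distance in miles between two
    # (latitude, longitude) points, e.g. a taxi pickup and drop-off.
    earth_radius_miles = 3958.8  # mean Earth radius, an approximation
    phi1 = math.radians(pickup_lat)
    phi2 = math.radians(dropoff_lat)
    dphi = math.radians(dropoff_lat - pickup_lat)
    dlam = math.radians(dropoff_lon - pickup_lon)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))
```

For example, `rescale([0, 5, 10])` produces `[-1.0, 0.0, 1.0]`, and calling `trip_distance_miles` with two points in Washington, DC yields a ride distance in miles that a model can use directly as a numeric feature.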

9.1 Guiding principles for feature selection

9.1.1 Related to the label

9.1.2 Recorded before inference time

9.1.3 Supported by abundant examples

9.1.4 Expressed as a number with a meaningful scale

9.1.5 Based on expert insights about the project

9.2 Feature selection case studies

9.3 Feature selection using guiding principles

9.3.1 Related to the label

9.3.2 Recorded before inference time

9.3.3 Supported by abundant examples

9.3.4 Numeric with meaningful magnitude

9.3.5 Bring expert insight to the problem

9.4 Selecting features for the DC taxi data set

Summary