This chapter covers
- Using transformers and estimators to prepare data into ML features
- Assembling features into a vector through an ML pipeline
- Training a simple ML model
- Evaluating a model using relevant performance metrics
- Optimizing a model using cross-validation
- Interpreting a model’s decision-making process through feature weights
In the previous chapter, we set the stage for machine learning: starting from a raw data set, we tamed the data and crafted features based on our exploration and analysis. Looking back at the data transformation steps from chapter 12, we performed the following work, resulting in a data frame named food_features (a condensed code sketch of these steps follows the list).
- Read a CSV file containing dish names and multiple columns as feature candidates
- Sanitized the column names (lowered the case, fixed the punctuation, spacing, and non-printable characters)
- Removed illogical and irrelevant records
- Filled the null values of binary columns with 0.0
- Capped the amounts for calories, protein, fat, and sodium at the 99th percentile
- Created ratio features (the number of calories coming from a macronutrient over the total number of calories for the dish)
- Imputed the mean of continuous features
- Scaled continuous features between 0.0 and 1.0
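The sketch below condenses these steps into one script, as a hedged illustration rather than the exact chapter 12 code: the file path, the binary column names, and the derived column names (such as protein_ratio) are placeholders I am assuming for the example.

```python
# A minimal sketch of the chapter 12 preparation steps listed above.
# Paths and column names (binary_cols, protein_ratio, *_imputed, etc.)
# are placeholders for illustration, not the exact chapter 12 code.
import pyspark.sql.functions as F
from pyspark.ml.feature import Imputer, MinMaxScaler, VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

food = spark.read.csv("./data/food.csv", header=True, inferSchema=True)

# Fill the null values of the (placeholder) binary columns with 0.0.
binary_cols = ["vegetarian", "vegan"]
food = food.fillna(0.0, subset=binary_cols)

# Cap calories, protein, fat, and sodium at their 99th percentile.
for column in ["calories", "protein", "fat", "sodium"]:
    cap = food.approxQuantile(column, [0.99], 0.01)[0]
    food = food.withColumn(
        column, F.when(F.col(column) > cap, cap).otherwise(F.col(column))
    )

# Example ratio feature: calories coming from protein over total calories
# (protein contributes roughly 4 calories per gram).
food = food.withColumn(
    "protein_ratio", F.col("protein") * 4 / F.col("calories")
)

# Impute the mean of the continuous features.
continuous_cols = ["calories", "protein", "fat", "sodium", "protein_ratio"]
imputer = Imputer(
    strategy="mean",
    inputCols=continuous_cols,
    outputCols=[c + "_imputed" for c in continuous_cols],
)
food = imputer.fit(food).transform(food)

# Scale the continuous features between 0.0 and 1.0. MinMaxScaler works on a
# single vector column, so the columns are assembled into a vector first.
assembler = VectorAssembler(
    inputCols=[c + "_imputed" for c in continuous_cols],
    outputCol="continuous_features",
)
scaler = MinMaxScaler(inputCol="continuous_features", outputCol="continuous_scaled")

assembled = assembler.transform(food)
food_features = scaler.fit(assembled).transform(assembled)
```

Notice how each stage here is fit and applied by hand; in this chapter, we chain such stages (transformers and estimators like Imputer, VectorAssembler, and MinMaxScaler) into a single ML pipeline instead.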
Tip
If you want to catch up with the code from chapter 12, I included the code leading to food_features in the book’s repository under ./code/Ch12/end_of_chapter.py.