13 Putting it all in practice: A real-life example of data engineering and machine learning

 

In this chapter

  • cleaning up and preprocessing data to make it readable by our model
  • using Scikit-Learn to train and evaluate several models
  • using grid search to select good hyperparameters for our model
  • using k-fold cross-validation to reuse the same data for both training and validation

Throughout this book, we’ve learned some of the most important algorithms in supervised learning, and we’ve had the chance to code them and use them to make predictions on several datasets. However, training a model on real data requires several more steps: cleaning the dataset and handling missing values, engineering features, tuning hyperparameters, and validating the results. These steps are what we discuss in this chapter, using the Titanic dataset as our example.
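To make these steps concrete, here is a minimal sketch of how they can fit together in Scikit-Learn. It is only an illustration under stated assumptions: the data is a small made-up table (not the real Titanic dataset), the column names `age`, `fare`, and `sex` are hypothetical, and the decision tree and the hyperparameter values are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data for illustration only (NOT the real Titanic dataset).
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "age": rng.integers(1, 80, size=n).astype(float),
    "fare": rng.uniform(5, 100, size=n),
    "sex": rng.choice(["female", "male"], size=n),
})
toy.loc[::7, "age"] = np.nan  # simulate missing values in one column

# Hypothetical label rule, invented so the toy problem is learnable.
labels = ((toy["sex"] == "female") | (toy["fare"] > 70)).astype(int)

# Cleaning and feature engineering: fill missing numbers with the median,
# and one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("numeric", SimpleImputer(strategy="median"), ["age", "fare"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Grid search: try every combination of these hyperparameter values.
param_grid = {
    "tree__max_depth": [1, 2, 3],
    "tree__min_samples_leaf": [1, 5],
}

# cv=4 scores each combination with 4-fold cross-validation.
search = GridSearchCV(model, param_grid, cv=4)
search.fit(toy, labels)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

Wrapping the preprocessing and the model in one `Pipeline` means the imputer and encoder are refit on the training portion of every fold, so no information from the validation portion leaks into training.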

The Titanic dataset

Cleaning up our dataset: Missing values and how to deal with them

Feature engineering: Transforming the features in our dataset before training the models

Training our models

Tuning the hyperparameters to find the best model: Grid search

Using K-fold cross-validation to reuse our data as training and validation

Summary

Exercises
