Chapter 5. Choosing and evaluating models
This chapter covers
- Mapping business problems to machine learning tasks
- Evaluating model quality
- Validating model soundness
As a data scientist, your ultimate goal is to solve a concrete business problem: increase the look-to-buy ratio, identify fraudulent transactions, predict and manage the losses of a loan portfolio, and so on. Many different statistical modeling methods can be used to solve any given problem. Each method has its own advantages and disadvantages for a given business goal and set of business constraints. This chapter presents an outline of the most common machine learning and statistical methods used in data science.
To make progress, you must be able to measure model quality during training and also ensure that your model will work as well in the production environment as it did on your training data. In general, we’ll call these two tasks model evaluation and model validation. To prepare for both tasks, we always split our data into training data and test data, as illustrated in figure 5.1.
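To make the split concrete, here is a minimal sketch in Python using only the standard library. The function name `train_test_split`, the 10% test fraction, and the fixed seed are illustrative assumptions rather than choices prescribed by this chapter. The point is only that each record lands in exactly one of the two sets, and that the assignment is random.

```python
import random


def train_test_split(rows, test_fraction=0.1, seed=42):
    """Randomly partition rows into (train, test) lists.

    test_fraction and seed are illustrative defaults, not recommendations.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: 100 record identifiers standing in for real rows.
records = list(range(100))
train, test = train_test_split(records, test_fraction=0.1)
```

The model is fit on `train` only. The rows in `test` are held back so that quality measured on them is a fairer estimate of how the model may behave on data it has not seen.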