# Chapter 4. Model evaluation and optimization

### This chapter covers

• Using cross-validation for properly evaluating the predictive performance of models
• Overfitting and how to avoid it
• Standard evaluation metrics and visualizations for binary and multiclass classification
• Standard evaluation metrics and visualizations for regression models
• Optimizing your model by selecting the optimal parameters

After you fit a machine-learning model, the next step is to assess the accuracy of that model. Before you can put a model to use, you need to know how well it’s expected to predict on new data. If you determine that the predictive performance is good enough, you can be comfortable deploying that model in production to analyze new data. Conversely, if you find that the predictive performance isn’t good enough for the task at hand, you can revisit your data and model to try to improve its accuracy. (The last section of this chapter introduces simple model optimization. Chapters 5, 7, and 9 cover more-sophisticated methods of improving the predictive accuracy of ML models.)

Properly assessing the predictive performance of an ML model is a nontrivial task. We begin this chapter by introducing statistically rigorous techniques to evaluate the predictive performance of ML models, demonstrating both pictorially and with pseudocode how to perform correct validation of a model.
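
As a preview of what correct validation looks like, here is a minimal sketch of k-fold cross-validation in plain Python. The toy data, the straight-line model, and the helper names (`k_fold_indices`, `fit_line`, `cross_validated_mse`) are illustrative assumptions, not code from this chapter. The point it shows is that every prediction is made by a model that never saw that data point during fitting.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split the indices 0..n-1 into k random, disjoint folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def cross_validated_mse(xs, ys, k=5, seed=0):
    """Mean squared error of predictions made only on held-out folds."""
    folds = k_fold_indices(len(xs), k, seed)
    predictions = [None] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on everything except the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Predict only for the held-out fold.
        for i in held_out:
            predictions[i] = a + b * xs[i]
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

# Toy data (an assumption for illustration): a noisy straight line.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 0.5) for x in xs]

cv_mse = cross_validated_mse(xs, ys, k=5)
print(f"5-fold cross-validated MSE: {cv_mse:.3f}")
```

Setting `k` equal to the number of data points would give leave-one-out cross-validation; using a single held-out fold instead of looping over all of them corresponds to the holdout method.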

### 4.6. Terms from this chapter

| Word | Definition |
| --- | --- |
| underfitting/overfitting | Using a model that’s too simple or too complex, respectively, for the problem at hand. |
| evaluation metric | A number that characterizes the performance of the model. |
| mean squared error | An evaluation metric used for regression models: the average of the squared differences between predicted and true values. |
| cross-validation | A method of splitting the training set into two or more subsets, fitting the model on some and testing it on the rest, to better assess accuracy on new data. |
| holdout method | A form of cross-validation in which a single test set is held out of the model-fitting routine for testing purposes. |
| k-fold cross-validation | A kind of cross-validation in which the data is split into k random disjoint sets (folds). Each fold is held out in turn, and predictions for it are made by a model built on the remainder of the data. |
| confusion matrix | A matrix showing, for each class, the number of predicted values that were correctly classified or not. |
| receiver operating characteristic (ROC) | A curve that plots the true-positive rate against the false-positive rate as the classification threshold varies. |
| area under the ROC curve (AUC) | An evaluation metric for classification tasks, defined as the area under the ROC curve of true-positive rate versus false-positive rate. |
| tuning parameter | An internal parameter of a machine-learning algorithm, such as the bandwidth parameter for kernel-smoothing regression. |
| grid search | A brute-force strategy that evaluates every combination of candidate values for the tuning parameters and selects the best one to optimize an ML model. |
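
To make the classification terms above concrete, the following is a minimal sketch in plain Python of a confusion matrix, an ROC curve, and the AUC. The six labels and scores are made-up values for illustration, and the function names are assumptions rather than any library's API.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def roc_points(y_true, scores):
    """(false-positive rate, true-positive rate) pairs as the threshold drops.

    y_true holds 0/1 labels; a higher score means "more likely positive".
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process tied scores together so they produce a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up labels and classifier scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# Thresholding the scores at 0.5 yields hard class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(matrix)  # [[2, 1], [0, 3]]

points = roc_points(y_true, scores)
auc = area_under_curve(points)
print(f"AUC = {auc:.3f}")  # AUC = 0.889
```

The confusion matrix depends on one chosen threshold (0.5 here), whereas the ROC curve and AUC summarize performance across all thresholds, which is why the two are usually reported together.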