Chapter 3. Classifying based on similarities with k-nearest neighbors
This chapter covers
- Understanding the bias-variance trade-off
- Recognizing underfitting and overfitting
- Using cross-validation to assess model performance
- Building a k-nearest neighbors classifier
- Tuning hyperparameters
This is probably the most important chapter of the entire book. In it, I’m going to show you how the k-nearest neighbors (kNN) algorithm works, and we’re going to use it to classify potential diabetes patients. In addition, I’m going to use the kNN algorithm to teach you some essential concepts in machine learning that we will rely on for the rest of the book.
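Before we dive in, here's the intuition in a nutshell, sketched in a few lines of base R. This toy function is purely my illustration (it is not the implementation we'll use in the chapter), and it assumes the predictors arrive as a numeric matrix: to classify a new case, measure its distance to every training case, keep the k nearest, and let them vote.

```r
# Toy illustration of the kNN intuition (not the chapter's implementation):
# classify one new case by a majority vote of its k nearest training cases.
knn_classify <- function(train_x, train_y, new_x, k = 3) {
  # Euclidean distance from new_x to every training case
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  nearest <- order(dists)[1:k]      # indices of the k closest cases
  votes <- table(train_y[nearest])  # count the neighbors' classes
  names(which.max(votes))           # the winning class label
}

# Example with the built-in iris data: vote among the 5 nearest flowers
knn_classify(as.matrix(iris[, 1:4]), iris$Species,
             new_x = c(5.0, 3.5, 1.5, 0.2), k = 5)
```

That really is the whole idea; everything else in this chapter is about doing it properly, with real-world data and honest estimates of how well it works.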
By the end of this chapter, not only will you understand the kNN algorithm and be able to use it to build classification models, but you will also be able to validate a model’s performance and tune it to improve that performance as much as possible. Once the model is built, you’ll learn how to pass new, unseen data into it and get the predicted classes (the values of the categorical, or grouping, variable we are trying to predict). I’ll introduce you to the extremely powerful mlr package in R, which contains a mouth-watering number of machine learning algorithms and greatly simplifies all of our machine learning tasks.
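To give you a taste of where we’re headed, here is a minimal sketch of that whole workflow in mlr. Treat every specific choice as a placeholder I’ve picked for illustration: the diabetes data shipped with the mclust package stands in for the chapter’s dataset, and k = 3, the repeated 10-fold cross-validation, and the tuning grid of k = 1 to 10 are arbitrary starting points that the chapter will justify properly.

```r
# Preview sketch of the mlr workflow (illustrative choices throughout):
# define a task, attach a kNN learner, train, cross-validate, tune, predict.
library(mlr)

# Stand-in data: the diabetes data that ships with the mclust package
data(diabetes, package = "mclust")

# A task couples the data with the variable we want to predict
diabetesTask <- makeClassifTask(data = diabetes, target = "class")

# A learner is the algorithm itself; k = 3 is an arbitrary starting point
knnLearner <- makeLearner("classif.knn", par.vals = list(k = 3))

# Train the model on the task
knnModel <- train(knnLearner, diabetesTask)

# Cross-validate to estimate how the model will perform on unseen data
kFold <- makeResampleDesc("RepCV", folds = 10, reps = 5, stratify = TRUE)
resample(knnLearner, diabetesTask, resampling = kFold,
         measures = list(mmce, acc))

# Tune k over a grid, using cross-validation inside the tuning loop
knnParamSpace <- makeParamSet(makeDiscreteParam("k", values = 1:10))
tunedK <- tuneParams(knnLearner, diabetesTask, resampling = kFold,
                     par.set = knnParamSpace,
                     control = makeTuneControlGrid())

# Pass new, unseen cases to the model to get their predicted classes
newPatients <- diabetes[1:3, c("glucose", "insulin", "sspg")]
predict(knnModel, newdata = newPatients)
```

Don’t worry if terms like task, learner, and resampling description mean nothing to you yet; teasing them apart, one step at a time, is exactly what this chapter is for.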