Concept: kNN in R

This is an excerpt from Manning's book Machine Learning with R, the tidyverse, and mlr.
So how does kNN learn? Well, I’m going to use snakes to help me explain. I’m from the UK, where—some people are surprised to learn—we have a few native species of snake. Two examples are the grass snake and the adder, which is the only venomous snake in the UK. But we also have a cute, limbless reptile called a slow worm, which is commonly mistaken for a snake.
Recall from chapter 3 that the kNN algorithm is a lazy learner. In other words, it doesn’t do any work during model training (instead, it just stores the training data); it does all of its work when it makes predictions. When making predictions, the kNN algorithm looks in the training set for the k cases most similar to each of the new, unlabeled data values. Each of those k most similar cases votes on the predicted value of the new data. When using kNN for classification, these votes are for class membership, and the winning vote selects the class the model outputs for the new data. To remind you how this process works, I’ve reproduced a modified version of figure 3.4 from chapter 3, in figure 12.2.
Figure 12.2. The kNN algorithm for classification: identifying the k nearest neighbors and taking the majority vote. Lines connect the unlabeled data with their one, three, and five nearest neighbors. The majority vote in each scenario is indicated by the shape drawn under each cross.
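To make the voting process concrete, here is a minimal base-R sketch of kNN classification. The knn_classify() function and the toy snake data are made up for illustration; the book itself uses mlr's learners rather than a hand-rolled implementation.

```r
# Minimal sketch of the kNN classification vote (illustrative only)
knn_classify <- function(train_x, train_labels, new_x, k = 3) {
  # Euclidean distance from the new case to every training case
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  # The k nearest training cases each cast a vote for their own class
  nearest <- order(dists)[1:k]
  votes <- table(train_labels[nearest])
  # The class with the most votes wins
  names(votes)[which.max(votes)]
}

# Toy data: two classes of 2-D cases
train_x <- matrix(c(1, 1, 1.2, 0.9, 5, 5, 5.1, 4.8), ncol = 2, byrow = TRUE)
train_labels <- c("adder", "adder", "grass snake", "grass snake")
knn_classify(train_x, train_labels, new_x = c(1.1, 1), k = 3)  # "adder"
```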
The voting process when using kNN for regression is very similar, except that each of the k nearest neighbors votes with its own outcome value, and we take the mean of these k values as the predicted value for the new data.
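The regression version is the same sketch with the majority vote swapped for a mean; again, knn_regress() and the numbers are illustrative assumptions:

```r
# Regression: average the outcome values of the k nearest neighbors
knn_regress <- function(train_x, train_y, new_x, k = 3) {
  dists <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  mean(train_y[order(dists)[1:k]])
}

train_x <- matrix(c(1, 2, 2, 1, 8, 9, 9, 8), ncol = 2, byrow = TRUE)
train_y <- c(10, 12, 50, 52)
knn_regress(train_x, train_y, new_x = c(1.5, 1.5), k = 2)  # 11: mean of 10 and 12
```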
I love a bit of healthy competition. In this section, we’re going to benchmark the kNN, random forest, and XGBoost model-building processes against each other. We start by creating tuning wrappers that bundle each learner with its hyperparameter-tuning process. Then we create a list of these wrapped learners to pass into benchmark(). As this process will take some time, we’re going to define and use a holdout cross-validation procedure to evaluate the performance of each wrapper (ideally, we would use k-fold or repeated k-fold cross-validation).
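In outline, that setup might look like the following mlr sketch. The wrapper and benchmarking functions are mlr's real API, but the task name (ozone_task), the hyperparameter ranges, and the search budgets here are placeholder assumptions rather than the book's exact values.

```r
library(mlr)

# Inner resampling used inside each tuning wrapper
inner <- makeResampleDesc("CV", iters = 5)

# kNN learner wrapped with its tuning process
knn_wrapper <- makeTuneWrapper(
  makeLearner("regr.kknn"),
  resampling = inner,
  par.set = makeParamSet(makeDiscreteParam("k", values = 1:12)),
  control = makeTuneControlGrid()
)

# Random forest learner wrapped with its tuning process
rf_wrapper <- makeTuneWrapper(
  makeLearner("regr.randomForest"),
  resampling = inner,
  par.set = makeParamSet(
    makeIntegerParam("ntree", lower = 50, upper = 300),
    makeIntegerParam("nodesize", lower = 1, upper = 10)
  ),
  control = makeTuneControlRandom(maxit = 20)
)

# XGBoost learner wrapped with its tuning process
xgb_wrapper <- makeTuneWrapper(
  makeLearner("regr.xgboost"),
  resampling = inner,
  par.set = makeParamSet(
    makeNumericParam("eta", lower = 0.01, upper = 0.3),
    makeIntegerParam("max_depth", lower = 1, upper = 8),
    makeIntegerParam("nrounds", lower = 20, upper = 100)
  ),
  control = makeTuneControlRandom(maxit = 20)
)

# Holdout for the outer evaluation (faster, but noisier than k-fold),
# then benchmark all three wrapped learners on the same task
holdout <- makeResampleDesc("Holdout")
bench <- benchmark(
  learners = list(knn_wrapper, rf_wrapper, xgb_wrapper),
  tasks = ozone_task,  # placeholder: a regression task defined earlier
  resamplings = holdout
)
bench
```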