7 Selected supervised learning algorithms

 

This chapter covers

  • Markov models: PageRank and hidden Markov models (HMMs)
  • Imbalanced learning, including undersampling and oversampling strategies
  • Active learning, including uncertainty sampling and query by committee strategies
  • Model selection, including hyperparameter tuning
  • Ensemble methods, including bagging, boosting, and stacking
  • ML research, including supervised learning algorithms

In the previous two chapters, we looked at supervised algorithms for classification and regression. In this chapter, we focus on a selected set of supervised learning algorithms, chosen to give exposure to a variety of applications: time series models used in computational finance, imbalanced learning used in fraud detection, active learning used to reduce the number of training labels, and model selection and ensemble methods that are staples of data science competitions. Finally, we conclude with ML research and exercises. Let’s begin by reviewing the fundamentals of Markov models.

7.1 Markov models

In this section, we discuss probabilistic models for a sequence of observations. Time series models have a wide range of applications, including computational finance, speech recognition, and computational biology. We’ll start by looking at two popular algorithms built on the properties of Markov chains: the PageRank algorithm and the expectation-maximization (EM) algorithm for hidden Markov models (HMMs).
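Before diving into these algorithms, it helps to see a Markov chain in code. The following is a minimal sketch, not one of the book’s listings: the two-state weather chain and its transition probabilities are made-up illustrative values. It propagates a probability distribution through the chain by power iteration until it reaches a fixed point, the same idea PageRank exploits on the web’s link graph.

```python
import numpy as np

# Transition matrix for a toy two-state weather chain (illustrative values):
# row i gives P(next state | current state i); each row sums to 1.
P = np.array([[0.9, 0.1],   # sunny -> sunny, sunny -> rainy
              [0.5, 0.5]])  # rainy -> sunny, rainy -> rainy

# Power iteration: repeatedly push a distribution through the chain
# until it stops changing. The fixed point pi satisfies pi = pi @ P.
pi = np.array([0.5, 0.5])   # any valid starting distribution works
for _ in range(100):
    pi_next = pi @ P
    if np.allclose(pi_next, pi):
        break
    pi = pi_next

print(pi)  # stationary distribution, approximately [0.833, 0.167]
```

Note that the update uses only the current distribution, never the earlier history; this memorylessness is the Markov property. PageRank runs essentially this iteration over a link graph (with a damping term to guarantee convergence), while HMMs add a layer of hidden states that emit the observations we actually see.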

7.1.1 PageRank algorithm

7.1.2 Hidden Markov models

7.2 Imbalanced learning

7.2.1 Undersampling strategies

7.2.2 Oversampling strategies

7.3 Active learning

7.3.1 Query strategies

7.4 Model selection: Hyperparameter tuning

7.4.1 Bayesian optimization

7.5 Ensemble methods

7.5.1 Bagging

7.5.2 Boosting

7.5.3 Stacking

7.6 ML research: Supervised learning algorithms

7.7 Exercises

Summary