chapter five

5 Classification algorithms

This chapter covers

Introducing classification
The perceptron algorithm
The SVM algorithm
SGD logistic regression
The Bernoulli naive Bayes algorithm
The decision tree (CART) algorithm

In the previous chapter, we looked at the computer science fundamentals required to implement ML algorithms from scratch. In this chapter, we focus on supervised learning, and on classification in particular, one of the most widely used families of algorithms in machine learning. We will derive and implement several classification algorithms from scratch to build our experience with the fundamentals and to motivate the design of new ML algorithms. The algorithms were selected because they illustrate important algorithmic concepts and expose the reader to progressively more complex scenarios. Classification has wide application, including email spam detection, document classification, and customer segmentation.

5.1 Introduction to classification

In supervised learning, we are given a dataset D = {(x₁, y₁), …, (xₙ, yₙ)} consisting of pairs of inputs x and labels y. The goal of a classification algorithm is to learn a mapping from inputs x to outputs y, where y is a discrete quantity (i.e., y ∈ {1, …, K}). If K = 2, we have a binary classification problem; for K > 2, we have multiclass classification.
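To make the notation concrete, here is a minimal sketch in Python. The toy dataset and the helper names `problem_type` and `fit_majority_baseline` are illustrative assumptions, not algorithms from this chapter. The sketch stores D as a list of (x, y) pairs, checks whether the labels define a binary or a multiclass problem, and learns the simplest possible mapping from x to y: always predicting the most common label.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical toy dataset D = {(x_i, y_i)}: each x is a feature vector
# and each y is a discrete label in {1, ..., K}.
D = [
    ((0.2, 1.1), 1),
    ((0.4, 0.9), 1),
    ((2.1, 3.0), 2),
    ((1.9, 3.3), 2),
    ((2.0, 2.8), 2),
]


def problem_type(labels: Sequence[Hashable]) -> str:
    """Return 'binary' if K = 2 and 'multiclass' if K > 2."""
    k = len(set(labels))  # K = number of distinct labels observed
    if k < 2:
        raise ValueError("classification needs at least two distinct labels")
    return "binary" if k == 2 else "multiclass"


def fit_majority_baseline(
    data: Sequence[Tuple[Sequence[float], Hashable]],
) -> Callable[[Sequence[float]], Hashable]:
    """Learn a trivial mapping x -> y that ignores x and predicts the most common label."""
    majority_label, _ = Counter(y for _, y in data).most_common(1)[0]
    return lambda x: majority_label


labels = [y for _, y in D]
kind = problem_type(labels)        # two distinct labels, so K = 2
predict = fit_majority_baseline(D)
prediction = predict((1.0, 2.0))   # the learned mapping applied to a new input
print(kind, prediction)
```

A baseline like this is only a reference point: it never looks at x, so any useful classifier should achieve higher accuracy on the same data.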
