chapter three

3 Classifiers and vector calculus

We took a first look at the core concept of machine learning in section 1.3. Then, in section 2.8.2, we examined classifiers as a special case. But so far, we have skipped the topic of error minimization: given one or more training examples, how do we adjust the weights and biases to make the machine closer to the desired ideal? We will study this topic in this chapter by discussing the concept of gradients.

NOTE

The complete PyTorch code for this chapter is available at http://mng.bz/4Zya in the form of fully functional and executable Jupyter notebooks.

3.1 Geometrical view of image classification

To fix our ideas, consider a machine that classifies whether an image contains a car or a giraffe. Such classifiers, with only two classes, are known as binary classifiers. The first question is how to represent the input.

3.1.1 Input representation

The car-versus-giraffe scenario belongs to a special class of problems where we are analyzing a visual scene. Here, the inputs are the brightness levels of various points in the 3D scene projected onto a 2D image plane. Each element of the image represents a point in the actual scene and is referred to as a pixel. The image is a two-dimensional array representing the collection of pixel values at a given instant in time. It is usually scaled to a fixed size, say 224 × 224. As such, the image can be viewed as a 224 × 224 matrix X.

Each element of the matrix, X_{i, j}, is a pixel color value in the range [0,255].
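To make the matrix view concrete, here is a minimal PyTorch sketch. It is not taken from the chapter's notebooks: the image is a synthetic stand-in filled with random brightness values, the names `HEIGHT`, `WIDTH`, `X`, and `X_scaled` are illustrative, and the rescaling to [0, 1] is one common preprocessing choice that we assume here rather than something the text prescribes.

```python
import torch

torch.manual_seed(0)  # make the synthetic image reproducible

HEIGHT, WIDTH = 224, 224  # the fixed size mentioned in the text

# Synthetic stand-in for a grayscale image: one integer brightness per pixel.
# The upper bound of torch.randint is exclusive, so 256 yields values in [0, 255].
X = torch.randint(0, 256, (HEIGHT, WIDTH), dtype=torch.uint8)

# X[i, j] is the pixel in row i, column j (PyTorch indexing starts at 0).
i, j = 10, 20
pixel = X[i, j].item()

# Assumption: rescale to floating-point values in [0, 1] before further processing.
X_scaled = X.to(torch.float32) / 255.0

print(X.shape)                                        # size of the image matrix
print(pixel)                                          # brightness of one pixel
print(X_scaled.min().item(), X_scaled.max().item())   # range after rescaling
```

In practice, `X` would come from an image file that has been resized to the fixed size rather than from a random number generator; the indexing and value range stay the same.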

3.1.2 Classifiers as decision boundaries

3.1.3 Modeling in a nutshell

3.1.4 Sign of the surface function in binary classification

3.2 Error, aka loss function

3.3 Minimizing loss functions: Gradient vectors

3.3.1 Gradients: A machine learning-centric introduction

3.3.2 Level surface representation and loss minimization

3.4 Local approximation for the loss function

3.4.1 1D Taylor series recap

3.4.2 Multidimensional Taylor series and the Hessian matrix

3.5 PyTorch code for gradient descent, error minimization, and model training

3.5.1 PyTorch code for linear models

3.5.2 Autograd: PyTorch automatic gradient computation

3.5.3 Nonlinear models in PyTorch

3.7 Convex sets and functions