This chapter covers
- Classifying images of handwritten digits represented as vector data
- Designing a type of neural network called a multilayer perceptron
- Evaluating a neural network as a vector transformation
- Fitting a neural network to data with a cost function and gradient descent
- Calculating partial derivatives for neural networks with backpropagation
In this final chapter of the book, we combine almost everything you’ve learned so far to introduce one of the most famous machine learning tools in use today: artificial neural networks. Artificial neural networks, or neural networks for short, are mathematical functions whose structure is loosely modeled on that of the human brain. They are called artificial to distinguish them from the “organic” neural networks in the brain. Modeling the brain might sound like a lofty and complex goal, but it all rests on a simple metaphor for how the brain works.
Before explaining the metaphor, I’ll remind you that I’m not a neuroscientist. The rough idea is that the brain is a big clump of interconnected cells called neurons, and when you think certain thoughts, what’s actually happening is electrical activity at specific neurons. You can see this electrical activity in the right kind of brain scan, where various parts of the brain light up (figure 16.1).