6 Common design building blocks

This chapter covers

  • Adding new activation functions
  • Inserting new layers to improve training
  • Skipping layers as a useful design pattern
  • Combining new activations, layers, and skips into new approaches more powerful than the sum of their parts

At this point, we have learned about the three most common and fundamental types of neural networks: fully connected, convolutional, and recurrent. We have improved all of these architectures by changing the optimizer and learning rate schedule, which alter how we update the parameters (weights) of our models, giving us more accurate models almost for free. Everything we have learned so far also has a long shelf life and has taught us about problems that have existed for decades (and persist today). It gives you a good foundation for speaking the language of deep learning and a set of fundamental building blocks from which larger algorithms are constructed.
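To make that concrete, here is a minimal sketch of the kind of change we mean, assuming a PyTorch-style setup (the model, the synthetic data, and the hyperparameter values are placeholders for illustration, not code from this book): swapping in a different optimizer and attaching a learning rate schedule takes only a few lines.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data; any dataset and model would do here.
X = torch.randn(512, 20)
y = (X.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

# Swapping the optimizer (e.g., SGD -> AdamW) changes how each weight update is computed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# A learning rate schedule changes the step size over the course of training.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

for epoch in range(20):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()  # adjust the learning rate once per epoch
```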

6.1 Better activation functions

6.1.1  Vanishing gradients

6.1.2  Rectified linear units (ReLUs): Avoiding vanishing gradients

6.1.3  Training with LeakyReLU activations
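As a rough preview of section 6.1, and purely as an illustrative sketch (assuming PyTorch; the layer sizes and the 0.1 negative slope are arbitrary choices, not values from this book), swapping a saturating activation such as tanh for a ReLU or LeakyReLU is a one-line change in a model definition:

```python
import torch.nn as nn

# Hypothetical three-layer fully connected network; sizes are placeholders.
def make_net(activation: nn.Module) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(20, 64), activation,
        nn.Linear(64, 64), activation,
        nn.Linear(64, 2),
    )

tanh_net = make_net(nn.Tanh())           # prone to vanishing gradients when deep
relu_net = make_net(nn.ReLU())           # gradient is 1 for positive inputs
leaky_net = make_net(nn.LeakyReLU(0.1))  # small nonzero slope for negative inputs
```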

6.2 Normalization layers: Magically better convergence

6.2.1  Where do normalization layers go?

6.2.2  Batch normalization

6.2.3  Training with batch normalization
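For a sense of what section 6.2 covers, here is a minimal sketch (assuming PyTorch; the layer sizes are placeholders, and the book's own listings may differ) of the usual placement of batch normalization, between a linear layer and its activation:

```python
import torch.nn as nn

# Hypothetical hidden block: Linear -> BatchNorm -> activation.
# BatchNorm1d normalizes each feature using statistics computed over the batch.
hidden_block = nn.Sequential(
    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.LeakyReLU(0.1),
)
```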

6.2.4  Layer normalization

6.2.5  Training with layer normalization
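And the corresponding layer normalization sketch, again illustrative only: LayerNorm normalizes the features of each example on its own, so it behaves the same whether the batch holds one item or a thousand.

```python
import torch
import torch.nn as nn

# LayerNorm normalizes over the feature dimension of each example,
# independent of the other items in the batch.
hidden_block = nn.Sequential(
    nn.Linear(64, 64),
    nn.LayerNorm(64),
    nn.LeakyReLU(0.1),
)

x = torch.randn(8, 64)        # a batch of 8 examples with 64 features each
print(hidden_block(x).shape)  # torch.Size([8, 64])
```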

6.2.6  Which normalization layer to use?

6.2.7  A peculiarity of layer normalization

6.3 Skip connections: A network design pattern

6.3.1  Implementing fully connected skips

6.3.2  Implementing convolutional skips
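As a hypothetical sketch of the skip-connection pattern (assuming PyTorch; the SkipCNN name, its channel counts, and the choice to concatenate rather than add are illustrative assumptions, not this book's implementation), an earlier layer's output is carried around a later layer and combined with its result:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipCNN(nn.Module):
    """Hypothetical convolutional skip: the input to a block is carried
    around it and concatenated with the block's output."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # The layer after the skip sees both sets of channels.
        self.conv3 = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        h1 = F.leaky_relu(self.conv1(x))
        h2 = F.leaky_relu(self.conv2(h1))
        combined = torch.cat([h1, h2], dim=1)  # skip: h1 bypasses conv2
        return F.leaky_relu(self.conv3(combined))

out = SkipCNN()(torch.randn(2, 3, 32, 32))  # -> shape (2, 16, 32, 32)
```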

6.4 1 × 1 Convolutions: Sharing and reshaping information in channels

6.4.1  Training with 1 × 1 convolutions
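A short illustrative sketch of a 1 × 1 convolution (assuming PyTorch; the channel counts and input size are placeholders): it mixes information across channels at each spatial location without looking at neighboring pixels.

```python
import torch
import torch.nn as nn

# A 1x1 convolution reshapes the channel dimension; here it shrinks
# 64 channels down to 16 while leaving the spatial size untouched.
mix_channels = nn.Conv2d(64, 16, kernel_size=1)

x = torch.randn(8, 64, 28, 28)
print(mix_channels(x).shape)  # torch.Size([8, 16, 28, 28])
```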

6.5 Residual connections

6.5.1  Residual blocks

6.5.2  Implementing residual blocks
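The following is a minimal residual-block sketch in the spirit of the original ResNet design (assuming PyTorch; the normalization and activation choices here are assumptions, and the book's own implementation may differ): the block computes a residual and adds the unchanged input back in before the final activation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Minimal residual block sketch: output = activation(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = F.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return F.relu(residual + x)  # add the shortcut, then activate

out = ResidualBlock(16)(torch.randn(2, 16, 32, 32))  # shape is preserved
```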

6.5.3  Residual bottlenecks

6.5.4  Implementing residual bottlenecks
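Finally, a hypothetical bottleneck sketch (assuming PyTorch; the channel counts are placeholders): two 1 × 1 convolutions shrink and then restore the channel count so the expensive 3 × 3 convolution operates on fewer channels, while the shortcut still adds the original input back in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """Sketch of a residual bottleneck: a 1x1 conv shrinks the channels,
    a 3x3 conv does the spatial work cheaply, and a final 1x1 conv
    expands back so the shortcut can be added."""
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.conv = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1)
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = F.relu(self.bn1(self.reduce(x)))
        h = F.relu(self.bn2(self.conv(h)))
        h = self.bn3(self.expand(h))
        return F.relu(h + x)  # shortcut around the whole bottleneck

out = BottleneckBlock(64, 16)(torch.randn(2, 64, 8, 8))  # shape preserved
```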