6 Common design building blocks

This chapter covers

  • Adding new activation functions
  • Inserting new layers to improve training
  • Skipping layers as a useful design pattern
  • Combining new activations, layers, and skips into new approaches more powerful than the sum of their parts

At this point, we have learned about the three most common and fundamental types of neural networks: fully connected, convolutional, and recurrent. We have improved all of these architectures by changing the optimizer and learning rate schedule, which alter how we update the parameters (weights) of our models, giving us more accurate models almost for free. Everything we have learned so far also has a long shelf life and has taught us about problems that have existed for decades (and persist today). It gives you a good foundation for speaking the language of deep learning and a set of fundamental building blocks from which larger algorithms are constructed.
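To make that concrete, here is a minimal sketch of the kind of change we mean, assuming a PyTorch-style setup (the model, the synthetic data, and the hyperparameter values are placeholders for illustration, not code from this book): swapping in a different optimizer and attaching a learning rate schedule takes only a few lines.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data; any dataset and model would do here.
X = torch.randn(512, 20)
y = (X.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

# Swapping the optimizer (e.g., SGD -> AdamW) changes how each weight update is computed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# A learning rate schedule changes the step size over the course of training.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

for epoch in range(20):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()  # adjust the learning rate once per epoch
```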

6.1 Better activation functions

6.1.1  Vanishing gradients

6.1.2  Rectified linear units (ReLUs): Avoiding vanishing gradients

6.1.3  Training with LeakyReLU activations
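As a rough preview of section 6.1, and purely as an illustrative sketch (assuming PyTorch; the layer sizes and the 0.1 negative slope are arbitrary choices, not values from this book), swapping a saturating activation such as tanh for a ReLU or LeakyReLU is a one-line change in a model definition:

```python
import torch.nn as nn

# Hypothetical three-layer fully connected network; sizes are placeholders.
def make_net(activation: nn.Module) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(20, 64), activation,
        nn.Linear(64, 64), activation,
        nn.Linear(64, 2),
    )

tanh_net = make_net(nn.Tanh())           # prone to vanishing gradients when deep
relu_net = make_net(nn.ReLU())           # gradient is 1 for positive inputs
leaky_net = make_net(nn.LeakyReLU(0.1))  # small nonzero slope for negative inputs
```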

6.2 Normalization layers: Magically better convergence

6.2.1  Where do normalization layers go?

6.2.2  Batch normalization

6.2.3  Training with batch normalization
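For a sense of what section 6.2 covers, here is a minimal sketch (assuming PyTorch; the layer sizes are placeholders, and the book's own listings may differ) of the usual placement of batch normalization, between a linear layer and its activation:

```python
import torch.nn as nn

# Hypothetical hidden block: Linear -> BatchNorm -> activation.
# BatchNorm1d normalizes each feature using statistics computed over the batch.
hidden_block = nn.Sequential(
    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.LeakyReLU(0.1),
)
```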

6.2.4  Layer normalization

6.2.5  Training with layer normalization
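And the corresponding layer normalization sketch, again illustrative only: LayerNorm normalizes the features of each example on its own, so it behaves the same whether the batch holds one item or a thousand.

```python
import torch
import torch.nn as nn

# LayerNorm normalizes over the feature dimension of each example,
# independent of the other items in the batch.
hidden_block = nn.Sequential(
    nn.Linear(64, 64),
    nn.LayerNorm(64),
    nn.LeakyReLU(0.1),
)

x = torch.randn(8, 64)        # a batch of 8 examples with 64 features each
print(hidden_block(x).shape)  # torch.Size([8, 64])
```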

6.2.6  Which normalization layer to use?

6.2.7  A peculiarity of layer normalization

6.3 Skip connections: A network design pattern

6.3.1  Implementing fully connected skips

6.3.2  Implementing convolutional skips
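As a hypothetical sketch of the skip-connection pattern (assuming PyTorch; the SkipCNN name, its channel counts, and the choice to concatenate rather than add are illustrative assumptions, not this book's implementation), an earlier layer's output is carried around a later layer and combined with its result:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipCNN(nn.Module):
    """Hypothetical convolutional skip: the input to a block is carried
    around it and concatenated with the block's output."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # The layer after the skip sees both sets of channels.
        self.conv3 = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        h1 = F.leaky_relu(self.conv1(x))
        h2 = F.leaky_relu(self.conv2(h1))
        combined = torch.cat([h1, h2], dim=1)  # skip: h1 bypasses conv2
        return F.leaky_relu(self.conv3(combined))

out = SkipCNN()(torch.randn(2, 3, 32, 32))  # -> shape (2, 16, 32, 32)
```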

6.4 1 × 1 Convolutions: Sharing and reshaping information in channels

6.4.1  Training with 1 × 1 convolutions
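A short illustrative sketch of a 1 × 1 convolution (assuming PyTorch; the channel counts and input size are placeholders): it mixes information across channels at each spatial location without looking at neighboring pixels.

```python
import torch
import torch.nn as nn

# A 1x1 convolution reshapes the channel dimension; here it shrinks
# 64 channels down to 16 while leaving the spatial size untouched.
mix_channels = nn.Conv2d(64, 16, kernel_size=1)

x = torch.randn(8, 64, 28, 28)
print(mix_channels(x).shape)  # torch.Size([8, 16, 28, 28])
```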

6.5 Residual connections

6.5.1  Residual blocks

6.5.2  Implementing residual blocks
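The following is a minimal residual-block sketch in the spirit of the original ResNet design (assuming PyTorch; the normalization and activation choices here are assumptions, and the book's own implementation may differ): the block computes a residual and adds the unchanged input back in before the final activation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Minimal residual block sketch: output = activation(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = F.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return F.relu(residual + x)  # add the shortcut, then activate

out = ResidualBlock(16)(torch.randn(2, 16, 32, 32))  # shape is preserved
```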

6.5.3  Residual bottlenecks

6.5.4  Implementing residual bottlenecks
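Finally, a hypothetical bottleneck sketch (assuming PyTorch; the channel counts are placeholders): two 1 × 1 convolutions shrink and then restore the channel count so the expensive 3 × 3 convolution operates on fewer channels, while the shortcut still adds the original input back in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """Sketch of a residual bottleneck: a 1x1 conv shrinks the channels,
    a 3x3 conv does the spatial work cheaply, and a final 1x1 conv
    expands back so the shortcut can be added."""
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.conv = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1)
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        h = F.relu(self.bn1(self.reduce(x)))
        h = F.relu(self.bn2(self.conv(h)))
        h = self.bn3(self.expand(h))
        return F.relu(h + x)  # shortcut around the whole bottleneck

out = BottleneckBlock(64, 16)(torch.randn(2, 64, 8, 8))  # shape preserved
```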