chapter nine

9 Convnet architecture patterns

 

This chapter covers

  • The Modularity-Hierarchy-Reuse formula for model architecture
  • Standard best practices for building convnets: residual connections, batch normalization, and depthwise separable convolutions
  • Ongoing design trends for computer vision models

A model’s “architecture” is the sum of the choices that went into creating it: which layers to use, how to configure them, and in what arrangement to connect them. These choices define the hypothesis space of the model: the space of possible functions that gradient descent can search over, parameterized by the model’s weights. As with feature engineering, a good hypothesis space encodes the prior knowledge we have about the problem at hand and its solution. For instance, using convolution layers means that we assume in advance that the relevant patterns in our input images are translation-invariant: a pattern is worth detecting no matter where it appears in the image. To learn effectively from data, we need to make assumptions about what we’re looking for.
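To make the translation prior concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not code from a particular library: the helper `conv2d_circular` is a hypothetical name, and it wraps around at the image borders so that the property holds exactly. It checks that shifting an image and then applying a convolution filter gives the same result as applying the filter and then shifting the output, which means the same filter responds to a pattern wherever it appears.

```python
import numpy as np

def conv2d_circular(image, kernel):
    """Slide `kernel` over `image` (cross-correlation, as in deep learning
    convolution layers), wrapping around at the borders.

    Hypothetical helper written for this illustration only.
    """
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for di in range(kh):
        for dj in range(kw):
            # Rolling by (-di, -dj) places pixel (i + di, j + dj) at (i, j).
            out += kernel[di, dj] * np.roll(image, shift=(-di, -dj), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))    # stand-in for a single-channel image
kernel = rng.normal(size=(3, 3))   # stand-in for one learned filter

shift = (2, 3)  # move the image 2 pixels down and 3 pixels right

shifted_then_conv = conv2d_circular(np.roll(image, shift, axis=(0, 1)), kernel)
conv_then_shifted = np.roll(conv2d_circular(image, kernel), shift, axis=(0, 1))

# The two orders of operations agree up to floating-point error.
same = np.allclose(shifted_then_conv, conv_then_shifted)
print(same)
```

A densely connected layer applied to the flattened image has no such constraint: it would need to learn the same pattern separately at every position. Real convolution layers usually pad with zeros rather than wrapping around, so the property holds only away from the borders.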

9.1 Modularity, hierarchy, and reuse

9.2 Residual connections

9.3 Batch normalization

9.4 Depthwise separable convolutions

9.5 Putting it together: A mini Xception-like model

9.6 Beyond convolution: Vision Transformers

Summary