6 Regularization via Model


This chapter covers

  • Inductive bias and its use in convolutional neural networks
  • Applying dropout as a regularization technique to improve model performance
  • Implicit regularization using multi-task learning

Suitable regularization, whether explicit or implicit, is essential to achieving good generalization performance when using over-parameterized models such as deep neural networks. As introduced in the previous chapter, data augmentation promises a performance boost in visual recognition tasks such as image classification. However, generating useful artificial data from the available training set still depends on that set covering the variety present in the population. There is not much we can augment when the training set is extremely limited.

Consider, for example, the case of the minimal training set in figure 6.1, which is the same figure we encountered in chapter 1. The true underlying function is a curve and is unavailable to us. In this scenario, we should avoid unnecessary complexity in the model, since a complex model would easily overfit a training dataset consisting of only six data points; a short sketch after the figure illustrates this contrast. The same reasoning applies when the training data is even scarcer. Under extreme uncertainty, it is better to remain conservative and avoid undue complexity.

Figure 6.1 The same scenario with only six data points available. A simple linear regression model is preferred over an overly complex model such as a deep neural network.
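To make the contrast concrete, the following is a minimal sketch (not the book's code) that fits a plain linear model and a high-capacity degree-5 polynomial to six noisy samples. The underlying sine curve, the noise level, and the scikit-learn usage are illustrative assumptions, not the actual data behind figure 6.1; the point is only that the flexible model drives its training error toward zero while generalizing worse on held-out points.

# A minimal sketch, assuming six noisy samples from an unknown curve
# (the data here are illustrative, not the values shown in figure 6.1).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Six training points sampled from a hidden curve plus noise
x_train = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
y_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(scale=0.2, size=6)

# Held-out points from the same hidden curve, used only for evaluation
x_test = rng.uniform(0.0, 1.0, size=(50, 1))
y_test = np.sin(2 * np.pi * x_test).ravel()

# A simple linear model versus a degree-5 polynomial, which has enough
# capacity to pass through all six training points
simple_model = LinearRegression()
complex_model = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())

for name, model in [("linear", simple_model), ("degree-5 poly", complex_model)]:
    model.fit(x_train, y_train)
    train_mse = np.mean((model.predict(x_train) - y_train) ** 2)
    test_mse = np.mean((model.predict(x_test) - y_test) ** 2)
    print(f"{name:>14}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")

Under this setup the polynomial's training error is essentially zero while its held-out error is noticeably larger than the linear model's, which is the overfitting behavior the figure is meant to warn against.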

6.1 Inductive bias in convolutional neural networks

6.1.1 Revisiting the fully-connected network

6.1.2 Translational invariance in convolutional neural networks

6.1.3 Understanding the convolution operator

6.1.4 Weight sharing in the convolution operation

6.2 Regularizing deep neural networks via dropout

6.2.1 Introducing dropout

6.2.2 Inducing a sparse representation

6.2.3 Dropout in action

6.2.4 Applying dropout in CNNs

6.3 Implicit regularization in multi-task learning

6.3.1 Two MTL approaches in deep neural networks

6.3.2 Modifying the loss function to achieve soft parameter sharing

6.3.3 MTL in action

6.4 Summary
