5 Regularization via Data

 

This chapter covers

  • Common challenges in the data and the need for data augmentation
  • Different data augmentation techniques that contribute to regularized model training
  • Applying data augmentation in image classification to boost training performance
  • The deep bootstrap framework that connects offline generalization to online optimization

Besides the training procedure, the data itself directly determines the generalization performance of the trained model. A sufficient and representative dataset is essential for training a model that generalizes well. Unfortunately, a limited training set is often all we have to work with, and acquiring new training data comes at additional cost or is impossible in some cases.

One immediate challenge that arises from limited data is that the available training data may come from an underlying data-generating distribution that differs from that of the test data. Such a distributional difference constitutes nonstationarity in the data: compared with the test set, a nonstationary training set exhibits different statistical characteristics, such as the mean and variance of the design matrix or of the target. The mapping between inputs and targets may also shift. Training a classifier on data from one distribution and testing it on another therefore does not guarantee good generalization performance on the test set.
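One symptom of such a distributional difference can be made concrete with a minimal sketch. The example below is a hypothetical one-dimensional setup (not from the chapter): training inputs drawn from one Gaussian and test inputs from a shifted, wider Gaussian, so that the summary statistics of the two splits disagree.

```python
import random
import statistics

random.seed(0)

# Hypothetical synthetic data: the training inputs come from N(0, 1),
# while the test inputs come from a shifted distribution N(2, 1.5^2).
# This mimics nonstationarity between the training and test sets.
train_x = [random.gauss(0.0, 1.0) for _ in range(10_000)]
test_x = [random.gauss(2.0, 1.5) for _ in range(10_000)]

train_mean = statistics.mean(train_x)
train_std = statistics.stdev(train_x)
test_mean = statistics.mean(test_x)
test_std = statistics.stdev(test_x)

# The mean and spread of the inputs differ across the two splits --
# one measurable symptom of a shifted data-generating distribution.
print(f"train: mean={train_mean:.2f}, std={train_std:.2f}")
print(f"test:  mean={test_mean:.2f}, std={test_std:.2f}")
```

A model fit on `train_x` would mostly see inputs near 0 and would be evaluated on inputs near 2, a region it barely observed during training, which is why the shift hurts generalization.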

5.1 Data-based methods

5.1.1 Data augmentation

5.1.2 Label smoothing

5.2 Training deep neural networks using data augmentation

5.2.1 Training without data augmentation

5.2.2 Training with data augmentation

5.3 The deep bootstrap framework

5.3.1 Insufficiency of classical generalization framework

5.3.2 Online optimization

5.3.3 Connecting online optimization with offline generalization

5.3.4 Constructing the ideal world with CIFAR-5m

5.3.5 Model training in the ideal world

5.3.6 Model testing

5.3.7 Bootstrap error between real world and ideal world

5.3.8 Implicit bias in convolutional neural networks

5.4 Summary