Chapter Eight

8 Transfer Learning


This chapter covers

  • Using pre-built and pre-trained models from TF.Keras and TF.Hub for transfer learning.
  • Replacing the task group of layers with a new task group, such as a new classifier.
  • Different methods for training the new task group.

TensorFlow and TF.Keras provide a wide selection of pre-built and pre-trained models. Pre-trained models can be used as-is, while pre-built models can be trained from scratch. In addition, a pre-trained model whose task group has been replaced can be used for transfer learning, or reconfigured to perform a different task.

Let’s discuss for a moment what the benefits of transfer learning are. In essence, transfer learning means transferring the knowledge learned for solving one task to solving another task, so that the new task can be trained faster and with less data. It’s a form of reuse: we are reusing the model with its “learned” weights. You might ask, can I reuse the weights learned for one model architecture in another? No, the architectures have to be the same, such as ResNet50 to ResNet50. Can I reuse the “learned” weights on a different task? You could, but the results will vary depending on how similar the domain of the pre-trained model is to the new dataset. What we really mean by the “learned” weights are the learned essential features, the corresponding feature extraction, and the latent space representation -- i.e., the representational learning.
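As a preview of what this chapter covers in detail, the basic recipe looks something like the following sketch: load a pre-built model without its task group, freeze the pre-trained layers, and attach a new classifier. The 10-class output and the use of `weights=None` (instead of `weights='imagenet'`, which downloads the pre-trained weights) are assumptions here so the sketch runs offline.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Load a pre-built ResNet50 without its top (task group) layers.
# In practice, pass weights='imagenet' to get the pre-trained weights;
# weights=None is used here only so the sketch runs without a download.
base = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                      input_shape=(224, 224, 3),
                                      pooling='avg')

# Freeze the pre-trained layers so only the new task group is trained.
base.trainable = False

# Attach a new classifier (task group) for a hypothetical 10-class dataset.
outputs = layers.Dense(10, activation='softmax')(base.output)
model = Model(base.input, outputs)

model.compile(optimizer='adam', loss='categorical_crossentropy')
```

The sections below walk through each of these steps, as well as the choices for how (and whether) to further train the pre-trained layers.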

8.1  TF.Keras Pre-built Models

8.1.1  Base Model

8.1.2  Pre-Trained ImageNet Models for Prediction

8.1.3  New Classifier

8.2  TF Hub Pre-built Models

8.2.1  Using TF.Hub Pre-Trained Models

8.2.2  New Classifier

8.3  Transfer Learning

8.3.1  Similar Tasks

8.3.2  Distinct Tasks

8.3.3  Domain Specific Weights

8.3.4  Domain Transfer Weight Initialization

8.3.5  Negative Transfer

8.4  Summary