9 Transfer learning with pretrained language models

This chapter covers

  • Using transfer learning to leverage knowledge from unlabeled textual data
  • Using self-supervised learning to pretrain large language models such as BERT
  • Building a sentiment analyzer with BERT and the Hugging Face Transformers library (a minimal sketch follows this list)
  • Building a natural language inference model with BERT and AllenNLP
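
As a preview of the kind of workflow this chapter builds up to, the following is a minimal sketch of sentiment analysis with a pretrained Transformer, using the Hugging Face Transformers pipeline API. The model checkpoint named here is an illustrative assumption (a distilled BERT variant fine-tuned on the SST-2 sentiment dataset); the case study in section 9.3 instead builds and fine-tunes a BERT-based classifier step by step.

from transformers import pipeline

# Load a sentiment analysis pipeline backed by a pretrained Transformer.
# The checkpoint below is an illustrative choice, not the model built in
# the case study, where we fine-tune BERT ourselves.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The movie was surprisingly good!"))
# Prints something like: [{'label': 'POSITIVE', 'score': 0.99...}]

The pipeline call hides tokenization, the forward pass, and label mapping; sections 9.3.1 through 9.3.3 walk through those pieces explicitly.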

The year 2018 is often called “an inflection point” in the history of NLP. A prominent NLP researcher, Sebastian Ruder, dubbed this change “NLP’s ImageNet moment” (https://ruder.io/nlp-imagenet/), borrowing the name of a popular computer vision dataset and the powerful models pretrained on it to point out that a similar shift was underway in the NLP community. Pretrained language models such as ELMo, BERT, and GPT-2 achieved state-of-the-art performance on many NLP tasks and, within months, completely changed how we build NLP models.

9.1 Transfer learning

9.1.1 Traditional machine learning

9.1.2 Word embeddings

9.1.3 What is transfer learning?

9.2 BERT

9.2.1 Limitations of word embeddings

9.2.2 Self-supervised learning

9.2.3 Pretraining BERT

9.2.4 Adapting BERT

9.3 Case study 1: Sentiment analysis with BERT

9.3.1 Tokenizing input

9.3.2 Building the model

9.3.3 Training the model

9.4 Other pretrained language models

9.4.1 ELMo

9.4.2 XLNet

9.4.3 RoBERTa