9 Transfer learning with pretrained language models

This chapter covers

  • Using transfer learning to leverage knowledge from unlabeled textual data
  • Using self-supervised learning to pretrain large language models such as BERT
  • Building a sentiment analyzer with BERT and the Hugging Face Transformers library (a minimal sketch follows this list)
  • Building a natural language inference model with BERT and AllenNLP
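
As a preview of the kind of workflow this chapter builds up to, the following is a minimal sketch of sentiment analysis with a pretrained Transformer, using the Hugging Face Transformers pipeline API. The model checkpoint named here is an illustrative assumption (a distilled BERT variant fine-tuned on the SST-2 sentiment dataset); the case study in section 9.3 instead builds and fine-tunes a BERT-based classifier step by step.

from transformers import pipeline

# Load a sentiment analysis pipeline backed by a pretrained Transformer.
# The checkpoint below is an illustrative choice, not the model built in
# the case study, where we fine-tune BERT ourselves.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The movie was surprisingly good!"))
# Prints something like: [{'label': 'POSITIVE', 'score': 0.99...}]

The pipeline call hides tokenization, the forward pass, and label mapping; sections 9.3.1 through 9.3.3 walk through those pieces explicitly.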

The year 2018 is often called “an inflection point” in the history of NLP. A prominent NLP researcher, Sebastian Ruder, dubbed this change “NLP’s ImageNet moment” (https://ruder.io/nlp-imagenet/), borrowing the name of a popular computer vision dataset and the powerful models pretrained on it to point out that a similar shift was underway in the NLP community. Pretrained language models such as ELMo, BERT, and GPT-2 achieved state-of-the-art performance on many NLP tasks and, within months, completely changed how we build NLP models.

9.1 Transfer learning

9.1.1 Traditional machine learning

9.1.2 Word embeddings

9.1.3 What is transfer learning?

9.2 BERT

9.2.1 Limitations of word embeddings

9.2.2 Self-supervised learning

9.2.3 Pretraining BERT

9.2.4 Adapting BERT

9.3 Case study 1: Sentiment analysis with BERT

9.3.1 Tokenizing input

9.3.2 Building the model

9.3.3 Training the model

9.4 Other pretrained language models

9.4.1 ELMo

9.4.2 XLNet

9.4.3 RoBERTa