Chapter 6

6 Fine-tuning pretrained models and working with multimodal models

 

This chapter covers

  • Using the yelp_polarity dataset to fine-tune a pretrained model
  • Using a fine-tuned model to perform classification tasks
  • Fine-tuning a pretrained model to perform multiclass classification tasks
  • Working with multimodal models

Up until this chapter, you’ve seen how to work with pretrained models from Hugging Face, applying their general capabilities to tasks such as text classification, object detection, and language generation. Now you’ll delve into the process of fine-tuning these models to adapt them to more specialized tasks, enhancing their performance by training them on domain-specific data.

You’ll also explore multimodal models. These models combine multiple types of data, such as images and text, to address more complex tasks that require integrating different information sources, such as identifying the types of animals in an image from both its visual features and accompanying descriptive text. By the end of this chapter, you’ll have a solid understanding of how to fine-tune models for better task-specific accuracy and how to work with models that handle multimodal inputs for richer, more comprehensive solutions.

6.1 Fine-tuning pretrained models

6.1.1 Loading the yelp_polarity dataset

6.1.2 Filtering the yelp_polarity dataset

6.1.3 Tokenizing the reduced dataset

6.1.4 Setting up a pretrained model for sequence classification

6.1.5 Configuring and initializing a trainer for fine-tuning a pretrained model

6.1.6 Using the fine-tuned model

6.1.7 Fine-tuning models for multiclass text classification

6.2 Working with multimodal models

6.2.1 Single-modal models

6.2.2 Multimodal models

Summary