6 Fine-Tuning Pre-Trained Models and Working with Multimodal Models

This chapter covers

  • Using the yelp_polarity dataset to fine-tune a pre-trained model
  • Using a fine-tuned model to perform classification tasks
  • Fine-tuning a pre-trained model to perform multiclass classification tasks
  • Working with multimodal models

Up to this point, you have seen how to use pre-trained models from Hugging Face for a variety of tasks, such as text classification, object detection, and language generation, relying on their general-purpose capabilities. In this chapter, we delve into fine-tuning these models: adapting them to more specialized tasks and improving their performance by training them on domain-specific data.

We will also explore multimodal models, which combine multiple types of data, such as images and text, to tackle tasks that require integrating different information sources (for example, identifying the animals in an image using both visual features and descriptive text). By the end of this chapter, you will have a solid understanding of how to fine-tune models for better task-specific accuracy and how to work with models that handle multimodal inputs for richer, more comprehensive solutions.
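The animal-identification example above can be sketched with a zero-shot image-classification pipeline, which scores an image against candidate text labels. The checkpoint `openai/clip-vit-base-patch32` and the sample COCO image URL are illustrative assumptions:

```python
import requests
from PIL import Image
from transformers import pipeline

# CLIP embeds the image and each candidate label into a shared space,
# then ranks the labels by image-text similarity
classifier = pipeline("zero-shot-image-classification",
                      model="openai/clip-vit-base-patch32")

# Sample image (two cats on a couch) commonly used in Hugging Face examples
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The candidate labels supply the text modality
results = classifier(image, candidate_labels=["a cat", "a dog", "a bird"])
print(results[0]["label"])  # highest-scoring label
```

Because the labels are free-form text rather than a fixed class list, the same model can answer very different questions about the same image simply by changing the candidate labels.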

6.1 Fine-Tuning Pre-Trained Models

6.1.1 Loading the yelp_polarity Dataset

6.1.2 Filtering the yelp_polarity Dataset

6.1.3 Tokenizing the Reduced Dataset

6.1.4 Setting Up a Pre-Trained Model for Sequence Classification

6.1.5 Configuring and Initializing a Trainer for Fine-Tuning a Pre-Trained Model

6.1.6 Using the Fine-Tuned Model

6.1.7 Fine-Tuning Models for Multiclass Text Classification

6.2 Working with Multimodal Models

6.2.1 Single-Modal Models

6.2.2 Multimodal Models

6.3 Summary