6 Fine-Tuning Pre-Trained Models and Working with Multimodal Models
This chapter covers
- Using the yelp_polarity dataset to fine-tune a pre-trained model
- Using a fine-tuned model to perform classification tasks
- Fine-tuning a pre-trained model to perform multiclass classification tasks
- Working with multimodal models
Up until this chapter, you have seen how to work with pre-trained models from Hugging Face, leveraging their general capabilities for tasks such as text classification, object detection, and language generation. In this chapter, we will delve into the process of fine-tuning these models: training them on domain-specific data to adapt them to more specialized tasks and improve their performance.
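To give you a feel for where we are headed, the following is a minimal sketch of the fine-tuning workflow previewed in this chapter, assuming the Hugging Face datasets and transformers libraries are installed. The checkpoint, subset sizes, and training arguments are illustrative assumptions, not the exact setup we will build later in the chapter.

```python
# A minimal fine-tuning sketch: adapt a general-purpose pre-trained model
# to binary sentiment classification using the yelp_polarity dataset.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)

# Load the yelp_polarity dataset (binary sentiment: negative/positive).
dataset = load_dataset("yelp_polarity")

# Tokenize the review text with a general-purpose checkpoint (illustrative choice).
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

# Attach a two-label classification head to the pre-trained encoder.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Fine-tune on small subsets to keep the sketch quick to run.
args = TrainingArguments(output_dir="yelp-finetuned", num_train_epochs=1)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)),
    tokenizer=tokenizer,  # enables dynamic padding of each training batch
)
trainer.train()
```

The same pattern (load a dataset, tokenize it, attach a task-specific head, and train) carries over to the multiclass classification task later in the chapter; mainly the dataset and the number of labels change.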
We will also explore multimodal models, which combine multiple types of data, such as images and text, to address more complex tasks that require integrating different information sources (for example, identifying the animals in an image based on both visual features and descriptive text). A short sketch of this idea follows below. By the end of this chapter, you will have a solid understanding of how to fine-tune models for better task-specific accuracy and how to work with models that handle multimodal inputs for richer, more comprehensive solutions.
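As a preview of the multimodal idea, here is a brief sketch that scores an image against several candidate text descriptions using a CLIP checkpoint from the Hugging Face Hub. The image URL and label strings are illustrative assumptions, not the chapter's exact example.

```python
# A minimal multimodal sketch: CLIP jointly embeds an image and several
# text descriptions, then scores how well each description matches the image.
import requests
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Load an example image (illustrative URL) and candidate animal descriptions.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a horse"]

# The processor prepares both modalities; the model scores each image-text pair.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

Because the model sees both the visual features and the text, it can classify the image against label sets it was never explicitly trained on, which is exactly the kind of cross-modal reasoning we explore toward the end of the chapter.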