chapter four

4 Using Hugging Face for computer vision tasks

This chapter covers

Different types of Hugging Face computer vision models
Various ways to use models for object detection
Video content and image classification tasks
Image segmentation tasks

Previously, you learned about Hugging Face transformers and pipelines, and you used pretrained models for natural language processing (NLP) tasks such as sentiment analysis and text translation. Hugging Face also hosts a vast collection of pretrained models for computer vision tasks. With these models, you can build applications that detect objects in images, estimate a person's age, and more. In this chapter, you learn how to perform object detection, image classification, image segmentation, and video classification using Hugging Face models.

4.1 Hugging Face computer vision models

The computer vision models (https://huggingface.co/models; see figure 4.1) hosted on Hugging Face are grouped by task type:

Object detection
Image classification
Image segmentation
Video classification
Depth estimation
Image-to-image
Unconditional image generation
Zero-shot image classification
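Each task type in this list can be used to filter the models page. The following is a minimal sketch, assuming that the site's pipeline_tag query parameter takes the lowercased, hyphenated task name; that mapping is an assumption to check against the website, and the helper names task_to_tag and task_to_url are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://huggingface.co/models"

# Task types as listed in this section.
TASKS = [
    "Object detection",
    "Image classification",
    "Image segmentation",
    "Video classification",
    "Depth estimation",
    "Image-to-image",
    "Unconditional image generation",
    "Zero-shot image classification",
]


def task_to_tag(task: str) -> str:
    """Lowercase the task name and join its words with hyphens (assumed tag format)."""
    return task.strip().lower().replace(" ", "-")


def task_to_url(task: str) -> str:
    """Build a models-page URL filtered by the assumed pipeline_tag parameter."""
    return f"{BASE_URL}?{urlencode({'pipeline_tag': task_to_tag(task)})}"


for task in TASKS:
    print(f"{task:32} {task_to_url(task)}")
```

Opening one of the printed URLs in a browser should show the same filtered view you get by selecting that task on the models page.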

Figure 4.1 Computer vision–related models on the Hugging Face website

4.2 Object detection

Object detection is a computer vision technique that involves identifying and locating objects of interest within an image or video. The primary goals of object detection are to classify the objects in the image or video and determine their precise positions by drawing bounding boxes around them.
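To make these two goals concrete, the sketch below works with detection results stored as a list of dictionaries, each holding a label (the classification), a confidence score, and a bounding box given by its corner coordinates. This layout mirrors what the transformers object-detection pipeline returns, but the detections here are made-up values rather than real model output, and the helper names and the 0.9 threshold are assumptions chosen for illustration.

```python
# Hypothetical detections (made-up values, not output from a real model).
detections = [
    {"label": "cat", "score": 0.98,
     "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"label": "remote", "score": 0.42,
     "box": {"xmin": 200, "ymin": 50, "xmax": 260, "ymax": 90}},
    {"label": "couch", "score": 0.95,
     "box": {"xmin": 0, "ymin": 0, "xmax": 640, "ymax": 480}},
]


def keep_confident(detections, threshold=0.9):
    """Return only the detections whose score is at least the threshold."""
    return [d for d in detections if d["score"] >= threshold]


def box_size(box):
    """Return the (width, height) of a bounding box in pixels."""
    return box["xmax"] - box["xmin"], box["ymax"] - box["ymin"]


for d in keep_confident(detections):
    width, height = box_size(d["box"])
    top_left = (d["box"]["xmin"], d["box"]["ymin"])
    print(f"{d['label']}: score={d['score']:.2f}, "
          f"{width}x{height} px box with top-left corner at {top_left}")
```

Filtering by score is a common first step before drawing boxes, because low-confidence detections tend to clutter the image.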

4.2.1 Using the model directly

4.2.2 Using the transformers pipeline

4.2.3 Binding to a webcam

4.3 Image classification

4.4 Image segmentation