chapter four

4 Using Hugging Face for computer vision tasks

This chapter covers

  • Different types of Hugging Face computer vision models
  • Various ways to use models for object detection
  • Video and image classification tasks
  • Image segmentation tasks

Previously, you learned about Hugging Face transformers and pipelines. You also learned how to use some pretrained models for natural language processing (NLP) tasks, such as sentiment analysis and text translation. Hugging Face also provides a vast collection of pretrained models for computer vision tasks. With these hosted pretrained models, you can build applications that detect objects in images, estimate a person's age, and more. In this chapter, you'll learn how to use Hugging Face models for four computer vision tasks: object detection, image classification, image segmentation, and video classification.

4.1 Hugging Face computer vision models

The computer vision models hosted on Hugging Face (https://huggingface.co/models; see figure 4.1) are grouped by task type:

  • Object detection
  • Image classification
  • Image segmentation
  • Video classification
  • Depth estimation
  • Image-to-image
  • Unconditional image generation
  • Zero-shot image classification

Figure 4.1 Computer vision–related models on the Hugging Face website
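
Each task group on the models page acts as a filter. The following sketch builds a filtered listing URL for each group in the list above. It assumes the Hub filters by a `pipeline_tag` query parameter and that the tag strings in `CV_PIPELINE_TAGS` (a hypothetical name) are the Hub's identifiers for these groups; check them against the website before relying on them. Most of these strings are also task names that the transformers `pipeline()` function accepts. Unconditional image generation is the exception, because it is usually handled by the separate diffusers library.

```python
from urllib.parse import urlencode

# Assumed mapping from the task groups on the models page to Hub pipeline tags.
CV_PIPELINE_TAGS = {
    "Object detection": "object-detection",
    "Image classification": "image-classification",
    "Image segmentation": "image-segmentation",
    "Video classification": "video-classification",
    "Depth estimation": "depth-estimation",
    "Image-to-image": "image-to-image",
    "Unconditional image generation": "unconditional-image-generation",
    "Zero-shot image classification": "zero-shot-image-classification",
}

HUB_MODELS_URL = "https://huggingface.co/models"


def models_url(task: str) -> str:
    """Return the models listing URL filtered to a single task group."""
    tag = CV_PIPELINE_TAGS[task]
    return f"{HUB_MODELS_URL}?{urlencode({'pipeline_tag': tag})}"


if __name__ == "__main__":
    # Print one filtered URL per task group.
    for task in CV_PIPELINE_TAGS:
        print(f"{task:32} {models_url(task)}")
```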

4.2 Object detection

Object detection is a computer vision technique that involves identifying and locating objects of interest within an image or video. The primary goals of object detection are to classify the objects in the image or video and determine their precise positions by drawing bounding boxes around them.
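
To make "classify and locate" concrete, the sketch below represents each detection as a label, a confidence score, and a bounding box given by its corner coordinates in pixels. The dictionary layout is modeled on the output format of the transformers object-detection pipeline. The detections themselves are invented for illustration, and the 0.9 score threshold is an arbitrary assumption. The `iou` helper computes intersection over union, a common way to measure how closely two boxes agree on an object's position.

```python
from typing import TypedDict


class Box(TypedDict):
    xmin: int
    ymin: int
    xmax: int
    ymax: int


class Detection(TypedDict):
    score: float
    label: str
    box: Box


def box_area(box: Box) -> int:
    """Area of a box in pixels (zero if the box has no extent)."""
    return max(0, box["xmax"] - box["xmin"]) * max(0, box["ymax"] - box["ymin"])


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    inter: Box = {
        "xmin": max(a["xmin"], b["xmin"]),
        "ymin": max(a["ymin"], b["ymin"]),
        "xmax": min(a["xmax"], b["xmax"]),
        "ymax": min(a["ymax"], b["ymax"]),
    }
    inter_area = box_area(inter)
    union = box_area(a) + box_area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def keep_confident(detections: list[Detection], threshold: float = 0.9) -> list[Detection]:
    """Keep detections whose score meets the threshold, highest score first."""
    kept = [d for d in detections if d["score"] >= threshold]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


# Hand-made detections for illustration only (not real model output).
detections: list[Detection] = [
    {"score": 0.98, "label": "cat", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 120}},
    {"score": 0.95, "label": "remote", "box": {"xmin": 40, "ymin": 70, "xmax": 90, "ymax": 90}},
    {"score": 0.40, "label": "dog", "box": {"xmin": 15, "ymin": 25, "xmax": 105, "ymax": 115}},
]

if __name__ == "__main__":
    for d in keep_confident(detections):
        print(f"{d['label']:>8} {d['score']:.2f} {d['box']}")
```

Here the low-scoring "dog" box sits almost entirely inside the "cat" box, so the two boxes have a high IoU. Filtering by score is one simple way to discard such weak, overlapping guesses before drawing boxes on the image.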

4.2.1 Using the model directly

4.2.2 Using the transformers pipeline

4.2.3 Binding to a webcam

4.3 Image classification

4.4 Image segmentation