8 Object detection

This chapter covers

Making a prediction for every pixel
Working with image segmentation
Enlarging images with transposed convolutions
Using bounding boxes for object detection with Faster R-CNN
Filtering results to reduce false positives

Imagine that you want to build a system that counts the different kinds of birds in a park. You point a camera at the sky, and for each bird in the photograph, you want to know its species name. But what if there are no birds in the picture? Or just 1? Or 12? To accommodate these situations, you need to first detect each bird in the image and then classify each detected bird. This two-step process is known as object detection, and it comes in many forms. Broadly, all of these forms involve identifying the subcomponents of an image. So instead of generating one prediction per image, which is what our models have done so far, the system generates a variable number of predictions from a single image.
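To make the two-step idea concrete, here is a minimal sketch in plain Python. It is not the approach developed later in the chapter: the names `detect_then_classify`, `toy_detect`, and `toy_classify`, the box format, and the toy "images" are all hypothetical stand-ins, chosen only to show that the number of predictions depends on the contents of the image.

```python
from typing import Callable, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), max values exclusive.
Box = Tuple[int, int, int, int]


def detect_then_classify(
    image: List[List[int]],
    detect: Callable[[List[List[int]]], List[Box]],
    classify: Callable[[List[List[int]]], str],
) -> List[Tuple[Box, str]]:
    """Step 1: find candidate regions. Step 2: label each region."""
    results = []
    for box in detect(image):
        x0, y0, x1, y1 = box
        # Crop the region so the classifier sees only the detected object.
        crop = [row[x0:x1] for row in image[y0:y1]]
        results.append((box, classify(crop)))
    return results


def toy_detect(image: List[List[int]]) -> List[Box]:
    """Toy detector (assumption): every nonzero pixel is a one-pixel 'bird'."""
    return [
        (x, y, x + 1, y + 1)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if value != 0
    ]


def toy_classify(crop: List[List[int]]) -> str:
    """Toy classifier (assumption): pixel value 1 is a sparrow, else a crow."""
    return "sparrow" if crop[0][0] == 1 else "crow"


empty_sky = [[0, 0, 0], [0, 0, 0]]   # no birds
busy_sky = [[1, 0, 2], [0, 1, 0]]    # three birds

print(detect_then_classify(empty_sky, toy_detect, toy_classify))
print(detect_then_classify(busy_sky, toy_detect, toy_classify))
```

The empty image yields an empty list and the busy image yields three (box, label) pairs, so the output length is no longer fixed at one prediction per image. A real detector and classifier would be learned networks rather than these hand-written rules.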

8.1 Image segmentation

8.1.1 Nuclei detection: Loading the data

8.1.2 Representing the image segmentation problem in PyTorch

8.1.3 Building our first image segmentation network

8.2 Transposed convolutions for expanding image size

8.2.1 Implementing a network with transposed convolutions

8.3 U-Net: Looking at fine and coarse details

8.3.1 Implementing U-Net

8.4 Object detection with bounding boxes

8.4.1 Faster R-CNN

8.4.2 Using Faster R-CNN in PyTorch

8.4.3 Suppressing overlapping boxes

8.5 Using the pretrained Faster R-CNN

Exercises

Summary