chapter eight

8 Object Detection

This chapter covers

Making a prediction for every pixel.
Working with image segmentation.
Enlarging images with transposed convolutions.
Using bounding boxes for object detection with Faster R-CNN.
Filtering results to reduce false positives.

At this point in the book, you know how to build an effective image classification model for most problems you may run into. Combining data augmentation, better optimizers, and residual networks is a good starting point. But all of the methods and examples we have looked at assume that every image belongs to one of the classes we want to predict. For MNIST, this means we assume that an image always contains a digit, and that the digit is one of 0 through 9. But what do you do if an image might be empty? Even worse, what happens if an image contains multiple digits? What we want is a way to detect what objects an image contains and where they are.
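To make the limitation concrete, here is a minimal sketch in plain Python. It assumes a classifier whose final layer is a softmax over the ten digit classes, and the logits are made up for illustration rather than taken from a trained model. Because the outputs always form a probability distribution over the ten digits, the model has no way to answer "nothing" or "two digits".

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for an empty image: no class stands out,
# so every digit receives the same score.
blank_logits = [0.0] * 10

probs = softmax(blank_logits)

# The classifier still has to pick one of the ten digits.
predicted_digit = max(range(10), key=lambda i: probs[i])

print(f"probabilities sum to {sum(probs):.2f}")
print(f"predicted digit: {predicted_digit}")
```

Even with no evidence for any class, the argmax still names a digit. Segmentation and object detection avoid this by predicting per pixel or per region, where "background" is a valid answer.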

8.1 Image Segmentation

8.1.1 Nuclei Detection

8.1.2 Transposed Convolutions

8.2 U-Net

8.3 Object Detection with Bounding Boxes

8.3.1 Faster R-CNN

8.3.2 Faster R-CNN in PyTorch

8.3.3 Suppress Overlapping Boxes

8.4 Using the Pre-Trained Faster R-CNN

8.5 Exercises

8.6 Summary