8 Object detection
This chapter covers
- Making a prediction for every pixel
- Working with image segmentation
- Enlarging images with transposed convolutions
- Using bounding boxes for object detection with Faster R-CNN
- Filtering results to reduce false positives
Imagine this: you want to build a system that counts the different kinds of birds in a park. You point a camera at the sky, and for each bird in the resulting photograph, you want to know its species name. But what if there are no birds in the picture? Or just 1? Or 12? To accommodate these situations, you need to first detect each bird in the image and then classify each detected bird. This two-step process is known as object detection, and it comes in many forms. Broadly, all of these forms involve identifying the subcomponents of an image. So instead of generating one prediction per image, which is what our models have done so far, the system generates many predictions from a single image.
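To make the "many predictions from a single image" idea concrete, here is a minimal sketch in plain Python of what a detector's output for one photograph might look like. The `Detection` class, the `count_species` helper, the 0.5 score threshold, and the species names are all hypothetical, chosen only for illustration; they are not the API of any particular library.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One prediction for one object found in an image (hypothetical structure)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    label: str                              # predicted species name
    score: float                            # model confidence in [0, 1]


def count_species(detections: List[Detection], min_score: float = 0.5) -> Counter:
    """Count detections per species, ignoring low-confidence ones.

    The threshold is an assumed value; dropping low scores is one simple
    way to reduce false positives.
    """
    return Counter(d.label for d in detections if d.score >= min_score)


# The number of predictions varies from image to image: zero, one, or many.
empty_sky: List[Detection] = []
one_bird = [Detection((10, 20, 50, 60), "robin", 0.91)]
flock = [
    Detection((10, 20, 50, 60), "robin", 0.91),
    Detection((80, 15, 120, 55), "robin", 0.84),
    Detection((200, 40, 260, 90), "sparrow", 0.77),
    Detection((300, 10, 310, 20), "sparrow", 0.12),  # too uncertain, filtered out
]

print(count_species(empty_sky))
print(count_species(one_bird))
print(count_species(flock))
```

The point of the sketch is the shape of the output: a classifier returns exactly one label per image, whereas a detector returns a list whose length depends on what the image contains, and that list can be empty.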