11 Neural networks for image classification and object detection

 

This chapter covers

  • Using deeper neural networks for image classification and object detection
  • Understanding convolutional neural networks and other deep neural network architectures
  • Correcting imbalances in neural networks

If a human is shown the image in figure 11.1, they can instantly recognize the objects in it, categorizing them as a bird, a plane, and Superman. In image classification, we want to impart this capability to computers—the ability to recognize objects in an image and classify them into one or more known and predetermined categories. Apart from identifying the object categories, we can also identify the locations of the objects in the image. An object’s location can be described by a bounding box: a rectangle whose sides are parallel to the coordinate axes. A bounding box is typically specified by four parameters: [(xtl, ytl), (xbr, ybr)], where (xtl, ytl) are the xy coordinates of the top-left corner and (xbr, ybr) are the xy coordinates of the bottom-right corner of the bounding box.
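The four-parameter corner representation described above can be sketched in code. The `BoundingBox` class below is a hypothetical illustration (not from the chapter); it assumes image coordinates, where y increases downward, so ybr ≥ ytl:

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Axis-aligned box: (xtl, ytl) = top-left corner, (xbr, ybr) = bottom-right corner."""
    xtl: float
    ytl: float
    xbr: float
    ybr: float

    @property
    def width(self) -> float:
        # In image coordinates, x increases rightward
        return self.xbr - self.xtl

    @property
    def height(self) -> float:
        # In image coordinates, y increases downward
        return self.ybr - self.ytl

    @property
    def area(self) -> float:
        return self.width * self.height

# A box whose top-left corner is at (10, 20) and bottom-right at (110, 70)
box = BoundingBox(xtl=10, ytl=20, xbr=110, ybr=70)
print(box.width, box.height, box.area)  # 100 50 5000
```

Detection frameworks differ on conventions (some use [x, y, width, height] instead of two corners), so the corner form shown here is just one common choice.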

11.1 CNNs for image classification: LeNet

11.1.1 PyTorch: Implementing LeNet for image classification on MNIST

11.2 Toward deeper neural networks

11.2.1 VGG (Visual Geometry Group) Net

11.2.2 Inception: Network-in-network paradigm

11.2.3 ResNet: Why stacking layers to add depth does not scale

11.2.4 PyTorch Lightning

11.3 Object detection: A brief history

11.3.1 R-CNN

11.3.2 Fast R-CNN

11.3.3 Faster R-CNN

11.4 Faster R-CNN: A deep dive

11.4.1 Convolutional backbone

11.4.2 Region proposal network

11.4.3 Fast R-CNN

11.4.4 Training the Faster R-CNN

11.4.5 Other object-detection paradigms

Summary