11 Deep Convolutional Neural Network Architectures for Image Classification and Object Detection
11.1 Introduction
Figure 11.1: Is it a bird? Is it a plane? Is it Superman?
A human being shown the image in Figure 11.1 can instantly recognize the objects in it and categorize them as a bird, a plane, and Superman. In image classification, we want to impart this capability to computers: the ability to recognize the objects in an image and classify them into one or more known, predetermined categories. Apart from identifying the object categories, we can also identify where the objects are in the image. An object's location can be described by a bounding box, a rectangle whose sides are parallel to the coordinate axes. A bounding box is typically specified by four parameters, $[(x_{tl}, y_{tl}), (x_{br}, y_{br})]$, where $(x_{tl}, y_{tl})$ are the xy coordinates of the top-left corner and $(x_{br}, y_{br})$ are the xy coordinates of the bottom-right corner of the bounding box. The problem of identifying and categorizing the objects present in an image is called image classification. If, in addition, we also want to identify their locations in the image, the problem is called object detection.
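To make the four-parameter description concrete, here is a minimal Python sketch of a bounding box. The class name `BoundingBox`, its field names, and the sample coordinates are hypothetical and chosen only for illustration. The sketch also assumes the common image convention in which the origin is at the top-left of the image and y grows downward, so the top-left corner has the smaller x and y values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box given by its top-left and bottom-right corners."""
    x_tl: float
    y_tl: float
    x_br: float
    y_br: float

    def __post_init__(self):
        # Assumed convention: origin at the image's top-left, y grows downward,
        # so the bottom-right corner must not lie above or left of the top-left.
        if self.x_br < self.x_tl or self.y_br < self.y_tl:
            raise ValueError("bottom-right corner lies above or left of top-left corner")

    @property
    def width(self) -> float:
        return self.x_br - self.x_tl

    @property
    def height(self) -> float:
        return self.y_br - self.y_tl

    def contains(self, x: float, y: float) -> bool:
        """Return True if the point (x, y) lies inside or on the box."""
        return self.x_tl <= x <= self.x_br and self.y_tl <= y <= self.y_br


# Hypothetical example values, not taken from Figure 11.1.
box = BoundingBox(x_tl=40, y_tl=30, x_br=200, y_br=150)

# Classification outputs only a category; detection pairs it with a location.
classification = "bird"
detection = ("bird", box)
```

In this sketch, the classification result is just a label, while the detection result carries the same label together with the box that locates the object.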