6 Computer vision: Object recognition

 

This chapter covers

  • Vectorizing images into quantitative features for ML
  • Using pixel values as features
  • Extracting edge information from images
  • Fine-tuning deep learning models to learn optimal image representations

Continuing our journey through unstructured data brings us to our image case study. Just as in our NLP case study, the central question of this chapter is, how do we represent images in a machine-readable format? Throughout this chapter, we will look at ways to construct, extract, and learn feature representations of images for the purpose of solving an object recognition problem.
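To make the idea of vectorizing an image concrete, here is a minimal sketch using NumPy, with a random array standing in for a real 32 × 32 RGB image (the size used in CIFAR-10). The pixel grid is simply unrolled into a single feature vector that an ML model can consume:

```python
import numpy as np

# Stand-in for one 32 x 32 RGB image: height x width x color channels
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flatten the pixel grid into a single feature vector
pixel_features = image.reshape(-1)

print(image.shape)           # (32, 32, 3)
print(pixel_features.shape)  # (3072,) -> 32 * 32 * 3 = 3,072 features
```

This flattening is the simplest possible image representation; the rest of the chapter explores progressively richer alternatives.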

Object recognition simply means we are going to work with labeled images, where each image contains a single object, and the model's task is to classify each image into a category specifying which object it contains. Object recognition is considered a relatively simple computer vision problem because we don't have to locate the object or objects within an image using bounding boxes, nor do we have to do anything beyond pure classification into (usually) mutually exclusive categories. Let's jump right into taking a look at the dataset for this case study—the CIFAR-10 dataset.
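As a preview of the pipeline this chapter builds, the sketch below uses random arrays as stand-ins for CIFAR-10's flattened images and their 10 class labels (scikit-learn is assumed to be available): features go in, one of 10 object categories comes out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for 100 flattened 32 x 32 x 3 images and their labels (10 classes)
X = rng.random((100, 3072))
y = rng.integers(0, 10, size=100)

# A linear classifier over raw pixel features; later sections swap in
# richer features (HOG, VGG-11 embeddings) while keeping this final step
clf = LogisticRegression(max_iter=1000).fit(X, y)
predictions = clf.predict(X[:5])
print(predictions.shape)  # (5,) -> one predicted class per image
```

Only the feature representation changes across the chapter's approaches; the classification step at the end stays essentially the same.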

6.1 The CIFAR-10 dataset

6.1.1 The problem statement and defining success

6.2 Feature construction: Pixels as features

6.3 Feature extraction: Histogram of oriented gradients

6.3.1 Optimizing dimension reduction with PCA

6.4 Feature learning with VGG-11

6.4.1 Using a pretrained VGG-11 as a feature extractor

6.4.2 Fine-tuning VGG-11

6.4.3 Using fine-tuned VGG-11 features with logistic regression

6.5 Image vectorization recap

6.6 Answers to exercises