10 Visual embeddings
This chapter is written by Ratnesh Kumar ...
Learning goals from this chapter:
- Understand what a visual embedding is in a setting where we are interested in measuring similarity between images.
- Learn how to express similarity between images via loss functions.
- Learn techniques to train deep convolutional neural networks (CNNs) to achieve a desired embedding function with high accuracy.
- Learn how to use visual embeddings in real-world applications such as face recognition and image search.
Obtaining meaningful relationships between images is a vital building block for many applications that touch our lives every day, such as face recognition and image search. To tackle such problems, we need an algorithm that extracts relevant features from the images and then compares the images using those features.
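As a simplified illustration of the "compare images via their features" idea, the sketch below computes the cosine similarity between two feature vectors. The 128-dimensional random vectors are placeholders standing in for features extracted from real images; the dimensionality is an assumption made only for this example.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder feature vectors; in practice these would come from a feature extractor.
feat_query = np.random.randn(128)
feat_gallery = np.random.randn(128)

print(cosine_similarity(feat_query, feat_gallery))
```

Other distance functions (e.g., Euclidean distance) can be used in the same way; what matters is that image comparison reduces to comparing fixed-length feature vectors.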
In the previous chapters, we learned that we can use convolutional neural networks (CNNs) to extract meaningful features from an image. This chapter builds on that understanding to jointly train a visual embedding layer. In this chapter's context, a visual embedding refers to the last fully-connected layer (prior to a loss layer) appended to a CNN, and joint training means that the parameters of this embedding layer and of the CNN backbone are learned together.
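To make this architecture concrete, here is a minimal, hypothetical PyTorch sketch (the framework, layer sizes, and 128-dimensional embedding are assumptions for illustration, not choices made in the text): a small CNN backbone with a fully-connected embedding layer appended at the end, and a single optimizer over all parameters so the backbone and the embedding are trained jointly.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """A toy CNN backbone with an embedding layer appended as the last
    fully-connected layer; backbone and embedding weights are trained jointly."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> (N, 64, 1, 1)
            nn.Flatten(),             # -> (N, 64)
        )
        # Visual embedding: the final fully-connected layer prior to the loss layer.
        self.embedding = nn.Linear(64, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.embedding(self.backbone(x))

model = EmbeddingNet()
# One optimizer over all parameters trains the CNN and the embedding layer jointly.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

embeddings = model(torch.randn(4, 3, 64, 64))  # (4, 128) embedding vectors
```

In practice the toy backbone would be replaced by a deeper network, and the loss applied to the embeddings would be one of the similarity-based losses discussed later in this chapter.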
This chapter explores the nuts and bolts of training and using visual embeddings for large-scale image-based query-retrieval systems. See Figure 10.1 for example applications of visual embeddings.