10 Visual embeddings
This chapter is written by Ratnesh Kumar ...
Learning goals from this chapter:
- Understand what a visual embedding is in a setting where we are interested in measuring similarity between images.
- Learn how to express similarity between images via loss functions.
- Learn techniques to train deep convolutional neural networks (CNNs) to achieve a desired embedding function with high accuracy.
- Learn how to use visual embeddings in real-world applications such as face recognition and image search.
Obtaining meaningful relationships between images is a vital building block for many applications that touch our lives every day, such as face recognition and image search. To tackle such problems, we need an algorithm that extracts relevant features from the images and then compares the images using those features.
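As a simplified illustration of the "compare images via their features" idea, the sketch below computes the cosine similarity between two feature vectors. The 128-dimensional random vectors are placeholders standing in for features extracted from real images; the dimensionality is an assumption made only for this example.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder feature vectors; in practice these would come from a feature extractor.
feat_query = np.random.randn(128)
feat_gallery = np.random.randn(128)

print(cosine_similarity(feat_query, feat_gallery))
```

Other distance functions (e.g., Euclidean distance) can be used in the same way; what matters is that image comparison reduces to comparing fixed-length feature vectors.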
In the previous chapters, we learned that we can use convolutional neural networks (CNNs) to extract meaningful features from an image. This chapter builds on that understanding to jointly train a visual embedding layer. In this chapter's context, a visual embedding refers to the last fully-connected layer (prior to a loss layer) appended to a CNN, and joint training means that the parameters of this embedding layer and of the CNN backbone are learned together.
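To make this architecture concrete, here is a minimal, hypothetical PyTorch sketch (the framework, layer sizes, and 128-dimensional embedding are assumptions for illustration, not choices made in the text): a small CNN backbone with a fully-connected embedding layer appended at the end, and a single optimizer over all parameters so the backbone and the embedding are trained jointly.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """A toy CNN backbone with an embedding layer appended as the last
    fully-connected layer; backbone and embedding weights are trained jointly."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> (N, 64, 1, 1)
            nn.Flatten(),             # -> (N, 64)
        )
        # Visual embedding: the final fully-connected layer prior to the loss layer.
        self.embedding = nn.Linear(64, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.embedding(self.backbone(x))

model = EmbeddingNet()
# One optimizer over all parameters trains the CNN and the embedding layer jointly.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

embeddings = model(torch.randn(4, 3, 64, 64))  # (4, 128) embedding vectors
```

In practice the toy backbone would be replaced by a deeper network, and the loss applied to the embeddings would be one of the similarity-based losses discussed later in this chapter.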
This chapter explores the nuts and bolts of training and using visual embeddings for large-scale image-based query-retrieval systems. See Figure 10.1 for example applications of visual embeddings.