Chapter 6. Document embeddings for rankings and recommendations
This chapter covers
- Generating document embeddings using paragraph vectors
- Using paragraph vectors for ranking
- Retrieving related content
- Improving related-content retrieval with paragraph vectors
In the previous chapter, I introduced you to neural information retrieval models by building a ranking function based on averaged word embeddings. You averaged word embeddings generated by word2vec to obtain a document embedding: a dense representation of a sequence of words that yielded high precision when ranking documents according to user intent.
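As a quick refresher, here is a minimal sketch of that averaging approach, assuming gensim 4.x; the tiny corpus and the hyperparameters are illustrative placeholders, not the book's actual setup:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus; in practice you would train on your full document collection.
corpus = [
    ["riders", "on", "the", "storm"],
    ["hot", "love", "in", "a", "cold", "world"],
]

# Train a small word2vec model (vector_size, window, etc. are arbitrary here).
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=1)

def average_embedding(tokens, model):
    """Average the word vectors of all in-vocabulary tokens into one document vector."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

doc_vector = average_embedding(corpus[0], model)
print(doc_vector.shape)  # (50,)
```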
The drawback of common retrieval models such as the Vector Space Model with TF-IDF and BM25, however, is that they look only at single terms when ranking documents. This approach can lead to suboptimal results, because the context in which those terms appear is discarded. With this drawback in mind, let's see how you can generate document embeddings that look not just at single words, but at the whole text fragments surrounding those words. A vector representation built from these context-enhanced document embeddings carries more of a document's semantic information, improving the ranking function's precision even further.
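To preview the idea, here is a hedged sketch of generating such document embeddings with paragraph vectors, using gensim 4.x's Doc2Vec implementation; the documents and hyperparameters below are assumptions for illustration, not the chapter's exact configuration:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    "riders on the storm",
    "into this house we're born",
    "into this world we're thrown",
]

# Each document gets a tag, so the model learns a vector per document,
# not just a vector per word.
tagged = [TaggedDocument(words=d.split(), tags=[i]) for i, d in enumerate(docs)]

model = Doc2Vec(tagged, vector_size=50, window=3, min_count=1, epochs=40, seed=1)

# Infer an embedding for unseen text and find the most similar training documents.
query_vector = model.infer_vector("riders on the storm".split())
print(model.dv.most_similar([query_vector], topn=2))
```

Unlike the averaging approach, the document vector here is learned jointly with the word vectors during training, so it can encode context that a plain average of word embeddings loses.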