4 Semantic Search from Scratch

 

This chapter covers

  • Building a hotel search engine from scratch: Travelle
  • The hotel data and reviews used to build the search engine
  • Converting data into Embeddings using Sentence Transformers
  • Ranking results with distance metrics and Bi-Encoders
  • Introduction to FAISS and the World of Vector Databases

In Chapter 3, we introduced the concept of semantic search and highlighted its advantages over traditional keyword-based search. Keyword search matches specific words, whereas semantic search goes further by interpreting the meaning and context of the text, so it can return relevant results even when the wording of the query and the document differ.

We saw that when a user searches for "furry animal" with keyword search, the results include only documents that explicitly contain the words "furry" and "animal." Relevant documents that mention specific furry animals such as "dogs," "cats," or "rabbits" can be missed. Semantic search, in contrast, recognizes that a dog is a furry animal and returns documents mentioning "dog" even when neither "furry" nor "animal" appears in them.
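To make the keyword-search limitation concrete, here is a minimal sketch of a strict keyword matcher. The function name `keyword_search` and the three sample documents are illustrative assumptions, not part of the Travelle codebase. It covers only the keyword side of the comparison.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, documents):
    """Return the documents that contain every word of the query.

    Hypothetical helper for illustration: a strict "all words must match"
    keyword search with no stemming, synonyms, or ranking.
    """
    terms = tokenize(query)
    return [doc for doc in documents if terms <= tokenize(doc)]


# Made-up sample documents for illustration.
documents = [
    "A furry animal sat on the porch.",
    "My dog loves long walks.",
    "Cats sleep most of the day.",
]

results = keyword_search("furry animal", documents)
print(results)  # only the first document matches
```

The documents about the dog and the cats are skipped because they share no words with the query, even though a human reader would consider them relevant. Closing that gap is what embeddings are for.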

In this chapter, we build on that foundation with a practical application of semantic search: a hotel search engine built from scratch, similar to Expedia. We will call it Travelle. I built this engine myself, and you can see it in action at https://travelle.traversaal.ai.

4.1 Loading the data for semantic search

4.2 Generate Embeddings from Hotel Reviews

4.2.1 Selecting the right Encoder Model

4.3 Similarity scores using Cross Encoders and Bi-Encoders

4.4 Introduction to FAISS and Vector Databases

4.5 Putting it all together: Travelle in action

4.6 Summary