14 Question Answering and the Search Frontier

This chapter covers

Building a question-answering application
Curating a question-answering dataset for training
Fine-tuning a transformer model
Blending retrieval strategies by integrating deep-learning-based NLP with Solr
The future of AI-powered search and emerging search paradigms

With the basics of semantic search with transformers covered in Chapter 13, we're now ready to attempt one of the hardest problems in search: question answering. The goal is lofty, but we chose it for several key reasons: it will (a) help you better understand the transformers tooling and ecosystem, (b) teach you how to fine-tune a large language model for a specific task, and (c) combine the Solr search engine with advanced natural language techniques to produce a complete solution.

With our question-answering application in hand, we will then touch on how else search is evolving and what to expect in the coming years.

14.1 Question answering overview

In this section we will introduce the question-answering problem space and provide an overview of the retriever-reader pattern for implementing question answering.

Traditional search returns a list of documents or pages in response to a query, but people are often just looking for a quick answer to their question and don't want to spend time reviewing the underlying documents.

14.1.1 How a question-answering model works

14.1.2 The retriever-reader pattern
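
As a rough illustration of the pattern's shape, the sketch below wires a retriever stage (narrow a collection down to a few candidate contexts) to a reader stage (pull the most likely answer out of each candidate). Everything here is a simplifying assumption made for illustration: the function names `retrieve`, `read`, and `answer` are hypothetical, the term-overlap scoring stands in for a real search engine query, and the sentence-picking reader stands in for a transformer model that extracts answer spans. It is not the implementation we build later in the chapter.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Retriever (toy stand-in for a search engine): rank documents by
    how many terms they share with the question and keep the top_k."""
    question_terms = tokenize(question)
    ranked = sorted(documents,
                    key=lambda doc: len(question_terms & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]


def read(question, context):
    """Reader (toy stand-in for a question-answering model): return the
    sentence sharing the most terms with the question, plus its score."""
    question_terms = tokenize(question)
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    best = max(sentences, key=lambda s: len(question_terms & tokenize(s)))
    return best, len(question_terms & tokenize(best))


def answer(question, documents, top_k=2):
    """Run the retriever, then the reader on each candidate context,
    and return the highest-scoring answer text."""
    candidates = retrieve(question, documents, top_k)
    scored_answers = [read(question, context) for context in candidates]
    return max(scored_answers, key=lambda pair: pair[1])[0]


documents = [
    "Solr is an open source search engine. It is built on Apache Lucene.",
    "Transformers are neural network models. They are used for many language tasks.",
    "Bananas are yellow.",
]
print(answer("What is Solr built on?", documents))
```

The point of the split is that the retriever is cheap and runs over the whole collection, while the reader is expensive and only sees the handful of contexts the retriever passes along.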

14.2 Constructing a question-answering training dataset

14.2.1 Using guesses from the pre-trained model for human-in-the-loop labelling

14.2.2 Converting the labeled data into the SQuAD data format

14.3 Fine-tuning the question-answering model

14.3.1 Tokenizing and shaping our labeled data

14.3.2 Configuring the RobertaForQuestionAnswering trainer

14.3.3 Performing training and evaluating loss

14.3.4 Hold-out validation and confirmation

14.4 Building the reader with the new fine-tuned model

14.5 Incorporating the retriever: using the Question-Answering model with Solr

14.6.1 Multimodal search