Chapter 2. Introduction to Information Search

Chapter 3 from Getting Started with Natural Language Processing by Ekaterina Kochmar

This chapter covers:

  • Implementing your own information retrieval algorithm
  • A number of useful NLP concepts, including stemming and stopword removal
  • Assessing the importance of different bits of information in search
  • Evaluating the relevance of the documents to the information need

This chapter focuses on algorithms for information search, a task that also has a more technical name: information retrieval. It explains the steps of a search algorithm from beginning to end, and by the end of the chapter you will be able to implement your own.

You might have come across the term information retrieval in the context of search engines: for example, Google famously started its business by providing a powerful search algorithm that kept improving over time. The search for information, however, is a basic need that is not limited to searching online: for instance, every time you search for files on your computer, you also perform a kind of information retrieval. In fact, the task predates the digital era: before computers and the Internet became commonplace, one had to wade manually through paper copies of encyclopedias, books, documents, files, and so on. Thanks to technology, algorithms now help you do many of these tasks automatically.

3.1 Understanding the task

3.2 Processing the data further

3.3 Information weighing

3.4 Practical use of the search algorithm

3.5 Summary