11 Autocomplete/typeahead

 

This chapter covers

  • Comparing autocomplete with search
  • Separating data collection and processing from querying
  • Processing a continuous data stream
  • Dividing a large aggregation pipeline into stages to reduce storage costs
  • Employing the byproducts of data processing pipelines for other purposes

We wish to design an autocomplete system. Autocomplete is a useful question to test a candidate’s ability to design a distributed system that continuously ingests and processes large amounts of data into a small (few MBs) data structure that users can query for a specific purpose. An autocomplete system obtains its data from strings submitted by up to billions of users and then processes this data into a weighted trie. When a user types in a string, the weighted trie provides them with autocomplete suggestions. We can also add personalization and machine learning elements to our autocomplete system.

11.1 Possible uses of autocomplete

We first discuss and clarify the intended use cases of this system to ensure we determine the appropriate requirements. Possible uses of autocomplete include:

11.2 Search vs. autocomplete

 
 
 
 

11.3 Functional requirements

 
 
 
 

11.3.1 Scope of our autocomplete service

 

11.3.2 Some UX details

 
 
 

11.3.3 Considering search history

 
 

11.3.4 Content moderation and fairness

 
 
 
 

11.4 Non-functional requirements

 
 

11.5 Planning the high-level architecture

 
 
 
 

11.6 Weighted trie approach and initial high-level architecture

 

11.7 Detailed implementation

 
 

11.7.1 Each step should be an independent task

 
 
 
 

11.7.2 Fetch relevant logs from Elasticsearch to HDFS

 
 

11.7.3 Split the search strings into words and other simple operations

 
 

11.7.4 Filter out inappropriate words

 
 
 
 

11.7.5 Fuzzy matching and spelling correction

 

11.7.6 Count the words

 
 

11.7.7 Filter for appropriate words

 
 
 
 

11.7.8 Managing new popular unknown words

 
 
 

11.7.9 Generate and deliver the weighted trie

 
 
 

11.8 Sampling approach

 
 
sitemap

Unable to load book!

The book could not be loaded.

(try again in a couple of minutes)

manning.com homepage
test yourself with a liveTest