11 Autocomplete/typeahead
This chapter covers
- Comparing autocomplete with search
- Separating data collection and processing from querying
- Processing a continuous data stream
- Dividing a large aggregation pipeline into stages to reduce storage costs
- Employing the byproducts of data processing pipelines for other purposes
We wish to design an autocomplete system. Autocomplete is a useful interview question for testing a candidate's ability to design a distributed system that continuously ingests large amounts of data and processes it into a small (a few MB) data structure that users can query for a specific purpose. An autocomplete system obtains its data from strings submitted by up to billions of users and processes this data into a weighted trie. When a user types a string, the weighted trie provides autocomplete suggestions. We can also add personalization and machine learning elements to our autocomplete system.
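To make the weighted trie concrete, here is a minimal single-machine sketch in Python. It assumes that each string's weight is simply how often it was submitted, and that suggestions are the k highest-weight stored strings that start with the typed prefix. The class and method names (`WeightedTrie`, `add`, `suggest`) are hypothetical and chosen only for illustration. This is not the distributed design the chapter goes on to build.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next character.
    children: dict = field(default_factory=dict)
    # Weight of the string that ends at this node (assumed: submission count); 0 if none ends here.
    weight: int = 0


class WeightedTrie:
    """Hypothetical weighted trie: stores strings with weights, returns top-k completions."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, word: str, weight: int = 1) -> None:
        """Insert a word, adding `weight` to any weight it already has."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.weight += weight

    def suggest(self, prefix: str, k: int = 3) -> list[str]:
        """Return up to k stored words that start with prefix, highest weight first."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []  # No stored word has this prefix.
            node = node.children[ch]
        # Depth-first walk collecting every complete word under the prefix node.
        candidates = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.weight > 0:
                candidates.append((current.weight, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Highest weight first; ties broken alphabetically for deterministic output.
        top = heapq.nsmallest(k, candidates, key=lambda c: (-c[0], c[1]))
        return [text for _, text in top]


if __name__ == "__main__":
    trie = WeightedTrie()
    for word, count in [("beer", 10), ("bee", 25), ("beef", 7), ("bear", 15), ("cat", 30)]:
        trie.add(word, count)
    print(trie.suggest("be"))  # ['bee', 'bear', 'beer']
```

This sketch scans the entire subtree under the prefix on every query, which is simple but slow for short prefixes over a large vocabulary. A common alternative is to precompute and store the top-k completions at each node when the trie is built, which trades extra memory for constant-time lookups.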
11.1 Possible uses of autocomplete
We first discuss and clarify the intended use cases of this system so that we can determine the appropriate requirements. Possible uses of autocomplete include: