6 Trie, Radix Trie: Efficient Strings Search

This chapter covers

Understanding why working with strings is different
Introducing Trie[1] for efficient string search and indexing
Introducing Radix Tree[2] as a memory-efficient evolution of Trie
Using prefix trees to solve string-related problems
Leveraging Tries to implement an efficient spell-checker

How many times have you sent a text, an email or twitted in a hurry, just to realize, a second later, that you had a typo? For me, it was too many times! Lately, however, we started to have a precious ally by default on our side in email clients, and browsers in general: spell-checkers! If you’d like to know more about how they work and how to implement them efficiently, this chapter is the right place to start.

In chapter 3 we have described balanced trees, which offer the best compromise when it comes to containers, and are ideal to efficiently store dynamically-changing data on which we need to perform frequent searches: in appendix C we have discussed and compared the options we have for containers providing fast lookup, fast insertion or fast removal, and trees offer the best tradeoff between all the operations.

Balanced trees, in particular, guarantee logarithmic running time in the worst-case for all the main operations.

In the general case, when we don’t know anything about the data we need to store and (later) search, this is really the best we can hope.

6.1 Spell Check

6.1.1 A Prncess, a Damon and an Elf Walkz into a Bar

6.1.2 Compression is the Key

6.1.3 Description and API

6.2 Trie

6.2.1 Why is it Better Again?

6.2.2 Search

6.2.3 Insert

6.2.4 Remove

6.2.5 Longest Prefix

6.2.6 Keys Matching a Prefix

6.2.7 When Should We Use Tries?

6.3 Radix Tries

6.3.1 Nodes and Edges

6.3.2 Search

6.3.3 Insert

6.3.4 Remove

6.3.5 Longest Common Prefix

6.3.6 Keys Starting with a Prefix

6.4 Applications

6.4.1 Spell Checker

6.4.2 Strings Similarity

6.4.3 String Sorting

6.4.4 T9

6.4.5 Autocomplete

6.5 Summary