5 What’s up with you, Directed Acyclic Word Graph?
This chapter covers
- Several techniques for storing a large list of words and searching it by prefix
- Costs and benefits of hash sets, sorted lists, prefix trees (tries) and word graphs (DAWGs)
- Algorithms for building tries and DAWGs when given a word list in sorted order
- The connection between DAWGs and automata theory
My day improved only slightly after the disastrous first interview I described in the preface. The second interviewer asked me a much better question: given the seven letters on your rack in a turn of Scrabble, what algorithm finds all the “bingos”? That is, how do you list the legal seven letter words made up of exactly those letters? The function should take as its input a string such as “NISATEV” and produce the sequence “NAIVEST”, “NATIVES” and “VAINEST”.
It's not a super hard problem and I was quite pleased with my solution, but I struggled with the follow-up questions about harder word list searching problems. I knew that a prefix tree – a “trie” -- was a suitable data structure for word lists but was fuzzy on the details, and I admitted confusion when the interviewer asked me if I knew what a “dog” was; it turns out he meant a Directed Acyclic Word Graph, DAWG. That was my second NO HIRE of the day.