Recently, I encountered a situation where I need to perform text searches in a large set of data. Normally, I would do it in a
filter function, but that would be slow when the list is too long. There’s a data structure for this situation. Meet the Trie!
Trie (pronounced try) is a tree-like data structure that’s created for efficient text searches.
Trie takes all words and rearranges them in a tree hierarchy. For example, the list of words
['abet', 'abode', 'abort'] will be transformed into a structure like this:
---------------a|b/ \e o| / \t d r| |e t
After that, if we want to search for a word beginning with ‘abo’, we can skip the branch under e in the third level.
In this post, I’m going to walk you through the details of implementing a Trie. There’ll be a lot of code. It may be intimidating and boring as hell, but I’ll add comments at each key step.
First, we need a method to add a new word into a Trie:
We also need a method to remove a word from a Trie.
Before I tried to implement Trie, I’d read a lot of tutorials on this topic. Almost none of these tutorials implements full search functionality. For example, this tutorial on GeeksForGeeks implements a search method that only checks if a word is in a Trie.
When we search for a word, we need to know all the related words as we type in. Here’s how to achieve this:
The implementation I just showed you is not optimal. To make it efficient, we still need to do a lot of work. For example, we can store the characters in a Binary Search Tree, so that we can search them more efficiently.
Ideally, the time complexity of a Trie search would be
M * log N, where
log N is the time complexity of a binary search,
M is the length of the input string.
Another limitation of a Trie is that building a Trie consumes a lot of memory since a Trie needs to maintain a lot of reference between nodes. If the task at hand is memory sensitive, we can use Ternary Search Tree to do a better job.
Check out the complete code here