Working with the Trie data structure: Part 1

Consider the need to use a dictionary. You will have to insert , look-up , remove words to and from it. What data structure would you use ? Would you use a hash table ? Would that be efficient in terms of memory? Perhaps not.

A Trie is a tree like data structure used for storing words in a way which facilitates fast look-ups. They prove very useful for handling dictionaries. What is special about them is that you can determine the longest common prefix between multiple words and even use it for autocompletion of words.

 

The concept behind tries is easy to understand and to implement. Suppose you have 4 words: roll, round, to, tour. Here is how their corresponding Trie looks:

TRIE

This article will walk you through a series of 14 steps. Completing them will guarantee you a basic understanding of tries.

 

Part 1:

1. Tree Data Structure

2. Example of a Trie

3. TrieNode and Trie Classes

 

Part 2:

4. Insert

5. Look-up

6. Remove

 

Part 3:

7. Prefix Counter

8. Unique Words

9. Words Same Prefix

10. Remove Words Same Prefix

 

Part 4:

11. kth Word

12. Validate Character Sequence

13. Generics

14. Suggestions and Auto-Completion

 

 

1. Tree data structure

Tree data structures are inspired from nature. A tree data structure has a root, branches, and leaves. The difference between a tree in nature and a tree in computer science is that a tree data structure has its root at the top and its leaves on the bottom.

Take the folder hierarchy for example:

TreeYou can follow a path from the root to any directory. That path will uniquely identify that subdirectory (and all the files in it).

Terminology:

root – starting point of the Tree

edge – the connection between one node and another

child – node directly connected to another node when going one level down

parent – converse notion of a child

leaf – node with no children

path – A sequence of nodes and edges connecting a node with a descendant.

2. Example of a Trie

To begin with, let’s represent a Trie containing lowercase words. Each TrieNode will have a lowercase character bound to it (except the root Node) and a limited number of children, given by the length of the alphabet (the english alphabet has 26 letters) .

The following is a representation of a Trie containing the words : round, roll, to, tour.

TRIE

isFinal indicates whether a TrieNode corresponds to the last character of a word. Notice that the words “roll” and “round” have the “ro” prefix in common, while the words “to” and “tour” have in common “to”.

3. TrieNode and Trie Classes

We are going to use two classes to describe the trie, one for the nodes and one for the trie. The node class will be called TrieNode which has two properties: children bounds characters to children TrieNodes and isFinal. 

class TrieNode {
    var children: [Character:TrieNode] = [:]
    var isFinal: Bool = false

    ...
}

The Trie class will have only one stored member, the root node from the trie.

class Trie {
    var root = TrieNode()
    
    ...
}

 

That was it for now! In the next part we are going to take a look at how we would implement basic operations on a trie.

 

Thank you for reading! If you enjoyed this article please take a moment and share it with your friends.

  3 comments for “Working with the Trie data structure: Part 1

  1. Sara Ahmad
    February 15, 2017 at 8:52 pm

    Just want to say a huge thank you!!! You are doing a great job

  2. Pau
    May 17, 2017 at 1:09 pm

    Hello, just a question, why `isFinal` is mutable?

    • Robert Cazoni
      May 19, 2017 at 7:53 pm

      ‘isFinal’ is mutable because we will also delete nodes from the trie; you’ll see in the next part

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Subscribe
We send about one email per week with our latest tutorials and updates
Never display this again :)