Working with the Trie data structure: Part 4

A Trie is a tree like data structure used for storing words in a way which facilitates fast look-ups. They prove very useful for handling dictionaries. In Part 1 we defined the trie data structure and started outlining the code for it. In this articles we are going to take a look at the basic operation we can do on a trie: Insert, Query and Remove, in Part 2 we took a look at how we would implement basic operations on a trie and in Part 3 we started to take a deeper look at the tricks a trie could do.

In this part we are going to see how to use a trie to get keep a sorted list of words and how we would go about generalizing  the structure so that it will work with any kind of alphabet not just the letters from a to z.

11. K-th word

Returns the k-th word in our dictionary (k>=0 && k < uniqueWords).

Bellow is a recursive implementation which should be called with one parameter:atIndex(the k value previously mentioned).

   func kthword(atIndex index: Int, from node: TrieNode? = nil, characters: [Character] = []) -> String? {
       let node = node ?? root

       if index >= self.uniqueWords() {
         return nil
       }

       var alphabetString = "abcdefghijklmnopqrstuvxyz"
       var alphabetSequence = Array(alphabetString.characters)

       if index == 0 && node.isFinal {
           return String(characters)
       }

       var skipped = 0
       if node.isFinal {
         skipped += 1
       }

       for char in alphabetSequence {
           if let child = node.children[char] {
               if skipped + child.prefixCount >  index {
                   // find the word in the current child
                   return kthword(atIndex: index - skipped, from: child, characters: characters + [char])
               } else {
                   // skip words 
                   skipped += child.prefixCount
               }
           }

       }

       return nil
   }

The line let node = node ?? root uses something called nil- coalescing. Directly from apple developer:

The nil-coalescing operator (a ?? b) unwraps an optional a if it contains a value, or returns a default value b if a is nil. The expression a is always of an optional type. The expression b must match the type that is stored inside a.

Based on the index of the sequence to be followed from that node, we call the function recursively with the proper child and indexas follows:

We iterate through its children. If a child’s prefixCount is less or equal than our index value, we add it to skipped, which indicates how many words we are skipping with that child.

When we get to a node and skipped + child.prefixCount > index it means that our kth word is within that child’s reach and we call the function recursively with that child and index = index - skipped. The index is adjusted so that we refer to the sequences starting form that child.

Our goal is to find a node with index == 0 && node.isFinal.

It may not look so efficient to iterate the entire alphabet at each step(for char in alphabetSequence{ }), but it’s either that or sorting the dictionary keys:

Bellow is a representation of kthWord(atIndex: 2):

kthWord2

and kthWord(atIndex: 4)

kthWord4

12.Validate Character Sequence

So far, we’ve assumed that all our input data was valid. We’ve assumed that all Strings given as parameters for querry,insert, etc. contained lowercase english characters, but what if they didn’t ?

The only solution is to check that all Characters in the sequence belong to the given alphabet, one by one…

    func validate(characters: [Character]) -> Bool {
        var alphabetString = "abcdefghijklmnopqrstuvxyz"
        var alphabetSequence = Array(alphabetString.characters)

        return characters.reduce(true) { result, char in
            return result && alphabetSequence.contains(char)
        }
    }

We use validate as a precondition, indicated by guard:

func insert(_ word: String){
        insert(characters: Array(word.characters))
    }

    func insert(characters: [Character]) {
      guard validate(characters: characters) else {
          return
      }
      //...
    }


    func query(_ word: String) -> Bool {
        return query(characters: Array(word.characters))
    }

    func query(characters: [Character]) -> Bool {
      guard validate(characters: characters) else {
          return false
      }
	//...
    }

    func remove(_ word: String){
        remove(characters: Array(word.characters))
    }

    func remove(characters: [Character]) {
      guard validate(characters: characters) else {
          return
      }
	//...

Run example:

  let myTrie = Trie()

  myTrie.insert("to");
  myTrie.insert("tO"); // it won't insert the word, because "O" is UPPERCASE
  print(myTrie.query("tO")) //  false
  print(myTrie.query("to")) //  true

The complete code for the lowercase alphabet can be found here : http://pastebin.com/dmPdzzkj

13.Generics

We used so far the lowercase english alphabet for our Trie. It’s time to generalize the Alphabet.

protocol Alphabet {
    associatedtype AlphabetCharacter: Comparable, Hashable

    func allCharacters() -> [AlphabetCharacter]
}

The AlphabetCharacter needs to be Comparable for the kthWord function and Hashable because it will be used as a key in a dictionary.

lowercase english alphabet:

class LowercaseLetters: Alphabet {
    typealias AlphabetCharacter = Character

    func allCharacters() -> [AlphabetCharacter] {
        return Array("abcdefghijklmnopqrstuvwxyz".characters)
    }

    required init() {

    }
}

Bellow is the TrieNode for a generic T which respects the Alphabet protocol:

class TrieNode<T> where T:Alphabet {
    typealias character = T.AlphabetCharacter

    var children: [character:TrieNode<T>] = [:]
    var prefixCount: Int = 0
    var isFinal:Bool = false

    init() {

    }

    func createChildFor(_ character: character) -> TrieNode<T> {
        let node = TrieNode<T>()

        children[character] = node

        return node
    }

    func getOrCreateChildFor(_ character: character) -> TrieNode<T> {
        if let child = children[character] {
            return child
        } else {
            return createChildFor(character)
        }
    }
}

All Trie functionalities stay the same:

class Trie<T> where T:Alphabet {
    typealias character = T.AlphabetCharacter
    typealias Node = TrieNode<T>

    var root = Node()

    func validate(characters: [character]) -> Bool {
        let allCharacters = T().allCharacters()
        return characters.reduce(true) { result, char in
            return result && allCharacters.contains(char)
        }
    }

    func insert<U:CharacterSequence>(_ word: U) where U.SequenceCharacter == character {
        insert(characters: word.toSequence())
    }

    func insert(characters: [character]) {
        guard validate(characters: characters) else {
            return
        }

        var node = root
        node.prefixCount += 1
        for character in characters {
            node = node.getOrCreateChildFor(character)
            node.prefixCount += 1
        }
        node.isFinal = true //
    }
    //.....

Notice that the new insert function takes a generic U which conforms to the protocol CharacterSequence:

protocol CharacterSequence {
    associatedtype SequenceCharacter: Comparable, Hashable
    func toSequence() -> [SequenceCharacter]
    init(characters: [SequenceCharacter])
}

The reasons for making the SequenceCharacter conform to Comparable and Hashable are the same as for the alphabet.

The syntax where U.SequenceCharacter == character is a generic type check, which makes sure that SequenceCharacter == AlphabetCharacter.

All our non-generic functions took a String as parameter and we would call the functions with a Character array. To be able to use String that way with generics, we need to conform it to our protocol:

extension String: CharacterSequence {
    typealias SequenceCharacter = Character

    func toSequence() -> [SequenceCharacter] {
        return Array(characters)
    }

    init(characters: [SequenceCharacter]) {
        self.init(characters)
    }
}

The need of the constructor is caused by the following non-generic syntax of the kthWord function: return String(characters).

This is how the new function looks like :

func kthword<U:CharacterSequence>(atIndex index: Int, from node: Node? = nil, characters: [character] = []) -> U? where U.SequenceCharacter == character {
        let node = node ?? root

        if index == 0 && node.isFinal {
            return U(characters: characters)
        }

        var skipped = 0
        if node.isFinal {
          skipped += 1
        }

        for char in T().allCharacters() {
            if let child = node.children[char] {
                if skipped + child.prefixCount > index {
                    // find the word in the current child
                    return kthword(atIndex: index - skipped, from: child, characters: characters + [char])
                } else {
                    // skip words 
                    skipped += child.prefixCount
                }
            }

        }

        return nil
    }

Test Run:

let myTrie = Trie<LowercaseLetters>()
 
myTrie.insert("to");
myTrie.insert("tour");
myTrie.insert("tea");
myTrie.insert("roll");
myTrie.insert("round");
 
myTrie.remove("roLl")
print (myTrie.query("roll"))      // true
 
myTrie.removeWordsPrefix("r")
print (myTrie.query("roll"))      // false
print(myTrie.wordsSamePrefix("t"))  // 3
 
// current words in our dictionary :tea, to, tour
let word: String = myTrie.kthWord(atIndex: 2)!
print(word) // tour

The complete code can be found at: http://pastebin.com/Tm57evsn

How about having a Trie for Integers. Instead of a String, we will have Int and the alphabet will consist of all digits from 0…9. How hard can that be to implement ?

class DigitsAlphabet: Alphabet {
    typealias AlphabetCharacter = Int8

    func allCharacters() -> [AlphabetCharacter] {
        return Array(0...9)
    }

    required init() {

    }
}
extension Int: CharacterSequence {
     typealias SequenceCharacter = Int8

     func toSequence() -> [SequenceCharacter] {
         return String(self).characters.map { character in
             let asString = "\(character)"
             let asInt = Int(asString)!
             let asInt8 = Int8(asInt)

             return asInt8
         }
     }

     init(characters: [SequenceCharacter]) {
        var value = 0
        var p = 1

        for char in characters.reversed() {
            value += p * Int(char)

            p *= 10
        }

        self.init(value)
     }
 }

The constructor takes an array of Int8 and makes an Int out of it, while the toSequence()does the opposite by using a clerver mapping: the Int is converted to an String then each Character is mapped to an Int8.

Test Run:

let myTrie2 = Trie<DigitsAlphabet>()
myTrie2.insert(12);
myTrie2.insert(123);
myTrie2.insert(13);
myTrie2.insert(133);
myTrie2.insert(134);
myTrie2.insert(211);

myTrie2.remove(12)
print(myTrie2.query(12))  // false
print(myTrie2.query(123))  // true

myTrie2.removeWordsPrefix(12)
print(myTrie2.query(123))  // false
print(myTrie2.wordsSamePrefix(13))  // 3 // 13, 133, 134

14. Suggestions and Auto-Completion

When you are typing something and a list of words appear, it’s our Trie in play!

The function will take a CharacterSequence as prefix and will return [CharacterSequence] containing the suggested words.

     func suggestWords<U:CharacterSequence>(_ word: U) -> [U] where U.SequenceCharacter == character {
        return suggestWords(characters: word.toSequence())
    }



    func suggestWords<U:CharacterSequence>(characters: [character]) -> [U] where U.SequenceCharacter == character{
          guard validate(characters: characters) else {
              return []
          }

          let lastNodeCharacters = getNodeFor(characters: characters)

          if lastNodeCharacters == nil {
            return []
            
            //... to be continued
          
      }

If given “prefix” is not contained in the Trie, we will return an empty list. Otherwise, using a DFS, we get all nodes accessible from the prefix. We need them as tuples of the following form (node, corresponding character, prefixCount).

PrefixCount will dictate how to compute words.

Bellow is a standard DFS.

  func DFS(node: Node, char: character ) -> [(node: Node, char: character, counter: Int)] {

          let allCharacters = T().allCharacters()
          var discoveredNodes:[(node: Node, char: character, counter: Int)] = []

          var stackOfNodes = Stack<(node: Node,char: character, counter: Int)>()
          stackOfNodes.push((node: node, char: char, counter: node.prefixCount ))

          var nodeChar:(node: Node, char: character, counter: Int)

          while !stackOfNodes.isEmpty(){
            nodeChar = stackOfNodes.pop()
            discoveredNodes.append(nodeChar)

            for character in allCharacters {
                if let inter = nodeChar.node.children[character] {
                    stackOfNodes.push((node:inter, char:character, counter:inter.prefixCount))
                }
            }
          }

          return discoveredNodes
      }

Our main function will look like this:

func suggestWords<U:CharacterSequence>(characters: [character]) -> [U] where U.SequenceCharacter == character{
          guard validate(characters: characters) else {
              return []
          }

          let lastNodeCharacters = getNodeFor(characters: characters)

          if lastNodeCharacters == nil {
            return []
          }
          var discoveredNodes:[(node: Node, char: character, counter: Int)] = DFS(node: lastNodeCharacters!, char: characters[characters.count - 1])

          // discoveredNodes also includes the last character from characters parameter
          // it can easily be removed here, since we know it's the first discovered node
          discoveredNodes.remove(at: 0)

          return buildWords(characters: characters, discoveredNodes: &discoveredNodes)
      }

As you can see, buildWords function takes as parameters the prefix and the sequence of discovered nodes from DFS. Consider the following Trie:

TrieExampleSuggestWords

At a call of suggestWords("ro"), discoveredNodes will contain the following nodes in the order marked with red: NodesDFS

In the buildWords function we iterate discoveredNodes until we find a node for which isFinal is set to true. When such node is found, we iterate once again from the beginning, but stopping at our initial node. Each node found along the way, having counter > 0, is included in the current word and counter is decremented, in order to indicate the limited number of usages.

Here’s how the construction of the words “route” and “round” will go like: suggestRoute

suggestRound

Bellow you have the code:

  func buildSeq<U:CharacterSequence>(characters: [character], discoveredNodes: inout [(node: Node, char: character, counter: Int)]) -> [U] where U.SequenceCharacter == character {
      var res:[U] = []

      for i in 0..<discoveredNodes.count {
        if discoveredNodes[i].node.isFinal == true {
          var w:[character] = []
          // counter tells us which nodes characters are to be included in the sequence
          // if counter > 0 then the node's character belongs to the sequence and we decrement the counter
          // otherwise, the node was used in other sequences and is now "used up"
          print("building seq starting with:",characters)
          for j in 0...i{
              if discoveredNodes[j].counter > 0 {
                print(discoveredNodes[j].char)
                w.append(discoveredNodes[j].char)

                discoveredNodes[j].counter -= 1
            }
          }
          w = characters + w

          res.append(U(characters: w))
        }
      }

      return res
    }

I left 2 prints there to make clear at runtime how the algorithm works.

Test Run:

print (myTrie.suggestWords("ro"))

will print:

building seq starting with: ["r", "o"]
u
t
e
building seq starting with: ["r", "o"]
u
n
d
building seq starting with: ["r", "o"]
l
l
["route", "round", "roll"]

That’s it :)

Remark: The algorithm does not return the prefix itself if it’s a valid word in our dictionary

Full code here.

Now, that we reached the ending, I hope the article made you see the importance and usefulness of the Trie data structure .

Practice, practice, practice :)

1 Star2 Stars3 Stars4 Stars5 Stars (9 votes, average: 4.44 out of 5)
Loading...

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Subscribe
We send about one email per week with our latest tutorials and updates
Never display this again :)