A Trie is a tree like data structure used for storing words in a way which facilitates fast look-ups. They prove very useful for handling dictionaries. In Part 1 we defined the trie data structure and started outlining the code for it. In this articles we are going to take a look at the basic operation we can do on a trie: Insert, Query and Remove. And in Part 2 we take a look at how we would implement basic operations on a trie.
We recommend that you go trough the first two parts before this one if you are new to tries. Now, let’s go trough a few of the tricks we can do with a trie.
7. Prefix Counter
Having the isFinal
member helps, but it doesn’t offer enough functionalities. If we were to count the number of unique words in our dictionary, how would you do it ? A short answer : a new TrieNode member calledprefixCount
.
class TrieNode{
var children: [Character:TrieNode] = [:]
var isFinal: Bool = false
var prefixCount: Int = 0
init() {
}
//...
According to the previous graphical representation, the insert
and remove
functions suffer changes.
When inserting a word, a simple rule would be to increment prefixCount
for each node of the way, including the root
.
And the implementation doesn’t change much:
func insert(_ word: String){
insert(characters: Array(word.characters))
}
func insert(characters: [Character]) {
var node = root
node.prefixCount += 1
for character in characters {
node = node.getOrCreateChildFor(character)
node.prefixCount += 1
}
node.isFinal = true //
}
The query
function doesn’t need modifications, but that’s not the case for remove
.
Having the sequence of nodes for a word given by getNodeSequence
, we need to decrement the prefixCount
value for each node. After we do that, we also do the check prefixCount == 0
. If it’s true, it means that there are no words in our dictionary that reach that node. It’s safe to remove the entire sequence of nodes from now on. Step by step:
func remove(_ word: String){
remove(characters: Array(word.characters))
}
func remove(characters: [Character]) {
var charactersNodes:[(Character,TrieNode)] = getNodeSequence(characters: characters)
if charactersNodes.count != characters.count + 1 {
return
}
for i in 0..<charactersNodes.count {
charactersNodes[i].1.prefixCount -= 1
if charactersNodes[i].1.prefixCount == 0 {
if i != 0 {
charactersNodes[i - 1].1.children.removeValue(forKey: charactersNodes[i].0)
return // we completely deleted the reference to an entire sequence of nodes
// swift will delete those nodes from memory
}
}
}
charactersNodes[charactersNodes.count - 1].1.isFinal = false
}
8. Unique Words
Mentioned in the previous chapter, the implementation consists in returning the prefixCount
member of the root
node. How simple is that ?
func uniqueWords() -> Int {
return root.prefixCount
}
9. Words Same Prefix
Given a prefix, the function returns how many words in our dictionary have it in common. Take the Trie from chapter 7. If we were to call the function for the prefixes “t”, “to”, the function would return 2 because both “to” and “tour” share those prefixes. Same logic goes for “r”, “ro”.
Every word is a prefix of it’s own; therefore, calling the function with one of the 4 words in our dictionary( roll, round, to, tour) will return 1. 1 is also returned when called for the prefix “roun” for example.
Here’s how we do it:
We iterate the word character by character. If the corresponding TrieNode doesn’t exist, we return 0; otherwise we return the value of prefixCount
from the last TrieNode.
func wordsSamePrefix(_ word: String) -> Int {
return wordsSamePrefix(characters: Array(word.characters))
}
func wordsSamePrefix(characters: [Character]) -> Int {
var node : TrieNode? = root
for character in characters {
node = node?.children[character]
if node == nil {
return 0
}
}
return node!.prefixCount
}
Notice how similiar the function is to query
. They both use the last TrieNode of the sequence. In order to avoid repeating code, we might just create a function to return that node, if it exists.
func getNodeFor(_ characters: [Character]) -> TrieNode?
var node: TrieNode? = root
for character in characters {
node = node?.children[character]
}
return node
}
And the query
and wordsSamePrefix
functions will look cleaner:
func query(_ word: String) -> Bool {
return query2(characters: Array(word.characters))
}
func query2(characters: [Character]) -> Bool {
let node = getNodeFor(characters)
if node == nil {
return false
}
return node!.isFinal
}
func wordsSamePrefix(_ word: String) -> Int {
return wordsSamePrefix(characters: Array(word.characters))
}
func wordsSamePrefix(characters: [Character]) -> Int {
let node = getNodeFor(characters)
if node == nil {
return 0
}
return node!.prefixCount
}
10. Remove Words Same Prefix
The function will remove all words having some prefix. The only diference from this and the remove
function is that we decrease the prefixCount
member of all TrieNodes
with the prefixCount
of the last character TrieNode
.
To make the example clear, consider adding to our previous Trie the word tea aswell.
Now, let’s say I want to remove all the words with the prefix “to”. That means a total of 2 words: to, tour.
When we get to the TrieNode for the letter ‘o’, we delete the reference from ‘t’ to it. For root, prefixCount = 5
and for the letter ‘t’, prefixCount = 3
which, should in fact be 3 and 1 respectively. It means that for these TrieNodes the prefixCount
member should be decreased by 2, the value from the letter ‘o’.
I just wanted to thank you for explaining all these rules and exceptions. It is pretty clear for me to comprehend the issue and its solution.