A Trie is a tree like data structure used for storing words in a way which facilitates fast look-ups. They prove very useful for handling dictionaries. In Part 1 we defined the trie data structure and started outlining the code for it. In this articles we are going to take a look at the basic operation we can do on a trie: Insert, Query and Remove, in Part 2 we took a look at how we would implement basic operations on a trie and in Part 3 we started to take a deeper look at the tricks a trie could do.
In this part we are going to see how to use a trie to get keep a sorted list of words and how we would go about generalizing the structure so that it will work with any kind of alphabet not just the letters from a to z.
11. K-th word
Returns the k-th word in our dictionary (k>=0 && k < uniqueWords
).
Bellow is a recursive implementation which should be called with one parameter:atIndex
(the k value previously mentioned).
func kthword(atIndex index: Int, from node: TrieNode? = nil, characters: [Character] = []) -> String? {
let node = node ?? root
if index >= self.uniqueWords() {
return nil
}
var alphabetString = "abcdefghijklmnopqrstuvxyz"
var alphabetSequence = Array(alphabetString.characters)
if index == 0 && node.isFinal {
return String(characters)
}
var skipped = 0
if node.isFinal {
skipped += 1
}
for char in alphabetSequence {
if let child = node.children[char] {
if skipped + child.prefixCount > index {
// find the word in the current child
return kthword(atIndex: index - skipped, from: child, characters: characters + [char])
} else {
// skip words
skipped += child.prefixCount
}
}
}
return nil
}
The line let node = node ?? root
uses something called nil- coalescing. Directly from apple developer:
The nil-coalescing operator (a ?? b) unwraps an optional a if it contains a value, or returns a default value b if a is nil. The expression a is always of an optional type. The expression b must match the type that is stored inside a.
Based on the index
of the sequence to be followed from that node
, we call the function recursively with the proper child
and index
as follows:
We iterate through its children. If a child’s prefixCount
is less or equal than our index
value, we add it to skipped
, which indicates how many words we are skipping with that child.
When we get to a node and skipped + child.prefixCount > index
it means that our kth word is within that child’s reach and we call the function recursively with that child and index = index - skipped
. The index is adjusted so that we refer to the sequences starting form that child
.
Our goal is to find a node with index == 0 && node.isFinal
.
It may not look so efficient to iterate the entire alphabet at each step(for char in alphabetSequence{ }
), but it’s either that or sorting the dictionary keys:
Bellow is a representation of kthWord(atIndex: 2)
:
and kthWord(atIndex: 4)
12.Validate Character Sequence
So far, we’ve assumed that all our input data was valid. We’ve assumed that all Strings given as parameters for querry
,insert
, etc. contained lowercase english characters, but what if they didn’t ?
The only solution is to check that all Characters in the sequence belong to the given alphabet, one by one…
func validate(characters: [Character]) -> Bool {
var alphabetString = "abcdefghijklmnopqrstuvxyz"
var alphabetSequence = Array(alphabetString.characters)
return characters.reduce(true) { result, char in
return result && alphabetSequence.contains(char)
}
}
We use validate
as a precondition, indicated by guard
:
func insert(_ word: String){
insert(characters: Array(word.characters))
}
func insert(characters: [Character]) {
guard validate(characters: characters) else {
return
}
//...
}
func query(_ word: String) -> Bool {
return query(characters: Array(word.characters))
}
func query(characters: [Character]) -> Bool {
guard validate(characters: characters) else {
return false
}
//...
}
func remove(_ word: String){
remove(characters: Array(word.characters))
}
func remove(characters: [Character]) {
guard validate(characters: characters) else {
return
}
//...
Run example:
let myTrie = Trie()
myTrie.insert("to");
myTrie.insert("tO"); // it won't insert the word, because "O" is UPPERCASE
print(myTrie.query("tO")) // false
print(myTrie.query("to")) // true
The complete code for the lowercase alphabet can be found here : http://pastebin.com/dmPdzzkj
13.Generics
We used so far the lowercase english alphabet for our Trie. It’s time to generalize the Alphabet.
protocol Alphabet {
associatedtype AlphabetCharacter: Comparable, Hashable
func allCharacters() -> [AlphabetCharacter]
}
The AlphabetCharacter
needs to be Comparable for the kthWord
function and Hashable because it will be used as a key in a dictionary.
lowercase english alphabet:
class LowercaseLetters: Alphabet {
typealias AlphabetCharacter = Character
func allCharacters() -> [AlphabetCharacter] {
return Array("abcdefghijklmnopqrstuvwxyz".characters)
}
required init() {
}
}
Bellow is the TrieNode for a generic T which respects the Alphabet protocol:
class TrieNode<T> where T:Alphabet {
typealias character = T.AlphabetCharacter
var children: [character:TrieNode<T>] = [:]
var prefixCount: Int = 0
var isFinal:Bool = false
init() {
}
func createChildFor(_ character: character) -> TrieNode<T> {
let node = TrieNode<T>()
children[character] = node
return node
}
func getOrCreateChildFor(_ character: character) -> TrieNode<T> {
if let child = children[character] {
return child
} else {
return createChildFor(character)
}
}
}
All Trie functionalities stay the same:
class Trie<T> where T:Alphabet {
typealias character = T.AlphabetCharacter
typealias Node = TrieNode<T>
var root = Node()
func validate(characters: [character]) -> Bool {
let allCharacters = T().allCharacters()
return characters.reduce(true) { result, char in
return result && allCharacters.contains(char)
}
}
func insert<U:CharacterSequence>(_ word: U) where U.SequenceCharacter == character {
insert(characters: word.toSequence())
}
func insert(characters: [character]) {
guard validate(characters: characters) else {
return
}
var node = root
node.prefixCount += 1
for character in characters {
node = node.getOrCreateChildFor(character)
node.prefixCount += 1
}
node.isFinal = true //
}
//.....
Notice that the new insert function takes a generic U which conforms to the protocol CharacterSequence:
protocol CharacterSequence {
associatedtype SequenceCharacter: Comparable, Hashable
func toSequence() -> [SequenceCharacter]
init(characters: [SequenceCharacter])
}
The reasons for making the SequenceCharacter
conform to Comparable
and Hashable
are the same as for the alphabet.
The syntax where U.SequenceCharacter == character
is a generic type check, which makes sure that SequenceCharacter == AlphabetCharacter
.
All our non-generic functions took a String
as parameter and we would call the functions with a Character
array. To be able to use String
that way with generics, we need to conform it to our protocol:
extension String: CharacterSequence {
typealias SequenceCharacter = Character
func toSequence() -> [SequenceCharacter] {
return Array(characters)
}
init(characters: [SequenceCharacter]) {
self.init(characters)
}
}
The need of the constructor is caused by the following non-generic syntax of the kthWord
function: return String(characters)
.
This is how the new function looks like :
func kthword<U:CharacterSequence>(atIndex index: Int, from node: Node? = nil, characters: [character] = []) -> U? where U.SequenceCharacter == character {
let node = node ?? root
if index == 0 && node.isFinal {
return U(characters: characters)
}
var skipped = 0
if node.isFinal {
skipped += 1
}
for char in T().allCharacters() {
if let child = node.children[char] {
if skipped + child.prefixCount > index {
// find the word in the current child
return kthword(atIndex: index - skipped, from: child, characters: characters + [char])
} else {
// skip words
skipped += child.prefixCount
}
}
}
return nil
}
Test Run:
let myTrie = Trie<LowercaseLetters>()
myTrie.insert("to");
myTrie.insert("tour");
myTrie.insert("tea");
myTrie.insert("roll");
myTrie.insert("round");
myTrie.remove("roLl")
print (myTrie.query("roll")) // true
myTrie.removeWordsPrefix("r")
print (myTrie.query("roll")) // false
print(myTrie.wordsSamePrefix("t")) // 3
// current words in our dictionary :tea, to, tour
let word: String = myTrie.kthWord(atIndex: 2)!
print(word) // tour
The complete code can be found at: http://pastebin.com/Tm57evsn
How about having a Trie for Integers. Instead of a String
, we will have Int
and the alphabet will consist of all digits from 0…9. How hard can that be to implement ?
class DigitsAlphabet: Alphabet {
typealias AlphabetCharacter = Int8
func allCharacters() -> [AlphabetCharacter] {
return Array(0...9)
}
required init() {
}
}
extension Int: CharacterSequence {
typealias SequenceCharacter = Int8
func toSequence() -> [SequenceCharacter] {
return String(self).characters.map { character in
let asString = "\(character)"
let asInt = Int(asString)!
let asInt8 = Int8(asInt)
return asInt8
}
}
init(characters: [SequenceCharacter]) {
var value = 0
var p = 1
for char in characters.reversed() {
value += p * Int(char)
p *= 10
}
self.init(value)
}
}
The constructor takes an array of Int8
and makes an Int
out of it, while the toSequence()
does the opposite by using a clerver mapping: the Int
is converted to an String
then each Character
is mapped to an Int8
.
Test Run:
let myTrie2 = Trie<DigitsAlphabet>()
myTrie2.insert(12);
myTrie2.insert(123);
myTrie2.insert(13);
myTrie2.insert(133);
myTrie2.insert(134);
myTrie2.insert(211);
myTrie2.remove(12)
print(myTrie2.query(12)) // false
print(myTrie2.query(123)) // true
myTrie2.removeWordsPrefix(12)
print(myTrie2.query(123)) // false
print(myTrie2.wordsSamePrefix(13)) // 3 // 13, 133, 134
14. Suggestions and Auto-Completion
When you are typing something and a list of words appear, it’s our Trie in play!
The function will take a CharacterSequence
as prefix and will return [CharacterSequence]
containing the suggested words.
func suggestWords<U:CharacterSequence>(_ word: U) -> [U] where U.SequenceCharacter == character {
return suggestWords(characters: word.toSequence())
}
func suggestWords<U:CharacterSequence>(characters: [character]) -> [U] where U.SequenceCharacter == character{
guard validate(characters: characters) else {
return []
}
let lastNodeCharacters = getNodeFor(characters: characters)
if lastNodeCharacters == nil {
return []
//... to be continued
}
If given “prefix” is not contained in the Trie, we will return an empty list. Otherwise, using a DFS, we get all nodes accessible from the prefix. We need them as tuples of the following form (node, corresponding character, prefixCount)
.
PrefixCount
will dictate how to compute words.
Bellow is a standard DFS.
func DFS(node: Node, char: character ) -> [(node: Node, char: character, counter: Int)] {
let allCharacters = T().allCharacters()
var discoveredNodes:[(node: Node, char: character, counter: Int)] = []
var stackOfNodes = Stack<(node: Node,char: character, counter: Int)>()
stackOfNodes.push((node: node, char: char, counter: node.prefixCount ))
var nodeChar:(node: Node, char: character, counter: Int)
while !stackOfNodes.isEmpty(){
nodeChar = stackOfNodes.pop()
discoveredNodes.append(nodeChar)
for character in allCharacters {
if let inter = nodeChar.node.children[character] {
stackOfNodes.push((node:inter, char:character, counter:inter.prefixCount))
}
}
}
return discoveredNodes
}
Our main function will look like this:
func suggestWords<U:CharacterSequence>(characters: [character]) -> [U] where U.SequenceCharacter == character{
guard validate(characters: characters) else {
return []
}
let lastNodeCharacters = getNodeFor(characters: characters)
if lastNodeCharacters == nil {
return []
}
var discoveredNodes:[(node: Node, char: character, counter: Int)] = DFS(node: lastNodeCharacters!, char: characters[characters.count - 1])
// discoveredNodes also includes the last character from characters parameter
// it can easily be removed here, since we know it's the first discovered node
discoveredNodes.remove(at: 0)
return buildWords(characters: characters, discoveredNodes: &discoveredNodes)
}
As you can see, buildWords
function takes as parameters the prefix and the sequence of discovered nodes from DFS. Consider the following Trie:
At a call of suggestWords("ro")
, discoveredNodes
will contain the following nodes in the order marked with red:
In the buildWords
function we iterate discoveredNodes
until we find a node for which isFinal
is set to true
. When such node is found, we iterate once again from the beginning, but stopping at our initial node. Each node found along the way, having counter > 0
, is included in the current word and counter
is decremented, in order to indicate the limited number of usages.
Here’s how the construction of the words “route” and “round” will go like:
Bellow you have the code:
func buildSeq<U:CharacterSequence>(characters: [character], discoveredNodes: inout [(node: Node, char: character, counter: Int)]) -> [U] where U.SequenceCharacter == character {
var res:[U] = []
for i in 0..<discoveredNodes.count {
if discoveredNodes[i].node.isFinal == true {
var w:[character] = []
// counter tells us which nodes characters are to be included in the sequence
// if counter > 0 then the node's character belongs to the sequence and we decrement the counter
// otherwise, the node was used in other sequences and is now "used up"
print("building seq starting with:",characters)
for j in 0...i{
if discoveredNodes[j].counter > 0 {
print(discoveredNodes[j].char)
w.append(discoveredNodes[j].char)
discoveredNodes[j].counter -= 1
}
}
w = characters + w
res.append(U(characters: w))
}
}
return res
}
I left 2 prints there to make clear at runtime how the algorithm works.
Test Run:
print (myTrie.suggestWords("ro"))
will print:
building seq starting with: ["r", "o"]
u
t
e
building seq starting with: ["r", "o"]
u
n
d
building seq starting with: ["r", "o"]
l
l
["route", "round", "roll"]
That’s it
Remark: The algorithm does not return the prefix itself if it’s a valid word in our dictionary
Full code here.
Now, that we reached the ending, I hope the article made you see the importance and usefulness of the Trie data structure .
Practice, practice, practice