Random Names using Markov Chains

Introduction

In this tutorial we’ll look at generating plausible random names from a list of pre-existing names. We’ll also build a fun little app that lets you generate your Elf / Ninja name.

final app

Just generating random characters won’t give us reasonable results. We’ll use a list of preexisting names in order to generate new ones. We’ll take advantage of the distribution of letters in our preexisting names.

For each name we look at each letter from that name.
For each letter we keep track of every letter that directly follows it.
Now we have a list of mappings from letters to all possible letters that directly follow them.

To generate a new name we use the following approach:
1. We start with some letter l1. Our new name consists solely of l1.
2. Starting from the letter (l1) we randomly select a letter that follows it (l2).
3. We add the letter l2 to our name
4. We repeat the the above algorithm starting at Step 2 with the letter l2.

Once we reach a predetermined maximum number of letters or we reach a letter that has no successor we stop.

Lets say we have a list of names: “Mark, Michael and Mathew”. For each letter let’s look at the letters that follow it:

m : [a, i, a]
a : [r, e, t]
r : [k]
k : []
i : [c]
c : [h]
h : [a, e]
e : [l,w]
l : []
t : [h]
w : []

Now starting from the letter m lets generate a new name.

  1. for m we pick the successor a
  2. for a we pick the successor e
  3. for e we pick the successor l
  4. l has no possible successor so we stop there

We end up generating the name Mael, some other possibilities starting from M would be: Maew, Michew, Michark

Some possible outputs when starting from h would be Hel, Hark, Hael, Hathathel and others.

Note: in practice the list of words that we use to generate new words (the seed) would be much larger than 3, typically at least 100 entries.

This process is knowed as a Markov Chain i.e. a process that has some states between which it transitions. Transitions occur randomly and a transition only depends on the current state, it doesn’t need any info on past states.

A Swift Implementation

NOTE: The implementation I’m going to show below is sub-optimal. i.e. It will be slow and take up a lot of unnecessary memory for long lists of names. I’m trying to emphasize clarity and simplicity instead of speed.

We’ll implement all our logic in a class MarkovGenerator. The most important data that we’ll keep is a list of words ( the words we use to generate new words) and transitionTable a dictionary that maps from letters to successors of letters. For each letter that is a key in the dictionary the corresponding value is an array of strings that contains the letters successors, the array may contain duplicates.

class MarkovGenerator {
    private let words: [String]
    private var transitionTable: [String : [String]] = [:]
}

We’ll create 2 initializers. One that takes a list of words and a convenience initializer that reads a list of words from a file. Our init methods will call the buildTransitionTable function that will take care of establishing the successors for each letter in each word.

class MarkovGenerator {
...
    init(words: [String]) {
        self.words = words

        buildTransitionTable()
    }

    convenience init(fileName: String) {
        let path = NSBundle.mainBundle().pathForResource(fileName, ofType: "txt")!

        let names = String(contentsOfFile: path, encoding: NSUTF8StringEncoding, error: nil)!.componentsSeparatedByString("\n").map { $0.lowercaseString}

        self.init(words: names)
    }
}

In the buildTransitionTable function we go through all our words and call the addWordToTransitionTable on each word.

In the addWordToTransitionTable function we go through all the characters in our word. For each character we consider it’s successor (nextChar), in case we reach the last letter we set it’s successor to a special character("$").

We use the current character as a key into the transition table and add it’s successor to the array stored in the dictionary, updating or creating the array as needed.

class MarkovGenerator {
...
    private func buildTransitionTable() {
        for word in words {
            addWordToTransitionTable(word)
        }

    }

    private func addWordToTransitionTable(word: String) {

        let wordArray = Array(word)

        for index in 0..<wordArray.count {

            let char = String(wordArray[index])
            var nextChar = "$"
            if index < wordArray.count - 1 {
                nextChar = String(wordArray[index + 1])
            }

            var transitionsArray = transitionTable[char]
            if (transitionsArray == nil) {
                transitionsArray = []
            }

            transitionsArray?.append(nextChar)

            transitionTable[char] = transitionsArray

        }
    }
...
}

Generating Words

First we’ll need to add some helper functions that we’ll use to generate words.

The generateNextCharacter function takes a character and generates a new character by randomly selecting an entry from the array in the transition table corresponding to the input character.

class MarkovGenerator {
...
    private func generateNextCharacter(character: String) -> String {
        let transitionArray = transitionTable[character]!

        let p = Int(arc4random()) % transitionArray.count

        return transitionArray[p]
    }
...
}

We’ll also need a function that generates a random character that appears in some of our input words. This is equivalent to selecting a random key of the transition table.

class MarkovGenerator {
...
    private func randomCharacter() -> String {
        return transitionTable.keys.array[Int(arc4random()) % transitionTable.keys.array.count]
    }
...
}

We’ll want to somehow constrain the length of the generated words, to do this we’ll add 2 more properitesminWordLength and maxWordLength.

class MarkovGenerator {
...
    var minWordLength = 4
    var maxWordLength = 10
...
}

Now finally we’re ready to add a method that generates a word. We’ll add a method generateWord(startString: String = "") -> String, the optional parameter startString indicates some string that should be used to start generating the new string.

We’ll keep track of the generated word in a variable result. We’ll also use a variable currentChar that keeps track of the current character we’re processing. If the provided startingString is empty we initialize currentChar to a random character, otherwise we initialize it to the last character in the startingString.

Next we repeat the following process a maximum of maxWordLength times:
We transition to the next character using the generateNextCharacter function.
If the generated character denotes the end of a word("$") and we have generated at least minWordLength characters we return the generated string.
If the generated character denotes the end of a word("$") and we have not generated enough character we randomly generate a new character until it’s not the end of a word.
We append the generated character to our result.

Whenever the result is returned we’ll want to capitalize it. Names should start with an uppercase letter after all!

class MarkovGenerator {
...
    func generateWord(startString: String = "") -> String {

        var startStringArray = Array(startString)

        var currentChar = randomCharacter()
        var result = currentChar

        if startStringArray.count > 0 {
            result = startString
            currentChar = String(startStringArray.last!)
        }

        for i in 1...maxWordLength {
            currentChar = generateNextCharacter(currentChar)

            if currentChar == "$" {
                if i > minWordLength {
                    return result.capitalizedString
                } else {
                    currentChar = randomCharacter()
                }
            }

            result += currentChar
        }

        return result.capitalizedString
    }
...
}

Example Usage of the class:

let markovGenerator = MarkovGenerator(words: ["Mark","Michael","Mathew"])

markovGenerator.generateWord() // Maew

Now let’s use the generator to build an iOS app!

Setting up the iOS app

You can download a starter project that includes the MarkovGenerator class and 2 files with lists of names (“ElfNames.txt” and “NinjaNames.txt”).

The interface for the starter project is empty. We’ll add the following components that will make up the app:
A label with text “Generate your elf / ninja name” A segmented control with 2 segments “Elf” and “Ninja” A labelwith text “Name:” A textfield to the right of the name label A button with title “Generate” An empty label where the result will be displayed. Make sure to set the label‘s Alignment to Center and Lines to 2

The view should look like this after you have everything set up.

UI

Connecting Outlets

We’ll have to connect 2 outlets:
The textfield as nameTextField
The bottom label as resultLabel

Also we’ll connect 2 actions:
The Value Changed action for our segmented control as segmentedControlChanged.
The Touch Up Inside action for our button as didTapGenerateButton.

Your ViewController should look like this after you’ve set everything up:

class ViewController: UIViewController {

    @IBOutlet weak var resultLabel: UILabel!
    @IBOutlet weak var nameTextField: UITextField!

    @IBAction func segmentedControlChanged(sender: UISegmentedControl) {
    }

    @IBAction func didTapGenerateButton(sender: AnyObject) {
    }

...
}

Implementing the Logic

We’ll keep 2 separate instances of MarkovGenerator, one for Elf names and one for Ninja names. We’ll also keep a 3rd property that will keep the currently active MarkovGenerator.

class ViewController {
...
    var markovGenerator: MarkovGenerator!
    var elfGenerator: MarkovGenerator!
    var ninjaGenerator: MarkovGenerator!
}

In the viewDidLoad method we’ll intialize our MarkovGenerators.

class ViewController {
...
    override func viewDidLoad() {
        super.viewDidLoad()

        self.elfGenerator = MarkovGenerator(file: "ElfNames")

        self.ninjaGenerator = MarkovGenerator(file: "NinjaNames")

        self.markovGenerator = self.elfGenerator
    }
...
}

When the segmented control changes value we’ll want to update the active markov generator that we use:

class ViewController {
...
    @IBAction func segmentedControlChanged(sender: UISegmentedControl) {
        if sender.selectedSegmentIndex == 0 {
            self.markovGenerator = self.elfGenerator
        } else {
            self.markovGenerator = self.ninjaGenerator
        }
    }
...
}

When the button is tapped we’ll want to generate a new text and place it in our resultLabel. We’ll also want to add some flavor text based on the current active generator. (“Your Elf name is:” or “Your Ninja name is”).

Now, we don’t want to pass the full name the user inputted to the markov generator, let’s just pass the first 2 letters so that the generated word starts like the inputted name. We’ll use an String extension to get the first 2 characters of the name.

extension String {
    subscript (r: Range<Int>) -> String {
        get {
            let startIndex = advance(self.startIndex, r.startIndex)
            let endIndex = advance(startIndex, r.endIndex - r.startIndex)

            return self[Range(start: startIndex, end: endIndex)]
        }
    }
}
class ViewController {
...
    @IBAction func didTapGenerateButton(sender: AnyObject) {

        let generatedType = self.markovGenerator === self.elfGenerator ? "Elf" : "Ninja"

        var name = ""

        let nameLength = count(self.nameTextField.text)
        if nameLength > 0 {
            name = self.nameTextField.text[0..<min(2,nameLength)].lowercaseString
        }

        let generatedName = self.markovGenerator.generateWord(startString: name)

        self.resultLabel.text = "Your \(generatedType) name is:\n\(generatedName)"
    }
...
}

And… we’re done :)

Download Source Code

Extensions

  1. We have only considered individual letters for our markov chains, a useful extension is to consider groups of nletters (pairs, triplets, …). So for some group of n letters we know all the letters that follow after that group. This tends to give results that are closer to the original text.
  2. We can use a very simillar approach to generate random text, given some source text. Instead of considering individual letters, we consider words and all the words that can succeed them.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Subscribe
We send about one email per week with our latest tutorials and updates
Never display this again :)