I would like to generate a random text using letter frequencies from a book in a .txt file, so that each new character (string.lowercase + ' ') depends on the previous one.
How do I use Markov chains to do so? Or is it simpler to use 27 arrays with conditional frequencies for each letter?
I would like to generate a random text using letter frequencies from a book in a txt file
Consider using collections.Counter to build-up the frequencies when looping over the text file two letters at a time.
How do I use markov chains to do so? Or is it simpler to use 27 arrays with conditional frequencies for each letter?
The two statements are equivalent. The Markov chain is what you're doing. The 27 arrays with conditional frequencies is how you're doing it.
Here is some dictionary based code to get you started:
from collections import defaultdict, Counter
from itertools import ifilter
from random import choice, randrange
def pairwise(iterable):
    it = iter(iterable)
    last = next(it)
    for curr in it:
        yield last, curr
        last = curr
valid = set('abcdefghijklmnopqrstuvwxyz ')
def valid_pair((last, curr)):
    return last in valid and curr in valid
def make_markov(text):
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in ifilter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov
def genrandom(model, n):
    curr = choice(list(model))
    for i in xrange(n):
        yield curr
        if curr not in model:   # handle case where there is no known successor
            curr = choice(list(model))
        d = model[curr]
        target = randrange(sum(d.values()))
        cumulative = 0
        for curr, cnt in d.items():
            cumulative += cnt
            if cumulative > target:
                break
model = make_markov('The qui_.ck brown fox')
print ''.join(genrandom(model, 20))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With