Machine Learning and Cypher Puzzles (Part 1)
Today I bought Broken Sword for iPad. It’s not a bad game but that’s not the subject of this post.
One of the puzzles in the game is a classic cypher puzzle, in which you must map English letters to symbols in order to work decipher the text. I managed to brute-force the first puzzle by guessing some short words, like single-letters must be “I” or “A”, and pairs must be a consonant-vowel pair of some kind.

The second puzzle stumped me somewhat, and got me thinking how I would solve it programmatically. I’m sure there is a host of research out there exactly tailored to this kind of problem, but I’m keen to try to solve it myself and learn more about attacking this kind of problem.
At first I calculated the relative frequencies of the letters, and blindly mapped them to the most frequent letters in English. This produced complete gibberish:
fiolrotd faette paiied smtdete smhr wobe uniinced rtoai utns…
The problem is that the text sample is much to small to produce accurate frequency distributions for the letters. If the text is about aardvarks then the mapping would be completely off.
When looking for patterns by eye, another method I found was to look for double letters. Wikipedia says that the most common repeated letters in English are “LL, EE, SS, OO, TT etc.” While this looked kind of promising for checking by hand I haven’t implemented something that uses it yet.
I’m keen to use a more statistical approach to the solution. My first idea was to simply try all possibilities, and rate order the alignments based on which produce more words that occur in an English dictionary. With 26 letters, the number of possibilities is 26 factorial, 4.03291461 × 10^26 which I think is a big enough number to take more than an afternoon of fun coding.
To reduce those possibilities we need to make intelligent guesses about the alignments. Maybe start off with that failed alignment of most frequent letters that I had before.
But that leads to the question of how to rate alignments relative to each other. It would be better to be able to find when there are more words that are close, rather than a binary yes/no as to whether a given word is in the dictionary. For example an algorithm that prefers a model that produces “futter” to “ztkkld”.
It’s possible to calculate how similar two words are based on their Levenshtein distance, but in order to give a single word a score we’d have to compare it to every one of the millions of word in the dictionary. We’d need to repeat that for all 50+ words in the sample sentence.
Assuming this is possible to do in a reasonable amount of time, we would want to know which alignments are producing more results than others. So going back to the previous example, the “tter” in “futter” is pretty good but the “fu” at the start is letting it down. Put simply, we need to know which bits to keep and which bits to change.
This is similar to the IBM Model 1 translation model that I have been studying recently. It features an alignment model that we try to improve for mapping English words to, for example, French words. This problem is similar but instead of mapping words to words we are trying to map letters to letters.
IBM Model 1 is a supervised algorithm in that you already have a pairs of English and French sentences that are aligned. You know that the model should produce the given French sentence and iteratively improve your model to better produce the training examples. However in this case we are unsure of the output, and can only try to evaluate possibilities based on some other heuristics.
That’s all for the braindump, I’m tempted to try the brute force ideas I had above just to see how slow they are and how much of a speed improvement I can get by tossing out alignments at the first word that’s not in the dictionary.
I could also try to see how far I get with my letter frequency approach, and use the information for most likely pairs of letters (e.g ‘er’, ‘th’, ‘gh’), and ignore impossible/unlikely combinations like ‘zk’.
I’ll post more as I try it!
Playing with NLTK
I was playing with the Python NLTK again for the first time in a while and made myself the task of finding out all the different ways “to go” is used in English. So I was aiming for a list including stuff like “to go up”, “to go down” etc.
Apologies if the Python code is horrible, I’m still getting used to how to do things in it. Consequently I’m thinking in a mix of Perl/Ruby and still reading the manual a lot.
Code:
import nltk
from nltk.corpus import brown
from nltk.stem.regexp import *
porter = nltk.PorterStemmer()
import re
p = re.compile('\w+')
words = []
def process(sentence):
for (w1, t1), (w2, t2) in nltk.bigrams(sentence):
stem1 = porter.stem(w1)
if (stem1 == 'go' and t1.startswith('V') and p.match(t2)):
words.append( 'to ' + stem1 + ' ' + w2 + ' (' + t1 + ', ' + t2 + ')' )
for tagged_sent in brown.tagged_sents():
process(tagged_sent)
f = open('output.txt', 'w')
fdist = nltk.FreqDist(words)
for key, val in fdist.items():
f.write( "%d : %s\n" % ( val, key ) )
The results are:
202 : to go to (VBG, TO) 94 : to go to (VB, IN) 34 : to go to (VBG, IN) 29 : to go on (VB, RP) 27 : to go on (VBG, RP) 25 : to go out (VB, RP) 24 : to go back (VB, RB) 18 : to go home (VB, NR) 18 : to go into (VB, IN) 18 : to go up (VB, RP) 18 : to go with (VB, IN) 10 : to go down (VB, RP) 9 : to go back (VBG, RB) 8 : to go into (VBG, IN) 8 : to go through (VB, IN) 7 : to go a (VB, AT) 7 : to go and (VB, CC) 7 : to go for (VB, IN) 7 : to go out (VBG, RP) 7 : to go over (VB, RP) 7 : to go to (VB, TO) ...
I could do more to trim down the results, like merging all the different types of verb part-of-speech tags (VB, VBG).
Any comments, I’d be happy to hear them.