The Beale Papers Paradox…
It seems as though penetrating public cryptographic analysis of the three Beale Papers (B1, B2, and B3) halted abruptly in 1980 when Jim Gillogly pointed out a problem with B1. If, as he pointed out, you apply to B1 the same dictionary code used for B2 (famously derived from the Declaration of Independence), you get a ciphertext with some distinctive properties:-
SCS?E TFA?G CDOTT UCWOT WTAAI WDBII DTT?W TTAAB BPLAA ABWCT LTFIF LKILP EAABP WCHOT OAPPP MORAL ANHAA BBCCA CDDEA OSDSF HNTFT ATPOC ACBCD DLBER IFEBT HIFOE HUUBT TTTTI HPAOA ASATA ATTOM TAPOA AAROM PJDRA ??TSB COBDA AACPN RBABF DEFGH IIJKL MMNOH PPAWT ACMOB LSOES SOAVI SPFTA OTBTF THFOA OGHWT ENALC AASAA TTARD SLTAW GFESA UWAOL TTAHH TTASO TTEAF AASCS TAIFR CABTO TLHHD TNHWT STEAI EOAAS TWTTS OITSS TAAOP IWCPC WSOTT IOIES ITTDA TTPIU FSFRF ABPTC COAIT NATTO STSTF ??ATD ATWTA TTOCW TOMPA TSOTE CATTO TBSOG CWCDR OLITI BHPWA AE?BT STAFA EWCI? CBOWL TPOAC TEWTA FOAIT HTTTT OSHRI STEOO ECUSC ?RAIH RLWST RASNI TPCBF AEFTB
Here you can see not only tripled letters (AAA, PPP), quadrupled letters (TTTT) and even quintupled letters (TTTTT), but also (and this is the part that ignited Gillogly’s cryptographic curiosity) the sequence ABFDEFGHIIJKLMMNOHPP. Even if you restrict your view to the DEFGH IIJKL MMNO monotonically increasing sub-sequence in the middle, the chances of that appearing at random would be (he calculates) about one in a million million. Making it even more improbable is the fact that the aberrant “F” near the start has code 195 where code 194 is “C”, and the aberrant “H” near the end has code 301 where code 302 is “O”, which makes it look a great deal as though these were simply encoding slips. And if these were intended to be C and O respectively, the unlikeliness of the sequence vastly increases again.
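For anyone who wants to check these features rather than spot them by eye, here is a minimal Python sketch that picks out runs of a repeated letter and the longest alphabetically non-decreasing stretch in a decoded string. The function names (`repeated_runs`, `longest_alphabetical_run`) are made up for illustration, treating "?" as something that breaks a run is an assumption, and the samples are short excerpts of the text above rather than the whole thing.

```python
from itertools import groupby


def repeated_runs(text, min_len=3):
    """Return (letter, run_length) for each run of one letter at least min_len long."""
    letters = text.replace(" ", "")
    runs = []
    for letter, group in groupby(letters):
        length = len(list(group))
        # "?" marks an undecodable number, so it is not counted as a letter run.
        if letter != "?" and length >= min_len:
            runs.append((letter, length))
    return runs


def longest_alphabetical_run(text):
    """Return the longest stretch in which each letter is >= the one before it."""
    letters = text.replace(" ", "")
    best = current = ""
    for ch in letters:
        if ch == "?":
            current = ""  # assumption: an unknown letter breaks any run
        elif current and ch >= current[-1]:
            current += ch
        else:
            current = ch
        if len(current) > len(best):
            best = current
    return best


# The five groups around the Gillogly sequence, as printed above.
print(longest_alphabetical_run("RBABF DEFGH IIJKL MMNOH PPAWT"))
# The groups containing the quintupled T.
print(repeated_runs("HUUBT TTTTI HPAOA"))
```

Because the check is for "greater than or equal to", doubled letters such as II and MM stay inside the run, while the aberrant F and H break it at either end.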
Yet as far as the multiple letter groups go, we can do some simple probability calculations based on the 1321 characters Gillogly lists for the B2 codebook. From frequency analysis – T 255, A 167, O 145, H 80, I 69, S 62, F 62, P 59, W 59, C 53, B 48, R 41, D 37, E 36, L 35, M 30, U 28, G 19, N 19, J 10, K 4, V 2, Y 1, X 1, Q 1, Z 0 – you can see that T, A, and P occur 19.3%, 13.5%, and 4.46% (respectively) of the time in the codebook. So, if the text letters were picked at random (as would pretty much be the case if B2’s codebook was completely the wrong codebook for B1), the chances of these patterns occurring randomly at least once in a 520-character sample would be something like this:-
- prob(TTTTT) = 1 - (1 - 0.193^5)^(520-(5-1)) = 12.9%
- prob(TTTT) = 1 - (1 - 0.193^4)^(520-(4-1)) = 51.2%
- prob(AAA) = 1 - (1 - 0.135^3)^(520-(3-1)) = 72.1%
- prob(PPP) = 1 - (1 - 0.0446^3)^(520-(3-1)) = 4.5%
You would also expect to see a copious amount of TT and AA pairs scattered through the text, which is in fact exactly what we see (13 x TT and 10 x AA, quite apart from the TTTTT, TTTT and AAA listed above).
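The four figures above all come from the same formula, so here is a short Python sketch that reproduces them. It uses the percentages exactly as quoted in the text, and the function name `prob_run_at_least_once` is just an illustrative label. Note that the formula treats every possible starting position as an independent trial, which overlapping windows are not, so these are rough estimates rather than exact probabilities.

```python
def prob_run_at_least_once(p, run_len, text_len=520):
    """Approximate chance of seeing one letter run_len times in a row
    somewhere in text_len random letters.

    p is the letter's frequency; each of the text_len - (run_len - 1)
    starting positions is treated as an independent trial.
    """
    windows = text_len - (run_len - 1)
    return 1 - (1 - p ** run_len) ** windows


# Letter frequencies as quoted above: T 19.3%, A 13.5%, P 4.46%.
cases = [
    ("TTTTT", 0.193, 5),
    ("TTTT", 0.193, 4),
    ("AAA", 0.135, 3),
    ("PPP", 0.0446, 3),
]
for label, p, run_len in cases:
    print(f"prob({label}) = {prob_run_at_least_once(p, run_len):.1%}")
```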
And therein lies the basic Beale Papers paradox: though the distribution and clustering seem to imply that B2’s codebook was not B1’s codebook, the ‘Gillogly sequence’ seems to imply that the two are linked in some way. So, what’s it to be? Damoclean swords aside, how can we unpick this cryptologic knot?
My observation here is that if there is also some kind of monoalphabetic substitution going on (i.e. in addition to the Declaration of Independence codebook), then it’s quite possible that the Gillogly sequence represents the keyword or keystring used to generate that substitution alphabet. This might well explain the doubled letters within the keystring (i.e. the II, MM and PP, plus OO if the aberrant H was meant to be O): if so, we would be looking for a keystring with four doubled letters but where none of the vowels repeat. With the two suspected slips corrected (F to C, H to O), the sequence becomes:-
ABCDEFGHIIJKLMMNOOPP
Hmmm… there can’t be many English words ending with two adjacent doubled letters: in fact, the only two I can think of are coffee and toffee (please let me know if you can think of any others!) ‘Toffee’ doesn’t sound very promising, so could it be ‘coffee’? The previous word would then need to end with “C” to make a doubled letter… not hugely promising, but perhaps it’s a start!-
ABCDEFGHIIJKLM MNOOPP
xxxxxxxxxxxxxC COFFEE
xxxxxxxxxxxxxT TOFFEE
Alternatively, it might be a three letter word, like “TOO” or “OFF”. Had Eric Sams considered this, doubtless he would have happily constructed all kinds of valid key phrases that fit these constraints, such as:-
ABCDEF GHIIJ KLMMNO OPP
CLUNKY SPEED RABBIT TOO
OK, it’s true that the key phrase to the Beale Papers is not going to be “CLUNKY SPEED RABBIT TOO”, but maybe (just maybe) it’s a step in the right direction.
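If you want to test candidate key phrases against the corrected sequence, the check is a simple one: the candidate must repeat its letters in exactly the same places as the pattern does. Here is a minimal Python sketch of that test. The function name `matches_pattern` is hypothetical, and ignoring spaces (so that word breaks do not have to line up) is an assumption made for simplicity.

```python
def matches_pattern(candidate, pattern):
    """True if candidate repeats its letters in exactly the same places as pattern."""
    cand = candidate.replace(" ", "").upper()
    pat = pattern.replace(" ", "").upper()
    if len(cand) != len(pat):
        return False
    forward, backward = {}, {}
    for p, c in zip(pat, cand):
        # Each pattern letter must always map to the same candidate letter,
        # and no two pattern letters may share a candidate letter.
        if forward.setdefault(p, c) != c or backward.setdefault(c, p) != p:
            return False
    return True


print(matches_pattern("COFFEE", "MNOOPP"))
print(matches_pattern("CLUNKY SPEED RABBIT TOO", "ABCDEF GHIIJ KLMMNO OPP"))
print(matches_pattern("LETTER", "MNOOPP"))
```

The two-way mapping is what rejects a word like LETTER, which has a doubled letter in the right place but reuses its E where the pattern needs a new letter.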
Incidentally… the Wikipedia Beale Papers page notes that “In 1940, the famous cryptanalyst, Dr. White of Yale University, came close to solving the Beale ciphers after tracking down the suspected key hidden by Beale in St. Louis—he never spoke of his findings.” Though I did a bit of Internet sleuthing to try to work out who this Dr White was, I didn’t really get anywhere – I don’t think he was the Maurice Seal White (b.1888) who wrote the 1938 book “Secret writing : how to write and solve messages in cipher and code” (which I found listed in Lou Kruh’s bibliography and Worldcat) and who was a Columbia alumnus in 1920 (see p.212 here), but it’s hard to tell. Please let me know if you find out!

June 19th, 2010 at 6:19 pm
Very interesting, Nick! You ought to discuss this with Jim Gillogly. I tried to find contact info for him a while back but didn’t succeed. We can discuss privately.
Cheers,
Dennis
June 19th, 2010 at 6:35 pm
Dennis: I’m still working out what I think this means, as there’s quite a tortuous logical chain of cryptological reasoning involved. But it’s an interesting starting point, for certain.
June 21st, 2010 at 6:58 am
Then of course the terminal “P” of the keystring could really not be the last keystring letter, but the first letter of the subsequent ciphertext, rendering any toffee machinations obsolete…
June 21st, 2010 at 8:22 am
Elmar: all the same, a toffee-based decipherment puzzle would definitely be something to chew over.
June 29th, 2010 at 3:41 pm
Two more -xxyy pattern words I just noticed: committee and Tennessee. My current tentative prediction is that this will turn out to be someone’s name…
July 6th, 2010 at 5:52 pm
I’m a little curious if anyone has pursued the word-number table approach, popular in the pre-Victorian era and Victorian era.
Assuming an alphabetically organized table, some long term computer program could look for patterns that way (cf. “A Tale of a Cypher and APL”).
At one time this would have been impossible, but with today’s computers and memory it’s not. Even a brute force search could work: the fact of alphabetic organization implies a pattern will be found in the numbers. Backtracking then gives a partial build of the table. Going forward with the hypothetical word-number assignments then attempts to decode the puzzle. A dictionary from about 1810 or 1820 would be a good start, though it should be a popular dictionary.
Quantum computers, should they be built, would do it even faster.
July 6th, 2010 at 6:40 pm
James: there are lots of things you could try to solve the Beale Ciphers, but the point of the post – the Beale Papers Paradox – is that despite Gillogly’s sequences in B1, it still seems more likely that broadly the same text was enciphered in broadly the same kind of way for both B1 and B2. My #1 recommendation to would-be Beale solvers would therefore be to engage with (and overcome) this paradox with the tools that you already have. I don’t honestly believe you would need to start trawling through every pre-1825 book ever printed to solve this – but your time is your own, so feel free to follow whatever leads you like.
September 17th, 2010 at 5:21 pm
Nick, is the Beale Cipher still used as required reading for Signals Intelligence type classes? And other than possibly digging up someone’s back yard because they may have put a nice, comfy garage/dog house/swimming pool over it, has anyone simply posted a reward for a definite solution?
February 1st, 2011 at 7:12 pm
Are you sure?
$ egrep '(.)\1(.)\2$' /usr/share/dict/words
Karroo
Tallahassee
Tennessee
addressee
coffee
committee
fricassee
lessee
settee
subcommittee
tattoo
toffee
yippee
October 15th, 2011 at 6:22 am
Has anyone ever attempted the frequency method? It could work, and I am in the process of trying it. On a side note, I noticed that the numbers on the first page run higher than the number of words in the Declaration. However, the Constitution is another widely available public document (which is just what a coder would want the intended reader to have to hand), and it has enough words to fit, as does the Farewell document. Although my theories revolve around historical documents, it could also be that the location was only meant to be deciphered using something only he and his best friend, Robert Morriss, would know.
October 21st, 2011 at 9:46 am
Could a similar method underlie the doubling and tripling seen in the Voynich ms., does anyone think?
Non-cipher person’s question here.
October 21st, 2011 at 10:26 am
Diane: I’m pretty sure that the Beale Papers and the Voynich Manuscript use very different kinds of cryptography: there are far more differences than similarities.
January 28th, 2012 at 9:18 pm
On my website http://www.BealeTreasureStory.com, you will find a discussion of the letter strings in the decipherment of B1 using the key to B2. Arguments are given that the letter strings are a stronger indication that the treasure story is true than false.
January 28th, 2012 at 9:28 pm
Why don’t you use a simulation rather than doing probability calculations? Mix the cipher 10,000 times. Make a count of letter strings at each iteration, plot these, and then compare this distribution to what you find in B1 decoded with the key to B2.
I don’t see this as a paradox. It is simply information that one doesn’t understand until it is understood.
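The shuffle test suggested in this comment could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names are invented, the statistic recorded is simply the longest run of a single repeated letter, the seed is fixed so that results are repeatable, and the demo uses a ten-group excerpt of the decoded B1 text with 1,000 shuffles rather than the full text and 10,000.

```python
import random
from itertools import groupby


def longest_repeat(letters):
    """Length of the longest run of one repeated letter (0 for empty input)."""
    return max((len(list(group)) for _, group in groupby(letters)), default=0)


def shuffled_run_lengths(text, trials=10_000, seed=0):
    """Shuffle the letters `trials` times, recording the longest repeat each time."""
    letters = [ch for ch in text if ch.isalpha()]  # drops spaces and "?"
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        rng.shuffle(letters)
        results.append(longest_repeat(letters))
    return results


def fraction_at_least(results, observed):
    """Share of shuffles whose longest repeat is at least the observed length."""
    return sum(r >= observed for r in results) / len(results)


# Excerpt only: ten groups around the quintupled T.
excerpt = "IFEBT HIFOE HUUBT TTTTI HPAOA ASATA ATTOM TAPOA AAROM PJDRA"
observed = longest_repeat([ch for ch in excerpt if ch.isalpha()])
results = shuffled_run_lengths(excerpt, trials=1000)
print(observed, fraction_at_least(results, observed))
```

Shuffling keeps the letter counts of the decoded text fixed, which is a slightly different null model from drawing letters at random by codebook frequency, so the two approaches need not give identical numbers.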
January 28th, 2012 at 10:01 pm
Stephen: for me, the Gillogly strings in B1 have always indicated the presence of cryptographic signal rather than noise, so I haven’t really needed any persuading.
January 31st, 2012 at 7:55 pm
Nick:
I don’t think the strings are a cryptographic signal. I accept the story at face value: Beale created three ciphers and Mr. Morriss was supposed to decipher them using the ciphers in a straightforward way. Beale wasn’t expecting Morriss to decode #1 with the key to #2 in order to learn something needed to decipher something else. That is nonsense. Beale put the strings there for a purpose, but it was for his purpose not Morriss’ purpose.
It’s not a cryptographic signal.
February 2nd, 2012 at 1:53 pm
Stephen: Jim Gillogly came to much the same conclusion as you, but I have to say I disagree with both of you – to my eyes, the sequence is just too implausible to be either statistically coincidental or planned. But that’s ok – opinions are meant to differ!