The Beale Papers Paradox…

Posted by nickpelling on Jun 18th, 2010

It seems as though penetrating public cryptographic analysis of the three Beale Papers (B1, B2, and B3) halted abruptly in 1980 when Jim Gillogly pointed out a problem with B1. If you apply to B1 the same dictionary code used for B2 (famously derived from the Declaration of Independence), you get a string of letters with some distinctive properties:-

SCS?E TFA?G CDOTT UCWOT WTAAI WDBII DTT?W TTAAB BPLAA ABWCT
LTFIF LKILP EAABP WCHOT OAPPP MORAL ANHAA BBCCA CDDEA OSDSF
HNTFT ATPOC ACBCD DLBER IFEBT HIFOE HUUBT TTTTI HPAOA ASATA
ATTOM TAPOA AAROM PJDRA ??TSB COBDA AACPN RBABF DEFGH IIJKL
MMNOH PPAWT ACMOB LSOES SOAVI SPFTA OTBTF THFOA OGHWT ENALC
AASAA TTARD SLTAW GFESA UWAOL TTAHH TTASO TTEAF AASCS TAIFR
CABTO TLHHD TNHWT STEAI EOAAS TWTTS OITSS TAAOP IWCPC WSOTT
IOIES ITTDA TTPIU FSFRF ABPTC COAIT NATTO STSTF ??ATD ATWTA
TTOCW TOMPA TSOTE CATTO TBSOG CWCDR OLITI BHPWA AE?BT STAFA
EWCI? CBOWL TPOAC TEWTA FOAIT HTTTT OSHRI STEOO ECUSC ?RAIH
RLWST RASNI TPCBF AEFTB

Here you can see not only tripled letters (AAA, PPP), quadrupled letters (TTTT) and even quintupled letters (TTTTT), but also (and this is the part that ignited Gillogly’s cryptographic curiosity) the sequence ABFDEFGHIIJKLMMNOHPP. Even if you restrict your view to the DEFGH IIJKL MMNO monotonically increasing sub-sequence in the middle, the chances of that appearing at random would be (he calculates) about one in a million million. Making it even more improbable is the fact that the aberrant “F” near the start has code 195 where code 194 is “C”, and the aberrant “H” near the end has code 301 where code 302 is “O”, which makes it look a great deal as though these were simply encoding slips. And if these were intended to be C and O respectively, the unlikeliness of the sequence vastly increases again. 
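As a rough illustration of how these features can be picked out mechanically, here is a minimal Python sketch. It uses only rows 3 to 5 of the text printed above (an excerpt, not the full 520 letters), and the helper names `strip_groups` and `long_runs` are made up for this example rather than taken from Gillogly's analysis:-

```python
from itertools import groupby

# Rows 3-5 of the decoded text as printed above (excerpt only).
EXCERPT = """
HNTFT ATPOC ACBCD DLBER IFEBT HIFOE HUUBT TTTTI HPAOA ASATA
ATTOM TAPOA AAROM PJDRA ??TSB COBDA AACPN RBABF DEFGH IIJKL
MMNOH PPAWT ACMOB LSOES SOAVI SPFTA OTBTF THFOA OGHWT ENALC
"""

GILLOGLY_SEQUENCE = "ABFDEFGHIIJKLMMNOHPP"


def strip_groups(text):
    """Remove the spaces and line breaks used to print the text in fives."""
    return "".join(text.split())


def long_runs(text, min_len=3):
    """Return (letter, run_length) for every run of at least min_len
    identical letters, ignoring the '?' placeholders."""
    runs = []
    for letter, group in groupby(text):
        length = len(list(group))
        if letter != "?" and length >= min_len:
            runs.append((letter, length))
    return runs


letters = strip_groups(EXCERPT)
print(long_runs(letters))            # tripled and longer runs in the excerpt
print(GILLOGLY_SEQUENCE in letters)  # is the alphabetical string present?
```

Running the same two functions over all eleven rows would list every tripled, quadrupled and quintupled letter in one pass.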

Yet as far as the multiple letter groups go, we can do some simple probability calculations based on the 1321 characters Gillogly lists for the B2 codebook. From frequency analysis – T 255, A 167, O 145, H 80, I 69, S 62, F 62, P 59, W 59, C 53, B 48, R 41, D 37, E 36, L 35, M 30, U 28, G 19, N 19, J 10, K 4, V 2, Y 1, X 1, Q 1, Z 0 – you can see that T, A, and P occur 19.3%, 13.5%, and 4.46% (respectively) of the time in the codebook. So, if the text letters were picked at random (as would pretty much be the case if B2’s codebook was completely the wrong codebook for B1), the chances of these patterns occurring randomly at least once in a 520-character sample would be something like this:-

  • prob(TTTTT) = 1 - (1 - 0.193^5)^(520-(5-1)) = 12.9%
  • prob(TTTT) = 1 - (1 - 0.193^4)^(520-(4-1)) = 51.2%
  • prob(AAA) = 1 - (1 - 0.135^3)^(520-(3-1)) = 72.1%
  • prob(PPP) = 1 - (1 - 0.0446^3)^(520-(3-1)) = 4.5%
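
The four figures above can be reproduced with a few lines of Python. This is only a sketch of the formula as written: the function name `run_probability` is invented for the example, and the formula treats every starting position as an independent trial, which is an approximation because overlapping windows share letters:-

```python
# Letter counts for the B2 codebook, copied from the list above.
COUNTS = {
    "T": 255, "A": 167, "O": 145, "H": 80, "I": 69, "S": 62, "F": 62,
    "P": 59, "W": 59, "C": 53, "B": 48, "R": 41, "D": 37, "E": 36,
    "L": 35, "M": 30, "U": 28, "G": 19, "N": 19, "J": 10, "K": 4,
    "V": 2, "Y": 1, "X": 1, "Q": 1, "Z": 0,
}

# Shares of T, A and P exactly as quoted in the text.
QUOTED_SHARE = {"T": 0.193, "A": 0.135, "P": 0.0446}


def run_probability(p, run_len, n=520):
    """Chance of at least one run of run_len copies of a letter with share p
    among n letters, treating each starting position as independent."""
    windows = n - (run_len - 1)
    return 1 - (1 - p ** run_len) ** windows


for letter, run_len in [("T", 5), ("T", 4), ("A", 3), ("P", 3)]:
    p = QUOTED_SHARE[letter]
    print(f"prob({letter * run_len}) = {run_probability(p, run_len):.1%}")

# Shares recomputed from the listed counts, for comparison with the quoted ones.
total = sum(COUNTS.values())
for letter in "TAP":
    print(f"{letter}: {COUNTS[letter] / total:.2%} of {total}")
```

One caveat: as transcribed above, the counts sum to 1323 rather than 1321, and they put A nearer 12.6% than 13.5%, which would bring prob(AAA) down to roughly 65% by the same formula. That may simply be a transcription slip somewhere in the list.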

You would also expect to see a copious amount of TT and AA pairs scattered through the text, which is in fact exactly what we see (13 x TT and 10 x AA, quite apart from the TTTTT, TTTT and AAA listed above). 
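
Another way to sanity-check estimates like these is to simulate random texts and count how often a given run turns up. The sketch below is an assumption-laden toy, not anything from Gillogly: it collapses the alphabet to "T" versus "anything else" (written "-"), uses the 19.3% share quoted above, and the names `has_run` and `simulate` are hypothetical:-

```python
import random


def has_run(text, letter, run_len):
    """True if text contains at least run_len copies of letter in a row."""
    return letter * run_len in text


def simulate(letter_probs, letter, run_len, n=520, trials=2000, seed=0):
    """Estimate the chance that a random n-letter text, drawn with the given
    letter probabilities, contains a run of run_len copies of letter."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    symbols = list(letter_probs)
    weights = [letter_probs[s] for s in symbols]
    hits = 0
    for _ in range(trials):
        text = "".join(rng.choices(symbols, weights=weights, k=n))
        hits += has_run(text, letter, run_len)
    return hits / trials


# "T" with the quoted 19.3% share; "-" stands in for every other letter.
estimate = simulate({"T": 0.193, "-": 0.807}, "T", 5)
print(f"simulated prob(TTTTT) = {estimate:.1%}")
```

Because runs tend to overlap with themselves, the simulated figure should come out a little lower than the closed-form one, but of the same order.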

And therein lies the basic Beale Papers paradox: though the distribution and clustering seem to imply that B2’s codebook was not B1’s codebook, the ‘Gillogly sequence’ seems to imply that the two are linked in some way. So, what’s it to be? Damoclean swords aside, how can we unpick this cryptologic knot? 

My observation here is that if there is also some kind of monoalphabetic substitution going on (i.e. in addition to the Declaration of Independence codebook), then it’s quite possible that the Gillogly sequence represents the keyword or keystring used to generate that substitution alphabet. This might well explain the doubled letters within the keystring (i.e. the II, MM and PP, plus OO once the aberrant “H” is read as “O”): if so, we would be looking for a keystring with four doubled letters but where no letter (vowels included) recurs anywhere else. 

ABCDEFGHIIJKLMMNOOPP 

Hmmm… there can’t be many English words ending with two adjacent doubled letters: in fact, the only two I can think of are coffee and toffee (please let me know if you can think of any others!) ‘Toffee’ doesn’t sound very promising, so could it be ‘coffee’? The previous word would then need to end with “C” (or with “T” for ‘toffee’) to make a doubled letter… not hugely promising, but perhaps it’s a start:-

ABCDEFGHIIJKLM MNOOPP
xxxxxxxxxxxxxC COFFEE
xxxxxxxxxxxxxT TOFFEE

Alternatively, the final word might be a three-letter word, like “TOO” or “OFF”. Had Eric Sams considered this, doubtless he would have happily constructed all kinds of valid key phrases that fit these constraints, such as:-

ABCDEF GHIIJ KLMMNO OPP
CLUNKY SPEED RABBIT TOO

OK, it’s true that the key phrase to the Beale Papers is not going to be “CLUNKY SPEED RABBIT TOO”, but maybe (just maybe) it’s a step in the right direction. :-)
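
For anyone wanting to test candidate key phrases against these constraints, a minimal Python sketch follows. It assumes the corrected sequence ABCDEFGHIIJKLMMNOOPP is the target, and the names `letter_pattern` and `fits_keystring` are invented for the example:-

```python
TARGET = "ABCDEFGHIIJKLMMNOOPP"  # the corrected Gillogly sequence


def letter_pattern(text):
    """Relabel the letters of text as A, B, C... in order of first
    appearance, ignoring spaces and punctuation."""
    first_seen = {}
    pattern = []
    for ch in text.upper():
        if not ch.isalpha():
            continue
        if ch not in first_seen:
            first_seen[ch] = chr(ord("A") + len(first_seen))
        pattern.append(first_seen[ch])
    return "".join(pattern)


def fits_keystring(phrase):
    """True if phrase repeats letters in exactly the places TARGET does."""
    return letter_pattern(phrase) == TARGET


print(fits_keystring("CLUNKY SPEED RABBIT TOO"))
print(letter_pattern("COFFEE"), letter_pattern("TOFFEE"))
```

Two phrases fit the same keystring slot exactly when their patterns match, which is why ‘coffee’ and ‘toffee’ are interchangeable as far as the final six letters go.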

Incidentally… the Wikipedia Beale Papers page notes that “In 1940, the famous cryptanalyst, Dr. White of Yale University, came close to solving the Beale ciphers after tracking down the suspected key hidden by Beale in St. Louis—he never spoke of his findings.” Though I did a bit of Internet sleuthing to try to work out who this Dr White was, I didn’t really get anywhere – I don’t think he was the Maurice Seal White (b.1888) who wrote the 1938 book “Secret writing : how to write and solve messages in cipher and code” (which I found listed in Lou Kruh’s bibliography and Worldcat) and who was a Columbia alumnus in 1920 (see p.212 here), but it’s hard to tell. Please let me know if you find out!

17 Responses

  1. Dennis Says:

    Very interesting, Nick! You ought to discuss this with Jim Gillogly. I tried to find contact info for him a while back but didn’t succeed. We can discuss privately.
    Cheers,
    Dennis

  2. nickpelling Says:

    Dennis: I’m still working out what I think this means, as there’s quite a tortuous logical chain of cryptological reasoning involved. But it’s an interesting starting point, for certain. :-)

  3. Elmar Says:

    Then of course the terminal “P” of the keystring could really not be the last keystring letter, but the first letter of the subsequent ciphertext, rendering any toffee machinations obsolete…

  4. nickpelling Says:

    Elmar: all the same, a toffee-based decipherment puzzle would definitely be something to chew over. ;-)

  5. nickpelling Says:

    Two more -xxyy pattern words I just noticed: committee and Tennessee. My current tentative prediction is that this will turn out to be someone’s name…

  6. James Pannozzi Says:

    I’m a little curious if anyone has pursued the word-number table approach, popular in the pre-Victorian era and Victorian era.
    Assuming an alphabetically organized table, some long term computer program could look for patterns that way (cf. “A Tale of a Cypher and APL”).
    At one time this would have been impossible but with today’s computers and memory it’s not. Even brute force search – the fact of alphabetic organization implies a pattern will be found in the numbers. Backtracking then gives a partial build of the table. Going forward with the hypothetical word-number assignments then attempts to decode the puzzle. A dictionary from about 1810 or 20 would be a good start, though it should be a popular dictionary.
    Quantum computers, should they be built, would do it even faster.

  7. nickpelling Says:

    James: there are lots of things you could try to solve the Beale Ciphers, but the point of the post – the Beale Papers Paradox – is that despite Gillogly’s sequences in B1, it still seems more likely that broadly the same text was enciphered in broadly the same kind of way for both B1 and B2. My #1 recommendation to would-be Beale solvers would therefore be to engage with (and overcome) this paradox with the tools that you already have. I don’t honestly believe you would need to start trawling through every pre-1825 book ever printed to solve this – but your time is your own, so feel free to follow whatever leads you like. :-)

  8. Cat Says:

    Nick, is the Beale Cipher still used as required reading for Signals Intelligence type classes? And other than possibly digging up someone’s back yard because they may have put a nice, comfy garage/dog house/swimming pool over it, has anyone simply posted a reward for a definite solution?

  9. Jim Says:

    there can’t be many English words ending with two adjacent doubled letters

    Are you sure?

    $ egrep '(.)\1(.)\2$' /usr/share/dict/words
    Karroo
    Tallahassee
    Tennessee
    addressee
    coffee
    committee
    fricassee
    lessee
    settee
    subcommittee
    tattoo
    toffee
    yippee

  10. Brian Says:

    Has anyone ever attempted the frequency method? It could work, and I am in the process of trying it. On a side note, I observed that the first page contains numbers that run higher than the word count of the Declaration. The Constitution, however, is another widely available public document (which is just what a coder would want for the intended person to solve), and it has enough words to fit, but so does the Farewell document. Although my theories revolve around historical documents, it could also be that the location was only meant to be deciphered using something only he and his best friend, Robert Morriss, would know.

  11. Diane Says:

    Could a similar method underlie the doubling and tripling seen in the Voynich ms., does anyone think?

    Non-cipher person’s question here.

  12. nickpelling Says:

    Diane: I’m pretty sure that the Beale Papers and the Voynich Manuscript use very different kinds of cryptography: there are far more differences than similarities.

  13. Stephen M. Matyas, Jr. Says:

    On my website http://www.BealeTreasureStory.com, you will find a discussion of the letter strings in the decipherment of B1 using the key to B2. Arguments are given that the letter strings are a stronger indication that the treasure story is true than false.

  14. Stephen M. Matyas, Jr. Says:

    Why don’t you use a simulation rather than doing probability calculations? Mix the cipher 10,000 times. Make a count of letter strings at each iteration, plot these, and then compare this distribution to what you find in B1 decoded with the key to B2.

    I don’t see this as a paradox. It is simply information that one doesn’t understand until it is understood.

  15. nickpelling Says:

    Stephen: for me, the Gillogly strings in B1 have always indicated the presence of cryptographic signal rather than noise, so I haven’t really needed any persuading. :-)

  16. Stephen M. Matyas, Jr. Says:

    Nick:
    I don’t think the strings are a cryptographic signal. I accept the story at face value: Beale created three ciphers and Mr. Morriss was supposed to decipher them using the ciphers in a straightforward way. Beale wasn’t expecting Morriss to decode #1 with the key to #2 in order to learn something needed to decipher something else. That is nonsense. Beale put the strings there for a purpose, but it was for his purpose not Morriss’ purpose.
    It’s not a cryptographic signal.

  17. nickpelling Says:

    Stephen: Jim Gillogly came to much the same conclusion as you, but I have to say I disagree with both of you – to my eyes, the sequence is just too implausible to be either statistically coincidental or planned. But that’s ok – opinions are meant to differ! :-)
