The Chaocipher revealed!

The Chaocipher” is a devious cipher system invented in 1918 by John F. Byrne: allegedly, it was so complex that nobody could crack his challenge ciphertexts (even with the plaintext to refer to!), yet was so simple that its mechanism was claimed to comprise only two rotating disks small enough to fit in a cigar box, and could be operated by a ten-year-old (admittedly a diligent, determined and well-practised one) to encipher and decipher texts.

Hence, the Chaocipher’s long-standing mystery revolved around three questions:

  1. Was the Chaocipher for real? (i.e. could something so simple really produce such tricksy ciphertext)?
  2. Was it more secure than, say, the Enigma machine?
  3. More to the point, is the Chaocipher actually an unbreakable cipher?

As of a few years ago, only three people knew the Chaocipher’s secrets – John Byrne Jr (the inventor’s son), and two Cryptologia editors (who saw it in 1990 but were sworn to silence). Yet as Chaucer noted, time and tide wait for no man (not even Cryptologia editors) – so there was a very real (and growing) possibility that the secrets of the Chaocipher might somehow get lost forever.

Hence last August, Moshe Rubin – who CM readers may well recall as the zesty Israeli software / crypto guy who not long before had set up the Chaocipher Clearing House website – decided to try to contact John Jr before it was too late, and so cold-called his way through the list of Byrnes living in Vermont. Before long, Moshe found himself in contact with Patricia Byrne (John Jr’s wife) from whom he discovered the sad news that her husband had passed away a year or two previously.

However, because Pat Byrne was already looking for a buyer for her husband’s cryptological material, Moshe put her in contact (via David Kahn) with David D’Auria, the chairman of the National Cryptological Museum’s Acquisitions Commitee. Somewhat surprisingly, after a couple of months Pat Byrne very generously decided to donate the whole set to the NCM, a terrific gesture which I (for one) highly appreciate (and I hope that you do too!)

And so it came to be that Moshe Rubin found himself allowed what he describes as “preview access to some of the material“.  Although he found that the precise setup John Byrne Sr had employed was not immediately obvious from the material to hand, Moshe burnt a load of midnight oil (is elbow grease more or less inflammable?) before finally managing to reconstruct the original algorithm in all its subtly obfuscatory glory.

Just as Byrne had described, his Chaocipher used two rotors (with the plaintext alphabet on the right rotor and the ciphertext alphabet on the left rotor) BUT with both alphabets altered slightly (let’s call this process ‘twizzling’, for want of a better word) after processing each letter. I’ve hacked together a 30-second Chaocipher animation on YouTube to try to demonstrate Byrne’s twizzlification…

Rather than go through the fine details here, I’m happy to refer you to Moshe’s detailed (and very readable) description of the process here: the only significant difference between my video and his text is that because the rotors mesh (and hence physically rotate in opposite directions to each other), the numbering sequence on each rotor is reversed relative to the other – i.e. even though #1 is at the top of each rotor, #2 and #3 proceed clockwise on the right (plaintext) rotor but anticlockwise on the left (ciphertext) rotor. Whereas in his text, both numbering systems run in parallel to each other (which might confuse you, it certainly confused me a little).

Of course, the obvious practical weakness of the Chaocipher is that any errors in enciphering, transmission, and deciphering get near-irreversibly propagated through the rest of the message: which probably makes the whole system too fragile to use in wartime, however cryptographically secure it may be (and, answering the second question above, I suspect that it may well prove to be more complex than Enigma, for it really is quite a fiendish system).

But is it (practically) unbreakable? Well, the obvious answer would be that if it has now been released into the wild, you’d have thought someone in a three-letter-agency (or GCHQ, naturally) would have worked out a clever way in. However, I’m not 100% sure that has happened yet… so, interesting times.

All credit to Moshe Rubin, then, for his persistence and hard work bringing this cipher mystery into the light: he has a Cryptologia paper coming up, and plenty more work to do over coming months (or years?) fleshing out the behind-the-scenes story from the stack of Byrne’s papers now in the NCM. It’s a fascinating slice of cipher history, and I wish him the very best of luck with the inevitable book and selling the movie rights! icon wink The Chaocipher revealed!

* * * * * * *

Update: I’ve added a follow-on Chaocipher post here, discussing the intriguing parallels between the Chaocipher and chaos theory…

40 Comments

  1. avatar David in Phoenix July 3, 2010 11:27 pm

    This says a lot about individual creativity. Even without a computer we can achieve greatness.

  2. avatar Lloyd Miller July 4, 2010 12:38 am

    The obvious question is : would a software implementation be of any interest. It would make the error propagation weakness mute.

  3. avatar nickpelling July 4, 2010 7:20 am

    Lloyd: Moshe Rubin includes a complete Perl implementation of Chaocipher as an appendix in his PDF, so feel free to play with that. ;-)

    http://www.nickpelling.com/

  4. avatar ewanm89 July 4, 2010 12:34 pm

    Okay, other that the security through obscurity and using only one 26^2 keys (hence if I know the rotors I can brute force it on my mobile phone in seconds, hint this is orders of magnitude less than enigma which basically did the same with three rotors and pasing through them twice).

    Oh, and the same methods for cracking enigma would work here to decrease the number of possible keys. The only weakness of enigma one couldn’t exploit is the fact enigma could never encode to the same letter (so if this happened one knew what they did was wrong). Any message longer than 26 characters and I can frequency analyse this same way one would Vigenère cipher.

    http://ewanm89.co.uk/

  5. avatar nickpelling July 4, 2010 1:00 pm

    ewanm89: because the state of both rotors gets twizzled after processing each character, the initial state of both rotors affects the output fairly rapidly, hence there are (26*2)! (i.e. 8.06581752 × 10^67) initial keys to wade through. That’s quite a big number, even with a fast mobile phone. =:-o

    http://www.nickpelling.com/

  6. avatar Jason Carver July 4, 2010 5:23 pm

    Pretty cool! I was curious about this, so I built a 50-line Python version to confirm my understanding. Feel free to check it out and use it at bitbucket (LGPL)

    It could use things like a configurable key and interactive enciphering command line, and I would be happy to pull such changes.

    http://slique.com

  7. avatar Thomas July 4, 2010 5:37 pm

    I could be wrong, but I think the number of possible keys is 26!^2, which is ~1.6×10^53. Oh, and since you could rotate the two alphabets in synchrony without affecting the process, it’s (26!^2)/26, which is ~6.3×10^51. Either way, I suspect that’s too big to practically brute force.

    I wouldn’t be surprised, though, if it is breakable, now that we know the mechanism. Don’t forget we had the mechanism for the Enigma (not to mention a compelling incentive to break it).

  8. avatar Anon July 4, 2010 9:26 pm

    “However, I’m not 100% sure that has happened yet… so, interesting times.”

    Dood, you’re 0% sure. Either someone knows it has been done (in which case they would be 100% sure), or they’re some poor tool sitting on the sidelines with a desperate hope that a ‘three-letter-agency’ will drop them a tidbit. And, you sir, are the latter!

  9. avatar nickpelling July 4, 2010 10:21 pm

    Anon: Perhaps I could have phrased this more clearly – while intelligence agencies have definitely evaluated the Chaocipher a number of times over the past 90 years, it is not clear why they rejected it.

    http://www.nickpelling.com/

  10. avatar Moshe Rubin July 5, 2010 8:25 am

    Anon: You should really give Nick the benefit of the doubt . Nick, and anyone who took the time to peruse the Chaocipher Clearing House web site, is aware of the fact that I and others had submitted FOIA (Freedom of Information Act) requests to NSA to declassify any Chaocipher-related material they might have. In addition, Mike Cowan (a Chaocipher researcher) made a similar request from GCHQ in the UK. You can see their responses on The Chaocipher Clearing House web site:

    http://www.mountainvistasoft.com/chaocipher/chaocipher-010.htm

    http://www.mountainvistasoft.com/chaocipher/chaocipher-014.htm

    One can speculate whether NSA and/or GCHQ had attacked and solved Chaocipher over the years and are not admitting it, or whether they never felt it worth their while (which, in hindsight, would have been an error on their part). Whatever your take is, Nick is well within bounds by saying he’s not 100% sure.

    Hmm, I wonder if an apology is in order … ;-)

    http://www.mountainvistasoft.com/chaocipher

  11. avatar nickpelling July 5, 2010 8:41 am

    Moshe: I’m not expecting an apology from anyone from Chatswood, NSW, Australia who gives their email address as “anon@anon.com”. Life is too short to undo trolls’ tort! :-)

    http://www.nickpelling.com/

  12. avatar m_braedley July 5, 2010 3:00 pm

    There are (n-1)! permutations for each wheel. For n=26, that’s ~15.5e24 for a single wheel. For 2 linked wheels, it’s ((n-1)!)^2 plus a multiplication of n for the rotational combinations of the second wheel n((n-1)!)^2. This simplifies down to ((n!)^2)/n, and for n=26, is ~6.26e51. As a comparison, 2^256=1.16e77, although the usable AES key set is smaller.

  13. avatar nickpelling July 5, 2010 3:14 pm

    Mike: Yes, sorry – you’re right and so was Thomas – apologies! There are indeed only 26 choices per rotor, so the number of possible system states (less the 26 rotational duplicates) is (26! x 25!) = 6.25553856 × 10^51.

    http://www.nickpelling.com/

  14. avatar Gadget July 6, 2010 3:45 am

    Of course there’s two good reasons for intelligence agencies to reject such these. And several other not so good reasons. The first is that it’s not strong enough…

  15. avatar Gadget July 6, 2010 4:16 am

    And a possible reason two… While mechanically very challenging, digitally it can be quickly adapted to any arbitrary ‘rotor’ size, say byte or word lengths (256 / 65536) – with care taken in choosing the correct distance between zenith and nadir. (65536! x 65535!) = big-number. Off the top of my head this is in the neighborhood of 500K decimal digits.

  16. avatar Mark VandeWettering July 6, 2010 4:42 pm

    Even a cursory glance at the cipher reveals that it is certainly much weaker than the enigma, despite its much larger potential keyspace. I coded up a quick python implementation, and passed a file consisting of 2000 As to it. It immediately settles into a cycle whose period is only 13. The diffusion portions of the algorithm simply don’t work fast enough to avoid these short cycles. Having any sort of crib is probably sufficient to recover large portions of the cipher wheels. Note: I am just an amateur at this stuff, I suspect a real professional would just laugh at this cipher, despite its colorful history.

  17. avatar nickpelling July 6, 2010 5:01 pm

    Mark: I suspect that you’re over-extrapolating from a degenerate case that would simply not happen in practice, and drawing conclusions that are a touch unfair to Byrne’s elegant device. All the same, I’m eagerly awaiting a professional cryptological analysis of the Chaocipher…

    http://www.nickpelling.com/

  18. avatar Mark VandeWettering July 6, 2010 7:18 pm

    It’s a degenerate case to be sure, but one that is indicative of an underlying problem. Still, there are definitely features of interest inside the cipher: a few more tests that I’ve run indicate that when supplied with some trial text (I used my trusty copies of Dracula and the Adventures of Sherlock Holmes as fodder) the resulting histograms of single letter and digraph frequencies appear remarkably flat, and according to the simple tests which I coded up years ago, virtually indistinguishable from the same statistics applied to random text. The Chaocipher certainly seems pretty good at “whitening” English tex

    That being said, there are some other features that seem to be bad in the cipher, at least with respect to a chosen ciphertext attack. Let’s pretend we have a plain/cipher text pair of considerable length. Initially, we don’t know anything about the content of the cipher and plaintext wheels. But after we see the first character, we know one thing: that the corresponding letters were in matching places on the cipher wheel. We do not know what the position of these letters are. So, let’s generate 26 hypotheses about where they could be, and simulate them all in parallel. This moves our “knowns” around in one of 26 possible ways. Now, we get the next character. For each of the hypotheses, we can now generate an additional (at most) 26 hypotheses, because we have a new cipher/plaintext pair. Some hypotheses are not plausible however, because they require that we place a different letter in a position which is already filled. So, we reject that hypothesis. We don’t need to do this in parallel, we could instead do a traditional depth first search. The question is: how quickly can we converge to the right key? My guess is “pretty quickly”. Let’s say that at some point we have a pair of cipher/plaintext letters. If we’ve already placed that letter in one of the wheels, we only have ONE possible position to check. As we fill in bits of the cipher wheel, the resulting output becomes very quickly constrained.

    I’ll be pondering this as I walk around Disney World today (gadzooks, I need to get a life) and try to hack together an implementation soon.

    http://brainwagon.org

  19. avatar Carl Scheffler July 7, 2010 11:09 am

    I just managed to crack Exhibit 1! Will write up the solution and post a link to it soon.

  20. avatar anonist July 11, 2010 8:50 pm

    This really shouldn’t be hard to break.

    A single known plaintext/known ciphertext pair is all that is neccessary to instantly extract the key (the input output relationships reveal the exact positions of the alphabets on the circles). This is likely why they didn’t use this in the military.

    Thus, if you correctly guess a nice long word or phrase in one of the messages, you break the entire message (key). Something like aircraftcarrier or theunitiedstatesofamerica and the key is almost completely yours.

    The cipher has all sorts of bad quirks, like doubled letters never result in doubled letters in the output. Any doubled letters in the output are the result of a fixed distance, d, of the letters on the input.

    Guess that somewhere in the ciphertext two ‘the’s are n letters apart, where n < 30. If you ever guess correctly you should start to decode letters with the correct frequency. Rise/repeat.

    @ewanm89 I think we mean the key is (26! * 26!), which is according to google is 2^176, a very very big key size.
    http://www.google.com/search?hl=en&q=lg(26!+*26!)

  21. avatar Moshe Rubin July 14, 2010 9:23 am

    @anonist: (It’s hard to respond to an anonymous entity, but here goes …)

    I found your posting amusing. Byrne provided 13,615 matching pt/ct pairs in Exhibit 1 of “Silent Years”. Even with all this material it was a major challenge for talented amateur cryptanalysts today (e.g., C. Scheffler above), even knowing the algorithm, to discover the starting alphabets.

    I can’t let your hasty posting pass without a comment, so I have to take you up on your challenge. Here it is:

    (*) Focus on Exhibit 1
    (*) You are given that the first pt/ct pair is A/C.
    (*) Do not make use of any other plaintext.
    (*) Solve for the starting alphabets.

    To make life easier I tell you that the first 55 plaintext letters are “ALLGOODQQUICKBROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYW” (this beats two “the”‘s < 30). Now find the starting alphabets.

    The ultimate hint I'll give you is that the first 100 groups of 55 letters have the same plaintext of "ALLGOODQQUICKBROWNFOXESJUMPOVERLAZYDOGTOSAVETHEIRPARTYW". If you can't find the starting alphabets now …

    The quirk of doubled pt letters never resulting in doubled ct letters doesn't seem quite as bad as Enigma's pt letter != ct letter.

    The proof of the pudding is when armchair cryptanalysis meets the cryptogram.

    http://www.mountainvistasoft.com/chaocipher

  22. avatar Carl Scheffler July 15, 2010 9:15 am

    So it turns out that the trick does lie in the pt pairs. It is true that a pt pair never enciphers to a ct pair. What is more useful is that a pt pair enciphers to neighbouring ct characters. So when we see, for example, that “LL” enciphers to “GT” we know that G and T are adjacent in the cipher alphabet. The problem is that this is true only for a short while. After permuting the alphabets a couple more times the adjacency disappears again.

    So the *real* trick is to find sequences of characters that can be used to extended what we know about the alphabets, using characters that we have already observed. An example again: In the sequence (pt: “AABA”; ct: “EFEG”), the first two characters of the plaintext tell us that E and F are adjacent in the cipher alphabet. The third character in the ciphertext is E, which we have seen before and it tells us where B is in relation to A in the plain alphabet. Etc.

    This is how I cracked Exhibits 1 and 4. As mentioned in an earlier comment, I’ll post a complete solution online. Unfortunately work has been getting in the way.

    “I love deadlines. I like the whooshing sound they make as they fly by.” –Douglas Adams.

  23. avatar Moshe Rubin July 15, 2010 1:36 pm

    Hi Carl,

    Early on, Chaocipher researchers noticed that, at least in Exhibit 1, a (pt,ct) pair never occurs less than nine (9) positions later. This was a major factor in trying to (incorrectly) deduce the underlying system. This phenomenon, however, is not true for Exhibit 2 and 3, leading one to believe that they use a variant of the vanilla Chaocipher algorithm.

    If I understand your description of finding sequences of characters that can be used, you may be interested in a related posting of mine to the Crypto Forum. Is this the idea you had in mind? I used it for reducing the computer time needed to read Exhibits 1 and 4.

    I look forward to reading about your personal method for reading the exhibits. Once that’s out of the way, the challenge will be solving a message given only the ciphertext, or a small crib. That will enable us to truly evaluate the cipher system.

    http://www.mountainvistasoft.com/chaocipher

  24. avatar Carl Scheffler July 15, 2010 10:29 pm

    Hi Moshe,

    Indeed, I followed very nearly exactly the same method as in your post: searching for the sequence of characters with the fewest holes and searching through the possible ways of plugging the holes. I see in your post that you found the smallest number of holes to be 5. With a very slight modification in the method, it is possible to do it with only 4 holes. This is not so important for Exhibit 1 (a very long text) but could be important for shorter texts where we have a smaller chance of finding good sequences of characters.

    In my version, I search not only forward in the text for characters that have already been seen, but also backwards. This provides a better chance of filling in more of what we already know about the alphabet. To search backwards in the text it is necessary to unpermute the alphabets, i.e. make the opposite permutation of what is required for searching forwards in the text. Other than that the search remains the same.

    My only remaining secret is how to find the keyword to the cipher from the initial alphabets. I’ll describe that soon when I write up the rest of the method–hopefully between tomorrow and the weekend.

    Then there is also the problem with Exhibits 2 and 3. As you wrote, they do not follow the same statistics as standard Chaocipher texts. I have tried out some pet theories on how they might have been modified, but no luck so far.

  25. avatar Moshe Rubin July 17, 2010 7:37 pm

    Hi Carl,

    My algorithm iterates through the pt/ct, starting with offset 0, and computes the # pairs needed from that point forward. It keeps track of the best match, i.e., the one requiring the fewest holes. Could you please tell me what your pt/ct was that gave only four holes? I’d like to see why my program (a relatively simple one) didn’t catch it.

    Thanks,

    Moshe

    http://www.mountainvistasoft.com/chaocipher

  26. avatar Moshe Rubin July 17, 2010 7:41 pm

    Hi Nick,

    Can you already place this and the “Chaocipher got Slashdotted” entries under the “Chaocipher” reference page? They’re about to scroll off the main page … !

    http://www.mountainvistasoft.com/chaocipher

  27. avatar NyteOwl July 18, 2010 8:08 pm

    Fascinating! Regarding FOI requests, I wonder if anyone has tried the Canadian Communications Security Establishment (NSA/GCHQ equivalent) to see what they might have on file.

  28. avatar Moshe Rubin July 19, 2010 6:46 am

    Hi NyteOwl,

    Many thanks for that original direction! Although Byrne probably did not contact CSEC directly, they may have tried to solve Chapter 21 on their own. Certainly worth a try!

    I was not able to find any reference to “freedom of information” etc. regarding CSEC. Nonetheless, I will schedule this as a task for the near future.

    Thanks!

    Moshe

    http://www.mountainvistasoft.com/chaocipher

  29. avatar Carl Scheffler July 21, 2010 8:16 pm

    Hi all,

    My write-up of breaking Exhibit 1 is finally done (with lots of pretty pictures!). The next installment will be on breaking Exhibit 4.

    http://www.inference.phy.cam.ac.uk/cs482/projects/chaocipher/

    Any comments or corrections welcome.

    Carl

  30. avatar Moshe Rubin July 22, 2010 5:59 am

    Carl,

    You’ve done a fantastic job, complete with clear diagrams and description — true “eye and mind candy”! I am planning on posting links to it from The Chaocipher Clearing House.

    Some minor comments:

    (*) For the record, Byrne’s original mechanical model used two non-concentric disks, with the disks being engaged along the edges, and turning one turns the other (when engaged) as seen on the National Cryptologic Museum web site. Nonetheless I, like you, found it easier to describe a slightly different but equivalent model.

    (*) You warn would-be solvers of transcription errors in Exhibit 1’s ciphertext text file on TCCH. Just the other day I realized that, although I posted a list of the nine typos originally found in the TCCH ciphertext text file back in June 2009 (due to OCR errors), I had forgotten to upload the corrected file to the site (mea culpa!). As of yesterday, the plaintext and ciphertext files of all the exhibits are cleaned and authorized for use. You may want to remove the warning section, as it is no longer relevant (please double-check me!).

    (*) In your pages you bold numerous phrases. The letter coloring (a light blue) is identical to that of links (without the dotted underline denoting a link). I found that a bit confusing. Would you be able to bold in black to avoid the visual confusion?

    (*) If I understood correctly, you find the second candidate keyphrase (“THIKKTBNDB”) by running your keyphrase enumerating program again, this time using the left-hand alphabet for locating the ‘plaintext’ letter.

    An alternative way of arriving at the second candidate keyphrase “THIKKTBDNB” is by just running the first one, “TILNOYHIVK” through Chaocipher in the standard way (i.e., using the right-hand alphabet to locate the ‘plaintext’ letter), with the final ‘ciphertext’ being “THIKKTBDNB”.

    (*) In the “Results” section you refer to the plaintext and ciphertext disks (“It is also possible that the key was not meant to be applied to the plain-text ring (i.e. enciphered), but to the cipher-text ring (deciphered)”).

    A reader may be confused when directed to find a plaintext letter on a ciphertext disk. In my original paper I refer to a “right-hand alphabet” and “left-hand alphabet”, thereby decoupling the disk from being pt or ct. You might want to consider referring to the disks as “inner” or “outer” or some other pt/ct-agnostic description.

    All-in-all, you’ve done a most excellent job. I look forward with great interest to your upcoming articles!

    Best regards and kudos,

    Moshe

    P.S. I have a paper (to be uploaded any day now) describing my deciphering of Exhibit 1, which also includes some historical background, etc. But your article is true “eye-candy” — wonderful to read !

    http://www.mountainvistasoft.com/chaocipher

  31. avatar Moshe Rubin July 22, 2010 6:08 am

    Carl,

    FYI I added a link to your article on Wikipedia’s Chaocipher page. Feel free to edit it in any way you see fit.

    Moshe

    http://www.mountainvistasoft.com/chaocipher

  32. avatar Carl Scheffler July 22, 2010 10:32 am

    Hi Moshe,

    Thanks for detailed and useful comments!

    On comment 2: I ran your updated ASCII files of Exhibits 1 and 4 through my code and they both decipher without any transcription errors now. We can only hope that the typesetters did not make too many errors in Exhibits 2 and 3.

    On comment 3: My explanation of how to get the so-called ‘cipher-text’ key was incorrect. I had originally done as you suggested and simply derived the cipher-text key from the plain-text key. I updated the article to make this clear.

    On comment 4: Agreed, the use of agnostic terms to refer to disks/rings are better: article updated.

    Thanks again,

    Carl

Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>