In a previous post, I suggested that the Scorpion S5 cipher’s numerous shape families might offer a backdoor into its cipher system, if they just happened to be elegantly arranged on downward diagonals. I pointed out that if this were correct, the “dice” shape family that appears in columns 1, 3, 4, 8 (twice), 9, 12, 14 and 15 would be most likely to have been arranged such that A was 1, C was 3, D was 4 (and so forth).

However, I didn’t actually get as far as calculating the precise probabilities in that post. Now I have (I think).

In my Scorpion spreadsheet, the total probability that a specific family was enciphered as a specific sequential set of letters is calculated as the product of each individual letter’s likelihood. By ‘likelihood’ here, I mean not the probability of that letter occurring randomly (i.e. P, its raw instance probability), but the chances of that letter occurring exactly N times within a column of letters of height H. In Excel, you calculate this using the built-in function ‘BINOMDIST(N, H, P, false)’. (Note that using ‘BINOMDIST(N, H, P, true)’ instead would calculate the cumulative likelihood, i.e. the chances of that probability-P event happening anywhere from 0 to N times out of a maximum of H times.)
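For anyone who would rather check this outside Excel, here is a minimal Python sketch of the same two calculations. The function names `binom_pmf` and `binom_cdf` are my own labels for illustration and are not part of the spreadsheet.

```python
from math import comb

def binom_pmf(n, h, p):
    """Chance of exactly n hits in h independent trials,
    i.e. the same quantity as BINOMDIST(n, h, p, FALSE)."""
    return comb(h, n) * p**n * (1 - p)**(h - n)

def binom_cdf(n, h, p):
    """Chance of anywhere from 0 to n hits in h trials,
    i.e. the same quantity as BINOMDIST(n, h, p, TRUE)."""
    return sum(binom_pmf(k, h, p) for k in range(n + 1))

# Example: a letter with raw probability 11.35% in an 11-high column.
print(binom_pmf(1, 11, 0.1135))   # exactly one occurrence
print(binom_cdf(1, 11, 0.1135))   # zero or one occurrence
```

The cumulative version is simply the sum of the exact-count values, which is why the two Excel modes differ only in their final argument.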

For the raw instance probability values, I used the Scorpion encipherer’s plaintext as a reasonable approximation of the text we are likely to find encrypted inside the S5 cipher. I think there’s a pretty good chance that it will be good enough.

As for the height H: once you have rearranged the message according to the 16 apparent columns of the ciphertext, columns 1 to 4 contain 12 instances each, while columns 5 to 16 contain 11 instances each. All of which means that the binomial probability table for N out of 11 looks like this:

[Image: table of binomial probabilities for N out of 11]

For example, even though the raw instance probability for ‘E’ is 11.35%, the chance that a given 11-high column of letters will contain exactly one ‘E’ is 37.4271% (or so my spreadsheet says, anyway).
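As a rough cross-check, one row of that table can be regenerated in a few lines of Python. This is only a sketch: the helper name `binomial_row` is made up for illustration, and it uses the rounded 11.35% figure quoted above, so the last decimal places may differ slightly from the spreadsheet's.

```python
from math import comb

def binomial_row(p, height=11):
    """Probability of exactly N occurrences, for N = 0..height,
    of a letter with raw instance probability p."""
    return [comb(height, n) * p**n * (1 - p)**(height - n)
            for n in range(height + 1)]

# 'E' with the rounded raw instance probability of 11.35%
row_e = binomial_row(0.1135)
for n, prob in enumerate(row_e):
    print(f"N={n:2d}  {prob:8.4%}")
```

The N=1 entry comes out at about 37.4%, in line with the spreadsheet figure, and the twelve entries sum to 1 as they should.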

But rather than limit the calculation only to length-16 families, I added a trick whereby shorter families can be checked against other diagonals in the cipher table. If you use the number 99 as the count for an individual family’s column, the spreadsheet works around it in the calculation, by allowing the shifted alphabet to start not at ‘A’ but at ‘z’ (i.e. ‘A – 1’).

I’ve included 11 shape families from the S5 cipher: if you copy a row from any one of these across to row #33, the spreadsheet will calculate a composite ranking value for each of 28 different offsets in column U (the ‘Result’ column). This is equal to the final probability times a million (or else the numbers would be too small to be practical).
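The ranking step can be sketched in Python along the following lines. Several things here are assumptions rather than what the spreadsheet actually does: the letter frequencies are commonly quoted approximate English values standing in for the Scorpion plaintext frequencies, the alphabet wraps around modulo 26 (giving 26 offsets rather than 28), columns with a count of zero are included in the product, and 99 is treated simply as "leave this column out". The names `rank_offsets`, `FREQ`, `HEIGHTS`, `DICE` and `SKIP` are all invented for this sketch, so its numbers should not be expected to match the spreadsheet's.

```python
from math import comb, prod
from string import ascii_uppercase

# ASSUMPTION: approximate English letter frequencies (percent, A..Z),
# not the Scorpion plaintext frequencies used in the spreadsheet.
ENGLISH_PCT = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77,
               4.0, 2.4, 6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98,
               2.4, 0.15, 2.0, 0.074]
FREQ = {c: pct / 100 for c, pct in zip(ascii_uppercase, ENGLISH_PCT)}

# Column heights from the post: columns 1-4 hold 12, columns 5-16 hold 11.
HEIGHTS = [12] * 4 + [11] * 12

# Dice family counts per column: columns 1, 3, 4, 8 (twice), 9, 12, 14, 15.
DICE = [1, 0, 1, 1, 0, 0, 0, 2, 1, 0, 0, 1, 0, 1, 1, 0]

SKIP = 99  # ASSUMPTION: a count of 99 means "ignore this column"

def binom_pmf(n, h, p):
    """Chance of exactly n hits in h trials (BINOMDIST(n, h, p, FALSE))."""
    return comb(h, n) * p**n * (1 - p)**(h - n)

def rank_offsets(counts, heights=HEIGHTS, freq=FREQ):
    """Score each starting letter: the product of per-column binomial
    likelihoods, times a million as in the 'Result' column."""
    results = {}
    for offset, start in enumerate(ascii_uppercase):
        factors = []
        for col, (n, h) in enumerate(zip(counts, heights)):
            if n == SKIP:
                continue  # column left out of the product
            # ASSUMPTION: the shifted alphabet wraps around after 'Z'
            letter = ascii_uppercase[(offset + col) % 26]
            factors.append(binom_pmf(n, h, freq[letter]))
        results[start] = prod(factors) * 1_000_000
    return results

scores = rank_offsets(DICE)
for start, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{score:.6f} {start}")
```

Swapping in the real plaintext frequencies for `FREQ` would be the obvious first step towards reproducing the spreadsheet's own rankings.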

For example, the relative rankings for the dice family are:-

2.737265 A
0.013655 B
0.000000 C
0.000046 D
0.293415 E
0.018483 F
0.093272 G
0.000451 H
0.000078 I
0.000000 J
0.009360 K
0.074230 L

Here, the ranking for ‘A’ (2.737265) is nearly 10x the ranking for second-placed ‘E’ (0.293415), which is essentially what my initial imprecise guess was (thank goodness). 🙂

It’ll take a while to figure out what this all means, but I thought I’d post the basic spreadsheet sooner rather than later. 🙂
