Elias Schwerdtfeger’s “biological paradox”…

Posted by nickpelling on Apr 1st, 2009

For a couple of weeks, I’ve been meaning to post about German Voynich blogger Elias Schwerdtfeger and what he calls the VMs’ “biological paradox”. His question is simple: why is it that the Voynich’s “biological” Quire 13 has both (a) complicated pictures of nymphs, tubes and baths, and (b) longwinded, redundant text? Surely, he asks, isn’t this combination somewhat paradoxical?

(To be honest, Elias’ post then goes off on a bit of a wild tangent: but given that it’s a good starting point and the whole issue of Q13 is a favourite of mine, I thought I’d step up to the line.)

Page f78r (one of the few that Leonell Strong was able to examine) has a number of good examples of this redundancy, in particular para 1 line 5’s “qokedy qokedy dal qokedy qokedy”, for which Strong’s 1945 worksheet #2 suggests the decryption “DUCTLE ROULLS THE GRAOTH COEMLI”.

This is the same piece of ciphertext about which Gordon Rugg asserted “This degree of repetition is not found in any known language” (Sci Am, 2004). Of course, linguist Jacques Guy ferociously responded to this Ruggish in sci.lang, firing off real-life counter-examples such as “di mana-mana ada barang-barang. Barang-barang itu…” As always, there’s a fair degree of truth in what both are saying: but the fact (as Elias points out) that only some parts of the Voynichese corpus read like “qokedy qokedy” is a pretty good indication that we can’t reduce this debate to an either-or between these two opposing poles. Essentially, it can’t be just a simple repetitive language if it’s not consistent throughout (and it isn’t): and beneath all the cryptographic window-dressing, there probably is some kind of meaningful language thing going on.
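To put a rough number on “this degree of repetition”, here is a minimal Python sketch that measures what fraction of adjacent word pairs in a line are exact repeats. The function name and the decision to split on hyphens as well as spaces (so that “barang-barang” counts as two words) are my own assumptions for illustration, not anything Rugg or Guy actually used.

```python
import re

def adjacent_repeat_ratio(line: str) -> float:
    """Fraction of adjacent word pairs that are exact repeats."""
    # Split on whitespace and hyphens so "barang-barang" counts as two words.
    words = [w for w in re.split(r"[\s\-]+", line.lower()) if w]
    if len(words) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return repeats / (len(words) - 1)

voynich = "qokedy qokedy dal qokedy qokedy"
indonesian = "di mana-mana ada barang-barang"

print(adjacent_repeat_ratio(voynich))     # 0.5
print(adjacent_repeat_ratio(indonesian))  # 0.4
```

On these two short snippets the scores are in the same ballpark, which is Jacques’ point. What a single-line score cannot show is Elias’ point: how much the ratio changes from one section of the manuscript to another.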

I’d say that Mark Perakh’s (1999) tentative conclusion on the language differences probably yields the most useful key to Elias’ paradoxical door. Mark wondered about the internal structural differences (i.e. within words) between Voynich Manuscript A and B language pages (and all the text that shades between A & B) and so carried out some tests: ultimately, his favoured explanation is that the A language is a more abbreviated & contracted version of the B language, but that beneath it all, they are still both expressions of the same thing. (Though Mark points to contraction probably being the main mechanism used).

So, the text in Q13 – as a B language object – therefore exhibits redundancy probably because it is more verbose. This suggests that we should be looking to decipher the B text, simply because we stand less chance of being distracted by the A text’s arbitrary contractions.

My own take is a little more nuanced (though still hypothetical, lest I raise the hackles of the hypothesis police once more). Firstly, I suspect that the A pages were written first, and that these were trying to duplicate an existing document using a verbose cipher – meaning that a ciphertext line wouldn’t map to the same physical space as a plaintext line. The only way to fit it in was to aggressively abbreviate & contract… but this helped make the ciphertext more opaque.
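To illustrate why a verbose cipher would cause the line-length problem, here is a toy Python sketch. The substitution table is entirely made up for illustration (it is emphatically not a proposed Voynich key); the only point is that when most plaintext letters turn into two glyphs, a ciphertext line no longer fits in the space of its plaintext line.

```python
# Hypothetical verbose substitution table: each plaintext letter maps to
# one or two glyphs. These pairings are invented purely for illustration.
TOY_TABLE = {
    "a": "ol", "e": "qo", "i": "ai", "o": "or", "u": "dy",
    "r": "ar", "s": "al", "t": "k", "n": "d", "l": "e",
}

def verbose_encipher(plaintext: str) -> str:
    """Encipher letters found in TOY_TABLE; pass everything else through."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in plaintext.lower())

def expansion_factor(plaintext: str) -> float:
    """Ciphertext length divided by plaintext length (1.0 for empty input)."""
    if not plaintext:
        return 1.0
    return len(verbose_encipher(plaintext)) / len(plaintext)

line = "rosa est"
print(verbose_encipher(line))   # aroralol qoalk
print(expansion_factor(line))   # 1.75
```

With this toy table an 8-character line becomes 14 characters, so a scribe copying line-for-line would have to claw that space back somehow, for instance by abbreviating and contracting.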

Then, I suspect that the B pages were added, using smaller quills (say, eagle’s feather?) – because the smaller letter sizes took the pressure off the overall line lengths, the need for contraction and abbreviation was reduced. However, I think some aspects of the coding system changed (specifically the steganographic numbering scheme, but that’s another story!), making the B pages harder to break in a different way.

That is, I suspect that we have two types of ciphertext present in the VMs: a simpler cipher system A (but with a significant amount of contraction and abbreviation) and a more complex cipher system B (but with less contraction and abbreviation to distract us). And just to make things really difficult, there are probably system B pages that are also heavily contracted (i.e. the worst of both worlds).

And some people still wonder why computers can’t break the VMs! *sigh*

16 Responses

  1. Vytautas Says:

    Hi, Nick
    what about RNA and DNA systems (as per Steve Ekwall: “Plants are RNA, humans are DNA…” :) Something similar…

    Vytautas

  2. nickpelling Says:

    Could be true… I think the last word has yet to be said on Steve Ekwall… :-)

  3. Vytautas Says:

    And two more thoughts, by the way:
    1) The Voynich manuscript may be a book about cryptography, written enciphered, with all the illustrations concerning the enciphering methods used by the author or by his coevals… Yet another hypothesis.
    2) The next thought is a bit improper, but… There is a possibility that Steve Ekwall is not a real person… i.e. a pseudonym of one of the list members who wanted to voice a different opinion and test it :)

  4. nickpelling Says:

    1) This has been suggested a number of times, though it doesn’t really seem to cover why there are both “herbal” pages (without much text) and “recipe” pages (full of text) – unless all those drawings actually mean something (in some way), it’s a fairly dramatic waste of space. :-)
    2) Having spoken with Steve Ekwall, I can assure you that he’s a real live person (and very friendly, too), rather than some invention of a list troll. The more you know about Steve, the harder it is to explain what he thinks happened to him. :-o

  5. Emily Says:

    By the way, which language is the one Jacques Guy cites as full of repetition? (Are the reduplicated words plurals?)

    I think that if you designed a computer that could decipher the Voynich– using both cryptanalysis and historical/cultural/scientific knowledge– that computer would have to be of near-human intelligence and versatility (a universal Turing machine, if I have my jargon right).

  6. nickpelling Says:

    You’d have to ask Jacques…

    And you’re right: it’s a human problem, not a computer problem. For the moment! :-)

  7. infinitii Says:

    That example from above is Indonesian, but I think Jacques also knows that Chinese is full of repetition as well — as for what those reduplicated words mean, no idea.

  8. Glen Claston Says:

    [Note: GC submitted this comment via email, please excuse me if it ends up with the wrong avatar!]

    No problem here, except that the actual quote is “sutli ductle roulls the graoth coemli”. As odd as that may seem, it is derived mathematically, not through any linguistic means, and has a linguistic basis. A mathematical approach to the text is the only appropriate means of attack.

    It’s easy to speak of contracted or abbreviated text, but that doesn’t match up with other observations. Word length statistics in Voynich-101 are in line with a language that falls between Latin and English, so abbreviation of say, one glyph per word would expand these statistics beyond known European languages. The EVA alphabet already does this expansion, depending on how carefully you choose to view the related glyphs, and EVA transcriptions have no match to any language, surviving or otherwise. The herbal-A texts have a well-established base of 17 primary glyphs, this is slightly expanded in the B texts. The additional part of what we would consider a correct linguistic distribution is absorbed by the overuse of glyphs such as “o” in both the A and B texts. Linguistically speaking, in the A texts, three specific floating glyphs take up the missing portion of what would otherwise be considered a normal alphabet distribution. In the B texts this is expanded to include the “c”, which can be considered to be floating “vowels”, their definition most usually falling within the vowel range. Consider the most common “word” in both cases, the word “8am”, how can this be verbose? How can the multiple instances of “8”, “9”, and other glyphs as stand-alones be considered verbose? There exists a very large argument against this type of thinking.

    Yes, I could pick your whole argument apart, but that wouldn’t do any good because you reach conclusions that have no basis in fact. The bottom line here is that you know nothing about the cipher construction, and because you don’t, you are willing to hypothesize based only on the erroneous hypotheses of others. Yes, you’ve once again been busted by the hypothesis police.

    Now let’s look at reality. The differences in script translate to a line length that differs between 37 glyphs to 50 glyphs, it’s actually an average difference of 9 glyphs between the most compact and the most verbose of scripts. The longer script examples make use of narrower margins, and the average height and width of the individual glyph varies only slightly, so the perception that some pages are more compact than others is partially illusion. It could be something as simple as the author only having so much to say in a space, so he fills that space a bit larger than the other, extending the margins, nothing more – just an hypothesis.

    The “eagle feather” gives me a laugh. The travelling Jesuits were known for producing and selling small pocket bibles made of fake uterine vellum, and these were written using stiles, not quills. Such small print using a split stile, print much smaller than that used in the VMS. There are examples of these Jesuit bibles out there in the ether, very good examples actually. The question the “eagle feather” assertion brings up is one of available technology, and the Jesuit bible example demonstrates that the technology to write minutely was available though not used in the VMS, since the average height of VMS text approximates our modern 12 point type. Do you see why the “Hypothesis Police” exist? I haven’t even addressed your cipher hypothesis presented here, I’ve spent all my time addressing the underlying or “hidden” hypotheses that apparently direct you toward the fallacious conclusions you’ve reached. My usual take would be to ignore this post as that of an uninformed crackpot that doesn’t know his ass from a hole in the ground. To me an hypothesis must have some basis in supportable fact, not just hearsay from some other researcher who may be misinformed.

    So this pisses you off? Well that’s a good start. Tell me what makes you angry, and I’ll tell you why I think that way, in living color if need be. Let’s throw all that other shit out the window and get to the facts, that’s where it’s all going to happen anyway. That post you just made could have included a good deal of usable information that said something others could use as guidance. At one point or another we both need to be working off the same fact sheet, and we’re clearly not yet there.

  9. nickpelling Says:

    In Strong’s worksheet #2, “qokedy qokedy dal qokedy qokedy” is transcribed as “qotedy qokedy dal qotedy qokedy” (which is incorrect). And even in these five words, there are cipher inconsistencies:-
    DUCTLE ROULLS THE GRAOTH COEMLI
    qotedy qokedy dal qotedy qokedy
    797531 474135 797 531474 135797
    In alphabet #3, letter ‘o’ is deciphered as both ‘R’ and ‘O’: in alphabet #7, both ‘o’ and ‘d’ decipher to ‘O’. Please don’t read this as saying you’re right or you’re wrong about backing Strong’s cryptographic horse – as Jim Gillogly pointed out to Strong himself 30 years ago, it is hard to reconcile the kind of frequency-flattening cryptography Strong wrote about with strongly-structured sections (of which “qokedy qokedy dal qokedy qokedy” is perhaps one of the most strongly structured).
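    For anyone who wants to repeat this check, here is a minimal Python sketch. It assumes that the three rows above line up letter for letter (i.e. that each digit is the alphabet number used for the cipher letter directly above it); the function name is mine. It collects, for each (alphabet, cipher letter) pair, the set of plaintext letters it is claimed to produce, and reports any pair that produces more than one.

```python
from collections import defaultdict

# The three aligned rows from Strong's worksheet #2, as quoted above.
plain = "DUCTLE ROULLS THE GRAOTH COEMLI"
cipher = "qotedy qokedy dal qotedy qokedy"
alphabets = "797531 474135 797 531474 135797"

def decipherment_conflicts(plain, cipher, alphabets):
    """Return {(alphabet, cipher letter): plaintext letters} for every pair
    that deciphers to more than one plaintext letter."""
    seen = defaultdict(set)
    for p, c, a in zip(plain, cipher, alphabets):
        if c == " ":
            continue  # word separators carry no cipher information
        seen[(a, c)].add(p)
    return {key: letters for key, letters in seen.items() if len(letters) > 1}

conflicts = decipherment_conflicts(plain, cipher, alphabets)
for (alphabet, letter), outputs in sorted(conflicts.items()):
    print(f"alphabet #{alphabet}: '{letter}' -> {sorted(outputs)}")
```

    If the letter-for-letter alignment assumption is wrong, the reported conflicts mean nothing, so treat the output as a prompt for checking the worksheet rather than as a verdict.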

    As far as contraction & abbreviation goes: I suspect that people who don’t subscribe to Strong’s decryption would do well to read Mark Perakh’s paper – whether words are longer or shorter than English or Latin all comes down to what the correct level of transcription / tokenization should be, and we have substantially different views on that… but that’s OK. As for “dam”, my guess is that it is formed from two tokens (d + am) and that it codes for “&x”, i.e. ‘etc’. As for ‘8’ [d] and ‘9’ [y], they are probably not verbose: but I never said every letter had to be, did I? :-) And the Society of Jesus formed in 1534, so your travelling Jesuits and their split stiles might well need to have been time-travelling Jesuits too, depending on when the VMs was written. :-)

    However, trading monkey faeces is all a bit pointless: unless you have swallowed a month’s worth of be-really-dogmatic pills in one hit, it ought to be pretty clear that there exist very large arguments against all types of thinking, VMs-wise – we’re none of us immune to criticism. The question is how to move forward – what that “fact sheet” would look like.

  10. Marke Fincher Says:

    Hi Nick,
    I’m fairly surprised that you posted GCs email on your blog given how rude parts of it are. I fail to see really what there is ever to gain by being abusive and insulting and I wonder if people resort to such behaviour when they cannot really rely on the strengths of their arguments alone.

    It’s the sort of “uncivilized” (and pointless) behaviour that unchecked hastened the demise of the Voynich Mailing List and I hope you can protect your blog from similar pollution.

    But leaving that aside; my take on this is:

    Several results of my analysis into the distribution and density of variation lead me to believe that the degree of contraction and abbreviation present in the underlying plaintext of VMS-A and B is not significantly different.

    If a much higher degree of contraction and abbreviation were used in B versus A then you would expect to see less redundancy and a denser, more compact distribution of variation.

    I do agree with you that the ‘A’ pages came first BTW.

    Marke

  11. nickpelling Says:

    Hi Marke,

    Glen is particularly passionate in his defence of Leonell Strong’s claimed decryption against any perceived criticism of it – and if sometimes he goes over the line, I don’t mind too much as long as he backs up his arguments (something notably absent from the mailing list).

    Contrary to what a statistics professor might tell you, entropy, redundancy, and structure are all extraordinarily subjective terms because they rely on your having got the transcription and tokenization right in the first place. What is perhaps lacking most in the “language” debate is someone to step forward and map out precisely how B text differs from A text: this is because things like the presence of free-standing ‘l’ (a strong B feature, I’m pretty sure) alter all the stats – but is that ‘l’ really the same letter as the one in ‘al’ and ‘ol’?

    The model underlying your stats probably presumes that they are all the same letter, but I’m sure you can see how much would be altered if that presumption (and the 20-30 similar ones relating to tokenization) were wrong.
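    As a small illustration of how much rides on tokenization, here is a hedged Python sketch that measures mean word length twice: once counting every glyph as its own token, and once treating a handful of glyph groups as single tokens. The group list is purely an assumption for the sake of the example (which groups really form single tokens is exactly the open question), and the function names are mine.

```python
# Hypothetical multi-glyph tokens; which groups really form single tokens
# is exactly the open question, so treat this list as an assumption.
ASSUMED_TOKENS = ["qo", "ol", "al", "am", "dy"]

def tokenize(word, tokens=()):
    """Greedy left-to-right split: longest assumed token first, else one glyph."""
    ordered = sorted(tokens, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        match = next((t for t in ordered if word.startswith(t, i)), word[i])
        out.append(match)
        i += len(match)
    return out

def mean_word_length(text, tokens=()):
    """Mean number of tokens per word (0.0 for empty text)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(tokenize(w, tokens)) for w in words) / len(words)

sample = "qokedy qokedy dal qokedy qokedy"
print(mean_word_length(sample))                  # 5.4 with one glyph per token
print(mean_word_length(sample, ASSUMED_TOKENS))  # 3.6 with grouped tokens
```

    Same text, same words, yet the mean word length drops by a third simply because of what is counted as a letter, and every statistic built on top of letters shifts with it.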

    Cheers, ….Nick Pelling….

  12. Das Voynich-Blog » Blog Archiv » Der Prozess, der es schwierig macht Says:

    [...] greetings to Nick Pelling and thank you. My english may be bad, but not as bad as a google translation… Tags » Autor: [...]

  13. Marke Fincher Says:

    Your point about information theory and how any analysis from it only relates to the specific transcription (or act of extraction) of the information, which of course can be inaccurate, is an important point for sure Nick, and it should always be in our minds somewhere when interpreting any result, but I think it is possible to overstate the magnitude of this practical problem, as if it has the power to render all results meaningless, make all analysis futile and change black to white and make a pencil look like a hippopotamus! But it’s not like that.

    If there are distinctions made in the transcription which were not intended then what you are analysing includes a “noise” which dilutes the signal, but such a random noise will flatten and weaken relationships and not produce artificial ones of any strength. So strong relationships that you do find and study cannot be complete “mirages” produced by an inaccurate transcription.

    Distinctions that were intended but missed by the transcription are more tricky perhaps, but mostly that just means that you are looking at the thing through a fuzzy lens and missing some of the detail as a result….but what you can learn from the detail that you can resolve is not meaningless and is still very useful. And most uncompressed information is surprisingly error tolerant.

    So we should be careful in stating the problems of transcription to not pull the Rugg from under the feet of people who are willing to put effort into analysis, which is pretty hard work for the most part but generally very worth it and almost certainly the only thing that will lead to a decryption (although perhaps not on its own).

    Marke

  14. Elias Schwerdtfeger Says:

    Of course we have a problem. We simply do not know enough about the writing system used in the VMs, and we know nothing about the underlying language. (Not even that there is an underlying language.)

    Every transcription system embodies a whole set of assumptions about the nature of the writing system, and when I do some analysis on the transcriptions, I try to use the transcription alphabet that I think best fits the matter of my analysis. For things like word-length distributions I feel (yes, it isn’t scientific) that EVA is inappropriate, and I transform the transcription mechanically to Currier instead, which gives a much better match between ASCII characters and the things in the script I believe (it isn’t scientific again) to be a “letter”.

    I am a little familiar with the great Voynich 101 transcription and its underlying assumptions. Some of them are worth considering, especially the different flourishes in the EVA “sh”-glyph, and some others seem (insert my default bracketed phrase here) a little bit too fine-tuned to me.

    But to see the thing I called the “biological paradox”, one doesn’t need any transcription at all, only a hi-res or mid-res image from the Beinecke website. It is obvious, even to someone who sees the page for the first time. (Even a friend of mine who knows nothing about the VMs is able to see it.) There are lots of similarly formed “words” on the pages in the biological section, many more than in any other section of the Ms. The degree of similarity may differ with the transcription system used, but the fact will remain: the section of the VMs which contains the most disturbing and unique kind of illustrations seems to consist of the most redundant kind of “text”. This is really weird.

    It is not the kind of “hard stuff” I wanted to post to the mailing list, but it inspired me to do some (explicitly labelled as such) speculations about a “psychological feedback process” similar to parapsychological (this is a good word to remove the last similarity to things one can call scientific without becoming ridiculous) “automatic writing”, which I posted in my German blog. The speculative stuff is unripe and may not be worth further research at the moment. But the “biological paradox”, as I named it, remains…

    And — of course — please excuse my bad english. But I hope it is easier to understand than a google translation of my — sometimes rather “cryptic” — german. ;-)

  15. nickpelling Says:

    Hi Elias,

    I liked your “biological paradox” idea very much (qokedy qokedy dal qokedy qokedy, indeed!), and it seems to have inspired plenty of commentary here, which is good – all food for thought. ;-)

    I think I should post soon about how transcription and modelling strategies affect the statistics so massively, it’s a big topic that nobody seems to cover properly…

    Though my grasp of German is patchy (for my sins in a past life, I chose to learn Ancient Greek instead at school), Google Translation is normally good enough to get me 50% of the way there (though I do have to work to get the other 50%). But surely you have much better things to do than pore over your server logs to see when I’m dropping by? :-)

    Cheers, ….Nick Pelling….

  16. Elias Says:

    [...] Google Translation is normally good enough to get me 50% [...]

    Sounds similar to me reading dutch. I understand 50%, but I do not understand 100% after reading the text twice… ;-)
