Montemurro Voynich paper and the “genuine message”…

The news rattling the bars of the Voynich research cage loudest right now is surely the publication of a paper by Marcelo Montemurro and Damián H. Zanette called Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis, deftly summarized in New Scientist as New signs of language surface in mystery Voynich text.

M&Z’s abstract brings out a lot of what they were trying to do – and also points exactly to their mistakes.

Here we analyse the long-range structure of the manuscript using methods from information theory. We show that the Voynich manuscript presents a complex organization in the distribution of words that is compatible with those found in real language sequences. We are also able to extract some of the most significant semantic word-networks in the text. These results together with some previously known statistical features of the Voynich manuscript, give support to the presence of a genuine message inside the book.

Central Assumption: the authors implicitly hypothesize that they can get meaningful results for long-range comparisons because Voynichese is homogeneous across all its sections.

…The Problem: this assumption is false (or very nearly so), because there are significant macro-level differences in the way the language in different sections works (Currier A, Currier B, labels) as well as many mid-level differences (Herbal-A, Q13-ese, etc).

Central Conclusion: the authors believe that their language-centric statistical machinery has identified “The thirty most informative words in the Voynich manuscript”.

…The Problem: I’m pretty sure that the authors have in fact identified what are arguably the thirty least informative words in the Voynich Manuscript. (That may be an independently useful result, but it’s probably not what they were hoping for.)

I’ll explain.

Voynichese is extremely predictable at the letter level: it has many rigid letter-level adjacency rules (‘4’ is almost always followed by ‘o’, etc) and position rules (4o- is consistently word-initial, -89 is consistently word-final, etc) and a high level of letter-context predictability.
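
To make “letter-context predictability” a little more concrete, here is a minimal Python sketch of conditional character entropy (often written h2): the average uncertainty about the next letter once the current letter is known. The helper name `conditional_entropy` and both toy word lists are my own assumptions for illustration; neither is real Voynichese data nor anything taken from M&Z’s paper.

```python
import math
from collections import Counter

def conditional_entropy(words):
    """Return H(next letter | current letter) in bits, counted within words."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for (a, b), n in pair_counts.items():
        p_pair = n / total             # P(a, b)
        p_next = n / first_counts[a]   # P(b | a)
        entropy -= p_pair * math.log2(p_next)
    return entropy

# Toy data only: a rigid "language" where each letter fixes its successor...
rigid = ["qoqo", "qoqoqo", "oqoq"]
# ...versus one where the successor is unconstrained.
loose = ["ab", "ba", "aa", "bb"]

print(conditional_entropy(rigid))   # no uncertainty about the next letter
print(conditional_entropy(loose))   # one full bit of uncertainty
```

The more rigid the adjacency rules, the closer this figure falls towards zero. A real measurement would of course depend heavily on which transcription alphabet is used.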

Yet at the same time, it also has a very large dictionary relative to its text size. I often criticize Gordon Rugg for suggesting historically incorrect Cardan grille-like tables (i.e. they’re a century too late for the Voynich’s construction dating) and for inappropriately back-projecting his modern CompSci mindset onto the early Renaissance (i.e. it’s 500+ years too early for the kind of table-driven hackery he proposes). However, he is absolutely right that a reconstructed Voynichese “dictionary” would, to a modern computer scientist’s eyes, look very much as if it had been generated or permuted by some means.
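
One crude way to put a number on “a very large dictionary relative to its text size” is the type/token ratio: distinct words divided by total words. The sketch below is only an illustration under my own assumptions (the function name and the tiny sample are invented), not a measurement of the manuscript.

```python
def type_token_ratio(words):
    """Distinct word types divided by total word tokens."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)

# Invented sample: 7 tokens, 5 distinct types.
sample = "daiin chedy shedy chedy qokeey daiin chey".split()
print(type_token_ratio(sample))
```

Because this ratio falls as a text gets longer, it is only meaningful when comparing samples of the same length.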

The paradox is therefore that these two apparently opposite aspects of Voynichese are able to coexist: how on earth can we reconcile its letter rigidity & predictability with its wild word variability?

I think the key to resolving this is to grasp that there is some kind of generative or confounding principle at work within a rigidly predictable framework. That is, that even though there are lots of rules, these rules act as a kind of “container” for semantic or cryptographic variability to exist within.

Hence I believe that Montemurro’s statistical machinery is identifying “words” that fall within the container layer rather than in the confounded content layer, which is why I think these are arguably the thirty least informative words in the Voynich Manuscript.

It’s a hard point to understand, let alone accept: the confounding trick (some kind of transposition cipher? some kind of paper cipher machinery? some kind of cipher wheel?) driving Voynichese’s inherent variability remains as profoundly unreachable now as it has been for over 500 years.

My apologies to Montemurro and Zanette, but the central challenge we face isn’t to find new language-based statistical tests to apply to the Voynichese corpus, however clever they may be. Rather, it is to find ways of resolving the Voynich Manuscript’s central paradox: how is it that Voynichese is both letter-rigid and word-variable at the same time?

Incidentally, M&Z conclude in their paper that their results point to a semantic link between the Recipe and Astro sections, and between the Herbal and Pharma sections. Actually, had they been more aware of the codicology analyses that have been done, they would have seen that their results are consistent with the writing phase order.

In fact, there are many indications that what I call Voynichese’s ‘container’ layer above evolved during the writing, with the most obvious evolution being between Currier A and Currier B. I suspect that what their statistical machinery has imperfectly captured is therefore simply a snapshot into the evolution of the container layer, and not anything ‘semantic’ as such.

In short, the aspect of Voynichese that is most nearly homogeneous across all its sections is its “container” layer: so what Montemurro and Zanette have done is make long-range comparisons between evolutions of the container layer. Currently, my best guess is that these are likely to be almost entirely composed of cipher system meta-tokens (shorthand tokens, transposition cipher placeholders, etc) rather than the semantic contents, which appear instead to have been confounded by some means.

So, rather than finding a “genuine message” (as New Scientist put it), perhaps they have instead found a “genuine container” for the message? This may prove to be a very useful result in its own right, but it’s probably not the smoking gun linguistic proof they were hoping to use to discredit Rugg’s tables.

38 Comments

  1. Warlock Asylum June 26, 2013 4:59 pm

    Very informative article. I looked into the manuscript several years ago, but my time was consumed by trying to understand the Vasuh language, which is a numerical language. It may be the same for the Voynich manuscript.

    Stay Blessed

    http://www.warlockasylum.wordpress.com

  2. Knox June 27, 2013 2:30 pm

    Nick, you are banging all around the “paradox”. Let’s get rid of it in one sentence. Word variability in Voynichese compared to known text is achieved by more relaxed rules for nGram sequence within a word and by the short edit distances of the words. The latter is partially affected by the former. There may be additional variability caused by scribal errors.

    http://notakrian.pbworks.com/

  3. nickpelling June 27, 2013 6:31 pm

    Knox: …but that doesn’t resolve the paradox.

    Unless something is actively confounding the word generation process, we would expect to see exactly the kind of long-range word matches that Montemurro and Zanette see, only far far stronger. So the paradox remains: the ngram stats are merely symptoms of whatever the underlying process is, not the process itself.

    http://www.nickpelling.com/

  4. Knox June 27, 2013 9:01 pm

    OK. I think I get the point. Now bear with me while I convince myself by analogy that there is no meaning in the VMs.
    There was a book of 3-digit numbers. On the chance that The Book might have meaning, curious people had a closer look. Someone discovered that the hundreds digits were all odd, the tens digits were all even, and the units digits were both even and odd. Another person wrote single digit numbers on scraps of paper. He placed a number of paper scraps with odd digits in a box (Box 1), papers with even digits in Box 2, and papers with both even and odd digits in Box 3. The pieces of paper are drawn one at a time from the boxes in turn.
    What does the drawing tell us about the original numbers?
    Hmmmmm … ummm… hmf..
    Help me with this, will you?
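
A minimal Python sketch of the three-box drawing described in this comment, assuming one digit is drawn (with replacement) from each box in turn to build each number. The function name, the seed and the sample size are hypothetical choices made for illustration.

```python
import random

def draw_number(rng):
    """Draw one 3-digit number: odd hundreds, even tens, any units digit."""
    box1 = [1, 3, 5, 7, 9]     # odd digits
    box2 = [0, 2, 4, 6, 8]     # even digits
    box3 = list(range(10))     # both even and odd digits
    return 100 * rng.choice(box1) + 10 * rng.choice(box2) + rng.choice(box3)

rng = random.Random(0)         # fixed seed so the run is repeatable
book = [draw_number(rng) for _ in range(20)]
print(book)
```

Every number produced this way obeys the same positional rules as The Book, even though the procedure never looks at the original numbers.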

  5. nickpelling June 27, 2013 9:57 pm

    Knox: that’s more to do with distribution stats. What gives Voynichese its low entropy is its letter-to-letter adjacency rules. But these rules are then finessed by yet more section-specific rules, such as the way word-initial y+gallows appears disproportionately often (as I recall) in the label section.

    http://www.nickpelling.com/

  6. Knox June 28, 2013 1:24 am

    Yes (for h2). But look again; I was picking on Gordon Rugg in the second message. Have to give him credit for a lot of thought and work and he was looking for the same thing we all are. I should say all _were_, until some found _it_. I regret the word “hoax” in the name of his theory. We cannot know the intent of the creator and it is not necessary to what he proposes. “Hoax” will continue to be misused.

    Interesting question by Klaus (the pingback). I’ll have to give my impression since I can’t read German. Can Rugg-style generated text show the several characteristics of language found in the VMs, especially by someone who didn’t know those characteristics existed? Back to me: That question is valid for ciphers other than simple substitution and transposition. I think we should make a list of the language-like properties and try to answer the question.

    A word transposition that was done in the same manner as letters in a railfence cipher would retain many language characteristics and negate others. Now we have to explain the abnormal edit distance of the lexicon as recently described by Job. It’s substitution. What is the nature of the substitution?

  7. Diane June 28, 2013 5:18 am

    Nick – I might mention that when I linked to Klaus’ site through the pingback on your page it was ok,
    but when I tried to go directly, I received a MAJOR alert from my browser, saying it is an ‘attack’ site.

    Klaus’ is a blog I’ve visited fairly often, so this is a new development.

  8. nickpelling June 28, 2013 11:31 am

    Knox: there’s only evidence of rail-fence transposition from the mid-16th century onwards. So that’s probably not what we are looking at here.

    When people talk about transposition ciphers in the context of the fifteenth century, it’s normally either syllable reversing (as in the “Da Vinci Cipher” chapter of my book) or “strewing” (as alluded to by Alberti in his little book, that I also describe in my book), which means moving letters around the page.

    The example I normally give is of “Neal Keys”: one working hypothesis posits these as blocks of meaning-heavy letters removed from the paragraph or page and replaced with gallows characters. For this, a gallows character isn’t a letter itself but is instead a token that means “transpose the (next?) removed letter back from the appropriate Neal Key”.

    http://www.nickpelling.com/

  9. Marcelo Montemurro June 28, 2013 12:51 pm

    Dear Nick,

    These are interesting ideas and probably can be tested somehow. The Voynich manuscript problem will not be closed any time soon, that’s quite clear.

    One thing that we stress in the paper is that the presence of Zipf’s law needs to be considered in any model or explanation of the Voynich manuscript. Apart from its presence in language, Zipf’s law emerges in a number of phenomena underpinned by certain classes of stochastic processes, for instance: city-size distributions, internet links, surname distribution, etc. The attempts to explain specifically how Zipf’s distribution emerges in those phenomena led to insights into the underlying dynamics driving them. Thus, the presence of Zipf’s law cannot be overlooked in any theory or serious speculation on the manuscript’s origins, since trying to understand why or how it’s there can help to narrow down the possibilities further.
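
For readers who want to check for Zipf-like behaviour themselves, here is a rough Python sketch: rank the word frequencies and fit a straight line to log(frequency) against log(rank), where a slope near -1 is the classic Zipf signature. The function names and the synthetic frequencies are assumptions for illustration, and a plain least-squares fit is only one (fairly crude) way of estimating the exponent; it needs at least two distinct ranks.

```python
import math
from collections import Counter

def rank_frequency(words):
    """Return word frequencies sorted from most to least common."""
    return sorted(Counter(words).values(), reverse=True)

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic check: frequencies exactly proportional to 1/rank.
ideal = [1200 // rank for rank in (1, 2, 3, 4, 5, 6)]
print(zipf_slope(ideal))   # very close to -1
```

On a real text one would pass `rank_frequency(words)` to `zipf_slope`.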

    Any non-trivial cipher applied to natural language text will destroy the frequency profile of the resulting ciphertext words and severely affect Zipf’s distribution. For instance, a simple poly-alphabetical cipher (this type of cipher may be a bit too early to have been used in the Voynich, but it’s useful as a proof of concept) will typically map the same n-gram of plaintext into different n-grams of ciphertext, which will lead to a flattening of the word frequency profile.
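
The flattening effect described here can be illustrated with a small Python sketch. It assumes a Vigenère-style cipher whose key keeps running across word boundaries; the key, the plaintext and the function name are all invented for the example and are not a claim about how the manuscript was produced.

```python
from collections import Counter

def vigenere(words, key):
    """Encipher lowercase a-z words with a key that keeps running across words."""
    out, k = [], 0
    for word in words:
        enc = []
        for ch in word:
            shift = ord(key[k % len(key)]) - ord("a")
            enc.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
            k += 1
        out.append("".join(enc))
    return out

plain = "the herb and the root and the leaf".split()
cipher = vigenere(plain, "key")

print(Counter(plain).most_common(2))        # "the" and "and" repeat in the plaintext
print(Counter(cipher).most_common(2))       # every ciphertext word occurs once
print(len(set(plain)), len(set(cipher)))    # 5 plaintext types, 8 ciphertext types
```

Each occurrence of “the” lands on a different key offset and so becomes a different ciphertext word, which is exactly what removes the high-frequency head of the distribution.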

    Thus, how is it that Zipf’s law is there? If we rule out the simplest hypothesis (that it is there because of the original linguistic structure) then we necessarily need to address the issue of how it emerged, since it is no trivial process that leads to Zipf’s law.

    Then, on top of that we have all the other linguistically consistent statistical features of the text, which, whether we like it or not, are there and need to be taken into account together with the presence of Zipf’s law.

    Among other quantitative facts in the paper, the ‘semantic’ networks that we obtained are another ingredient in the puzzle. These structures are highly significant and neatly link words with similar morphology. These words tend to co-occur in exactly the same way as one would expect for semantically related words. Again, most useful ciphers will destroy any sort of order of this kind. Interestingly, and more on the speculative side, the structure of the Voynichese’s ‘semantic’ networks is not incompatible with Friedman’s final verdict on the Voynich manuscript that it was a very early attempt to create an artificial or universal language of the a-priori type. As was the case of John Wilkins’ ‘philosophical’ language in the 17th century, in these schemes semantically related words can have similar forms.

    Now, leaving speculations aside, the collected statistical evidence of structure at different levels needs to be part of any theory for the text’s origins. In its modest role, it can be used as a falsification tool: if a model or theory proves inconsistent with any of the empirical statistical facts, then it has to be abandoned.

    All the best,

    Marcelo

  10. nickpelling June 28, 2013 1:38 pm

    Marcelo: thanks for responding so quickly! :-)

    I’m very wary of using Zipf’s Law as a starting point, precisely because it has been shown to be a product both of natural and artificial processes. I suspect we need something with a little bit more discriminatory power. ;-)

    Statistics-flattening polyalphabetic ciphers were first introduced in 1467 by Alberti: with the high amount of low-level structuring in the Voynich, we are clearly not looking at something like that. But if not that, then what? That’s the mystery (or perhaps the paradox)!

    For me, I think the historical conclusion is that Voynichese can only be a neat assembly of small tricks, well arranged. But we are defeated by its overall architecture, not by any single one of its component mechanisms.

    I shall try to think of interesting follow-up tests for you, perhaps together we can find something that will prove even more revealing…

    http://www.nickpelling.com/

  11. Diane June 29, 2013 6:03 am

    May I ask a question that has bothered me for a while – sorry if it’s off the point.

    How would the stats look if you took a piece of music written for four or five voices (say) and using the normal five bars, wrote the notes using letters, and in one horizontal line, breaking the ‘words’ where the bar-lines would be.

    For the sake of the exercise, suppose all notes of equal length, to start with.

    Would that create a similar pattern of letter rigidity – confined by the rules of harmony, but ‘word’ variability?

    I could phrase the same question in terms of many systems of element-and-pattern, but music is nicely gender neutral.

  12. Joachim Dathe June 29, 2013 7:49 am

    Diane: Any data that can be converted to a text code (e.g. ANSI) can be evaluated by (my) N-gram tool.
    Word- and phrase structure is created automatically, so you should avoid such a requirement (bar line) initially.
    Joachim.

    http://voynich2arabic.wordpress.com

  13. Job June 29, 2013 8:32 am

    From M&Z’s paper:

    Taking into account that the clusters were built on the basis of word co-occurrence, and assuming that co-occurrence reveals semantic affinity, we conclude that a strong connection between form and meaning characterizes the most informative words of “Voynichese”.

    How would this apparent association between form and meaning extend to similar variants outside of the top-30 list? For example, if we establish that “chedy” and “shedy” have some semantic affinity, what can we infer about the meaning of “chody”?

    This is something that i’ve been wondering about. An analysis of single-character variants in the VMS indicates that, well, there are lots of them.

    The top-30 word “chedy” has 49 distinct single-character variants in the VMS. The table below lists these variants. The second column lists the number of single-character variants that, in turn, each such variant has. The third column indicates a similar quantity, but excludes values that are also single-character variants of “chedy”.

    For example, “schedy” itself has 22 single-character variants, 10 of which are not single-character variants of “chedy”.

    schedy 22 10
    chey 54 44
    cphedy 9 7
    dchedy 26 15
    chady 12 5
    chody 42 35
    lchedy 29 18
    chedyl 6 4
    cthedy 10 8
    chefy 13 5
    csedy 7 7
    chkedy 10 7
    chedyr 5 4
    chdedy 7 3
    ched 33 28
    chydy 14 8
    cheody 38 33
    qchedy 16 5
    chedey 12 10
    chery 15 7
    cheey 55 45
    ychedy 27 16
    cheds 15 11
    rchedy 19 8
    chedo 13 9
    shedy 29 28
    chedl 16 11
    chesy 18 10
    cheky 27 18
    chddy 12 5
    echedy 16 5
    fchedy 23 12
    cheoy 30 21
    pchedy 28 17
    tchedy 27 16
    cheady 11 7
    chsdy 14 8
    chtedy 10 7
    chdy 33 26
    chedky 6 4
    chety 22 13
    cheda 13 9
    chetdy 9 5
    cheedy 27 20
    ckhedy 11 9
    chepy 15 7
    chpdy 9 3
    ochedy 34 23
    kchedy 25 14
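
A count like the one in the table above could be reproduced along the following lines. This is a minimal Python sketch that assumes “single-character variant” means exactly one substitution, insertion or deletion (which matches rows such as “chody”, “schedy” and “chey”); the function names and the small vocabulary are hypothetical, and this is not Job’s actual code.

```python
def is_single_char_variant(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                       # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:    # skip the shared prefix
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]     # one substitution
    return a[i:] == b[i + 1:]             # one extra character in b

def variants(word, vocabulary):
    """All vocabulary words that are single-character variants of word."""
    return sorted(w for w in vocabulary if is_single_char_variant(word, w))

# Tiny invented vocabulary; a real run would use every distinct word in a transcription.
vocab = {"chedy", "shedy", "chey", "schedy", "chody", "qokeey", "daiin"}
print(variants("chedy", vocab))   # ['chey', 'chody', 'schedy', 'shedy']
```

The second and third columns of the table would then come from calling `variants` on each result and discarding any words already in the list for “chedy”.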

    This noticeable presence of word variance yields quite a dense graph, when we plot the VMS’s single-character variants:
    http://voynichms.appspot.com/images/vm-word-graph.png

    As i understand it, in M&Z’s paper the implied association between form and meaning applies only to the words in the top-30 list – otherwise the whole text might well be some form of early thesaurus. My question then is, what do we make of all the word variance in the VMS?

    If we exclude typos and transcription errors as the cause – it’s unlikely that it could account for the numbers we see – then we have two possibilities:
    1. Word variability corresponds to similar variability in the underlying plain text.
    2. Word variability is a property of the cipher text alone.

    Without some understanding of the contents of the text, it’s difficult to rule out the first option. However it seems implausible that the clear text could have the same level of word variability – there’s just too much of it.

    The second option interests me the most. If we imagine that “chedy” and “chepy” encode completely different words, such as “acorn” and “linen” (however semantically related they might be), then the unique data among the two words (e.g. “p” in this case, excluding context) does not seem sufficient to encode the plain text without either data loss or the use of an additional data source, such as a code book or other context (e.g. the illustrations, surrounding words, etc) – certainly not when we account for all the variants.

    I believe this topic is worthy of a close analysis, it may be possible to show that the VMS is the result of a context-aware, or “extra-stateful” cipher.

  14. nickpelling June 29, 2013 8:57 am

    Job: thank you for your detailed and thoughtful comment. What you are describing is precisely the “wild word variability” I brought up in the post.

    I should perhaps add that what I call the paradox about tightly-coupled letters and wildly-variable words coexisting is only really a paradox if you assume that Voynichese is an untransformed language (your case #1).

    If you have an additional variation-adding encipherment step (your case #2, that I usually call “confounding”) in combination with a verbose cipher back-end, then I think the paradox disappears.

    However, the problem is that the first well-known confounding mechanism is Alberti’s cipher wheel, and even that seems to have arrived fractionally too late for the Voynich.

    All the same, what Alberti saw his cipher wheel as replacing was complicated “strewing” (in-page transposition hacks). Even if we don’t currently have any documented examples of these, Alberti was clearly a witness to their existence.

    So for me, (1) the variability stats show that some mechanism is confounding the language patterns in the text; (2) the dating and the stats rule out Alberti’s cipher wheel; (3) just about the only hack left standing is in-page transposition.

    Of course, proving all this is another matter entirely… :-)

    http://www.nickpelling.com/

  15. Diane June 29, 2013 11:01 am

    Joachim.
    I’m told that the result would be, in effect, a transposition cipher which might present as a verbose cipher.

    Compressing items originally disposed over five lines is no more than the reverse of one method for transposition, it seems, and the effect of fitting five lines (or so) into one would or could present as verbose.

    Hope I’ve got that right.
    It was a general question; I’ve no intention of trying to tackle Voynichese. :D

    Diane

  16. Novil June 29, 2013 11:31 am

    Nick, I believe one should not look at the time when certain ciphers have been published the first time when trying to break the Voynich. If it’s not something that definitely requires a computer, I think one should assume that the creator of the Voynich had that idea already, even if it’s 200 years “too early”. Inventing a cipher isn’t too hard compared to other fields of science. Basically, you just have to look at a piece of text and ask yourself: “How can I shuffle these characters/words around?”

    It’s my strong belief that a stateful cipher is an essential part of the Voynich’s encryption.

    http://www.sandraandwoo.com/

  17. nickpelling June 29, 2013 12:12 pm

    Novil: I’m delighted that this post is bringing out so many fans of stateful ciphers… but the question remains (a) what that state controls or alters (and how), and (b) what controls or alters the current state (and how).

    In that respect, even relaxing the historical constraints by two or even three centuries may not help much, because pretty much the only stateful ciphers that emerged for a very long time were Alberti and Vigenere (broadly speaking), and the Voynich’s polyalphabeticity doesn’t seem to match either of these even slightly.

    It is a curious thing (that I mentioned in “Curse”, but probably not as loudly as I should) that in his book Alberti appears to relate a debate he had had with a proponent of transposition ciphers. So even though he bears witness in 1467 to people having previously used ridiculously clever in-page transposition systems, we appear not to have a single example of such a system being used for real.

    Unfortunately, it seems that Alberti’s small book and Cicco Simonetta’s notes are all the 15th century has to offer us on ciphers. But perhaps Alberti tells us all we need to know!

    http://www.nickpelling.com/

  18. Knox June 29, 2013 8:56 pm

    No one would dispute that finding a description of the VMs cipher (if it is a cipher) in Alberti’s works, or in documents by anyone else, would be helpful. Until then Novil’s down-to-earth statements can be incorporated in an attempt to explore the text. Additionally, we are not locked into the 15th century. Whatever probabilities we individually assign to the date of the creation of the VMs, they are only a guide to one’s personal research.

  19. Menno Knul June 29, 2013 11:33 pm

    Dear Job,

    I wrote that words ending in -edy show a common feature, that they do not occur in f1-f25, which is almost 1/2 to 1/3rd of the herbal section. The normal distribution with such frequent ‘words’ is that they would stretch out over the whole text of the MS. This peculiarity can mean a change of the language, a change of the code or a change of the contents of the texts.
    Secondly there seems to be a regularity in the ‘endings’ -aiiin, -aiin, -ain and -am and -aiiir, -aiir, -air and -ar, similarly in words with -eee-, -ee-, -e- or -ooo-, -oo-, -o- or ooiin, oin, om. The question here is, if we should not eliminate the double ‘vowels’ for study purposes as they tend to indicate semantic cohesion.

    Menno Knul

  20. bdid1dr June 30, 2013 4:30 pm

    Nick and friends: After reading many many posts regarding the “4” character, I feel “com-pelled” to reiterate my reading or translation of, and use of it in the Voynich:

    It is consistently used throughout the document, and is the sound of “qu”. What seems to be the confounding factor is that any vowel can precede that character; a-e-i-o-and maybe u (but seldom). aqua, equal, enquire …. and sometimes a vowel-consonant combination to form “osquash” (the vegetable). The sibilant which looks like the hand-sickle you see in many medieval grain-mowing paintings is “S” as in esquire, squire, squirrel…..squash..

    One can only tell the difference between the sibilant “s” and the “rolled R” figure by comparing the straight handle on the “S” with the curved handle on the “R”. So, a very good example of those two alphabet letters used in the same words would be “square” “squire” “esquire” “require”…..

  21. bdid1dr June 30, 2013 4:41 pm

    So, people, what kind of frequency readings would you get if you “run the numbers” on my alpha-substitutions, at least in the botanical and balnealogical pages of the Voynich?

  22. Job July 1, 2013 6:26 am

    Menno, thanks for bringing that to my attention.

    I can confirm that the suffix “edy” does not occur in quite a few folios – most notably in f1 through f25, but in other folios as well.

    Here’s a data plot indicating the occurrence of the suffix “edy” across all folios – the percentage value is relative to the number of words in the folio:
    http://voynichms.appspot.com/images/suffix-occurrence-by-folio-edy.png
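
The per-folio percentage behind a plot like this one can be computed in a few lines. The Python sketch below assumes the transcription has already been split into a mapping from folio name to word list; the function name, the folio labels and the words are invented toy data, not Job’s code or a real transcription.

```python
def suffix_share_by_folio(folios, suffix):
    """Map folio name -> percentage of its words ending in suffix."""
    shares = {}
    for name, words in folios.items():
        hits = sum(1 for w in words if w.endswith(suffix))
        shares[name] = 100.0 * hits / len(words) if words else 0.0
    return shares

# Invented toy data, not a real transcription.
folios = {
    "folio_a": ["daiin", "chol", "shol", "dain"],
    "folio_b": ["chedy", "qokedy", "ol", "shedy"],
}
print(suffix_share_by_folio(folios, "edy"))   # {'folio_a': 0.0, 'folio_b': 75.0}
```

Swapping `w.endswith(suffix)` for `suffix in w` would give the corresponding figure for a fragment such as “ed” anywhere in the word.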

    The absence of “edy” seems to result from the fact that the characters “e” and “d” very rarely occur in sequence in folios f1 through f25. Here’s a plot of the occurrence of “ed”:
    http://voynichms.appspot.com/images/fragment-occurrence-by-folio-ed.png

    Overall, usage of the character “e” seems to increase in later folios, starting out fairly small:
    http://voynichms.appspot.com/images/letter-occurrence-by-folio-e.png

    Also interesting is that “e” (the second most frequently occurring letter in the text) does not occur at all in f36r (62 words):
    http://brbl-zoom.library.yale.edu/viewer/1006144

    Finally, relating to your second suggestion, i transformed the contents of the VMS to collapse consecutive occurrences of the same character (e.g. “ii” and “iii” are replaced with “i”).
    In the resulting text, word variability is not really impacted (it’s actually slightly higher overall).

    While character repetition such as “i”, “ii” and “iii” contribute to the number of single-character variants, this does not seem to be the underlying cause.
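
The collapsing transformation described in this comment is easy to sketch in Python with `itertools.groupby`. The function name and the word list are assumptions for illustration only.

```python
from itertools import groupby

def collapse_repeats(word):
    """Replace each run of a repeated character with a single character."""
    return "".join(ch for ch, _ in groupby(word))

# Invented sample words.
words = ["daiin", "daiiin", "dain", "cheey", "chey"]
collapsed = [collapse_repeats(w) for w in words]
print(collapsed)                              # ['dain', 'dain', 'dain', 'chey', 'chey']
print(len(set(words)), len(set(collapsed)))   # 5 distinct words become 2
```

This toy list was chosen so that the words merge, which is not the same thing as the variability measure discussed above; testing that would mean re-running the single-character variant count on the collapsed text.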

  23. Menno Knul July 1, 2013 12:47 pm

    Job, you made very interesting graphs, which clearly show the absence of the -edy ‘affix’ in the first 50 pages, which is not the case for the -ody affix.

    With regard to the aiii-aii-ai I went a step further to link this to the -am, -ar affix. I got the idea that there may be some misunderstanding between -ain and -am, which look very similar in the written text. I could not yet decide to read -ain as -am or to read -am as -ain, similarly -air as -ar and -ar as -air. I’d like to know your opinion about that.
    What you noticed about the absence of -e- on f36r is interesting indeed. As it pertains to one single page only it could be by accident.

    I would like to make a comment on the special signs, which are related to the alchemistic signs and not to alphabetical or numerical signs. I can hardly believe that most paragraphs by accident start with words which begin with P or T, unless what we have here is the name of a plant, just as in other herbaria many names start with the word Herba. This could mean that the initial P and T should be interpreted as ‘markers’ and do not belong to the text itself. I hope you understand me.

  24. Knox July 1, 2013 8:06 pm

    EVA-edy is dense in B language sections. EVA-eody is the same with a big exception; it is rare in Quire 13 but frequent in some A language sections.

    I have observations about EVA-m and paragraph-initial words that might be helpful.

    http://notakrian.pbworks.com/w/page/edy%20and%20eody

  25. Job July 2, 2013 9:33 am

    Menno,

    To my untrained eye, “ain” and “am” look similar, if not identical though i have really only glanced at the original text. I would hope that the available transcriptions are self consistent, even if they introduce extra characters.

    Relating to your comment regarding the use of characters “p” and “t” at the start of paragraphs, it looks like 32.7% of all paragraphs with two or more words start with “p”. The character “o” is in second with 19.4%, followed by “t” with 17.4% (“k” is next with 8).

    In my sample Dante text, “c” is the most popular letter at the start of a paragraph, with 16.5%, followed by “e” with 13.6%.

    In my sample Pliny text, “a” is the most popular with 12.4%, followed by “c” with 9%.
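
Percentages of this kind can be obtained with a short Python sketch like the one below. It assumes each paragraph is already a list of words and, as in the figures above, ignores paragraphs with fewer than two words; the function name and the toy paragraphs are hypothetical.

```python
from collections import Counter

def paragraph_initial_shares(paragraphs, min_words=2):
    """Percentage of qualifying paragraphs starting with each letter."""
    firsts = [p[0][0] for p in paragraphs if len(p) >= min_words and p[0]]
    counts = Counter(firsts)
    return {letter: 100.0 * n / len(firsts) for letter, n in counts.most_common()}

# Invented toy paragraphs, each a list of words.
paragraphs = [
    ["pchedy", "daiin"],
    ["pol", "chey", "dar"],
    ["tchor", "ol"],
    ["okaiin", "shedy"],
    ["kar"],                 # skipped: fewer than two words
]
print(paragraph_initial_shares(paragraphs))
```

Running the same function over several texts gives directly comparable figures, provided paragraph boundaries are identified in the same way in each.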

    Clearly the VMS stands out in this aspect. Here are some plots of the occurrence of “p”, “o” and “t” at the start of paragraphs, across all folios:
    http://voynichms.appspot.com/images/letter-occurrence-paragraph-start-by-folio-p.png
    http://voynichms.appspot.com/images/letter-occurrence-paragraph-start-by-folio-o.png
    http://voynichms.appspot.com/images/letter-occurrence-paragraph-start-by-folio-t.png

    On a separate topic, i’ve scanned the VMS in an attempt to find common word sequences. Since there aren’t that many, of significant size, i ended up searching for common word sequences containing words that are similar.

    Here are the similar sequences consisting of six words:

    f107v Par 15 Word 12
    [aiin, chey, qol, aiin, al, chedy]
    f77v Par 10 Word 27
    [aiiin, chedy, qol, daiin, sal, chedy]

    f81v Par 2 Word 76
    [ol, cheky, ol, shedy, qokedy, qokedy]
    f75v Par 18 Word 3
    [or, chesy, sol, shey, qokeey, qotedy]

    f75r Par 1 Word 40
    [r, ain, ol, ol, sheedy, qokeey]
    f103r Par 17 Word 11
    [lr, ain, l, ol, sheed, qokeey]

    I guess there aren’t that many of these either.
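
A search for near-matching word sequences might look something like the following Python sketch. It assumes “similar” means that words in matching positions are identical or one substitution, insertion or deletion apart, which fits the three pairs quoted above; the function names are hypothetical and the two test lists are the first quoted pair padded with one invented word each.

```python
def close(a, b):
    """True if a and b are equal or one substitution, insertion or deletion apart."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                       # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:    # skip the shared prefix
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]
    return a[i:] == b[i + 1:]

def similar_windows(seq_a, seq_b, size):
    """Return (i, j) start positions where size-word windows align word by word."""
    hits = []
    for i in range(len(seq_a) - size + 1):
        for j in range(len(seq_b) - size + 1):
            if all(close(x, y) for x, y in zip(seq_a[i:i + size], seq_b[j:j + size])):
                hits.append((i, j))
    return hits

# First quoted pair, with "daiin" prepended and "okar" appended as invented padding.
a = ["daiin", "aiin", "chey", "qol", "aiin", "al", "chedy"]
b = ["aiiin", "chedy", "qol", "daiin", "sal", "chedy", "okar"]
print(similar_windows(a, b, 6))   # [(1, 0)]
```

This brute-force comparison is quadratic in the text length, which is tolerable for a corpus the size of the VMS but would need indexing for anything much larger.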

  26. Joachim Dathe July 2, 2013 10:33 am

    @ Job

    A lot of the things with which you torture yourself are already available, see here:
    http://archive.org/details/Voynich_Manuscript_Lexemes_List

    http://archive.org/details/eva27sim

    Joachim.

    http://voynich2arabic.wordpress.com

  27. Job July 2, 2013 9:10 pm

    Joachim,

    That’s actually the type of raw data dump i’d rather avoid, especially in PDF format which is not easily processed – it would be far easier for me to write a Java query that yields the same information.

    I hope you don’t think i’m scanning the VMS manually – that would be torturous.

  28. Joachim Dathe July 2, 2013 11:22 pm

    Job,

    Maybe you didn’t realize the different formats available there, including “full text” which could be processed easily.

    BTW: no one who processes the EVA alphabet as if it were a derivative of the original code, thereby building an additional encryption layer, has any chance of making progress in deciphering.

    Joachim.

    http://voynich2arabic.wordpress.com

  29. Menno Knul July 3, 2013 4:18 pm

    My latest comment on Joachim Dathe got lost. The idea was, that the signs P and T as first letter of the first word of paragraphs should be interpreted as markers to indicate a category like Herba or Erba.

    Menno

  30. bdid1dr July 5, 2013 1:43 am

    Menno: The elaborate “P” which is seen at the beginning of any Voynich botanical discussion is the entire two-letter combination of S-p for the entire word “Specie” — that is for the botanical discussions.

    When you see that elaborate “P” being stretched and/or followed by vowels you are looking at Latin words for “prescription”, “proscribe”, “portion”. I reiterate one example I’ve discussed before: The word prescription can be condensed into three letters-in-one, which can be found on just about any doctor’s prescription pad and/or drugstore symbols: If it looks like a capital “R” with a slash mark across the protruding leg, you are looking at the contracted word beginning of presc-rip-tion.

  31. T Anderson July 6, 2013 1:57 pm

    Nick, I was going to email you, but this box is so convenient.

    My general thoughts on the VM align with Nick’s; I’m not sure if this is due to a background in computer programming and history or not.

    My opinion, first off, is that we have to look in all earnest at the period right after the carbon dates given for production of the skins. Next we have to come to terms with what our creator could and could not do during this era (assuming the VM isn’t a copy).

    I’m going to narrow in here and say it’s likely Italian, and that the person who made it is possibly the person Alberti mentions, and probably knew http://it.wikipedia.org/wiki/Giovanni_Fontana_(scienziato) and his speculum http://upload.wikimedia.org/wikipedia/it/6/6b/Giovanni_fontana_speculum.jpg which I’m sure is where Alberti got the idea from.

    I don’t think Fontana was concerned enough with the natural world to be our author, but I haven’t researched him enough to say for sure. What I can say is he came up with the right ideas that someone could have used to create the VM (he created the speculum and wrote using an invented system). He apparently came up with many so-called memory devices other than the speculum, but I haven’t looked into them yet.

  32. nickpelling July 6, 2013 2:23 pm

    T A: overlook Fontana at your peril! The only reason I didn’t consider him more when I wrote “Curse” was that I believed (wrongly) that he was long dead by 1450 (whereas in fact he died after 1454). Moreover:

    * He was a doctor
    * He was the only other Renaissance author of a book-length cipher we know about
    * He wrote an encyclopaedia
    * His rotating memory devices bear more than a passing resemblance to Alberti’s cipher disk.
    * The Italian Wikipedia article on him talks about his experiments in natural science etc

    Even though I ended up pinning my flag to Filarete’s mast back in 2006, I should say that Fontana is my #2 candidate (and I don’t really have a #3 I’m particularly happy with).

    If you’re looking for someone who knows everything Fontana knows, why not Fontana himself? ;-)

    http://www.nickpelling.com/

  33. Job July 6, 2013 8:25 pm

    Fontana seems like a really good candidate.

    The use of a mnemonic device is plausible and could account for many of the properties of the VMS, such as:
    1. Evidence of word-centric encoding.
    2. Evidence of structure within words.
    3. Evidence of patterns generally associated with natural languages (resulting from 1 above).
    4. Possibility of extra state having been used in the encoding process, suggested by the number of word variants.

    A mnemonic device could act as a sort of dynamic codebook.

    Moreover, his association with witchcraft and varied interests are also compatible with the manuscript’s contents – in particular the suggestion that, quoting from a roughly translated Wikipedia article:

    His works can be deduced or speculate travel to Bologna, Ravenna, Rome , but also in distant countries (some perhaps in his imagination).

    If the author was associated with the practice of witchcraft, then it’s possible that the unrecognized illustrations of plants would have been compositions assembled from various elements, not so much to depict known species but to formulate the existence and possible usage of new types of plants.

    For example, if root X and leaf Y of two different species are understood to have certain effects, then what would a single plant combining both X and Y be used for?

    Perhaps the author was simply depicting and describing the properties of fantastic plants “to look for”.

  34. nickpelling July 6, 2013 8:41 pm

    Job: the idea that Fontana may have used his circular mnemonic device to generate some set of Voynich words seems at first interesting. But this is stateless, and would surely preserve the semantic structure of sentences.

    All the same, I do like Fontana – he’s #2 for me for a whole load of good reasons! :-)

    http://www.nickpelling.com/

  35. xplor July 6, 2013 11:49 pm

    For # 3
    You should have considered Francesco (Cicco) Simonetta, 1410-1480. Then you could have read

    Rules for Decrypting Enciphered Documents Without a Key,
    Then you would know where LHR is.

  36. T Anderson July 7, 2013 2:54 am

    I’ve known about the VM for years, and looked through it, browsed books etc here and there without ever participating in the conversation, so thanks for the warm reception.

    This is the reason I’m not saying it has to be Fontana. Although I think he either designed or inspired the cryptography, if the Italian Wikipedia article’s mention that he designed many memory machines is correct, then I wonder what survives of his work?

    The “speculum” on its own would be stateless, but you could imagine an easy jump to statefulness.
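
    To make the stateless/stateful distinction concrete, here is a hypothetical sketch. Everything in it is an assumption for illustration: the toy codebook, the plaintext words and the "advance one position per word" rule are invented, and none of it is claimed to be how Fontana’s device worked.

```python
def encode_stateless(plaintext_words, codebook):
    """Fixed lookup: the same plaintext word always yields the same output word."""
    return [codebook[w][0] for w in plaintext_words]


def encode_stateful(plaintext_words, codebook):
    """Rotating lookup: a disk position advances after every word and
    selects which variant of the word gets written."""
    out = []
    position = 0
    for w in plaintext_words:
        variants = codebook[w]
        out.append(variants[position % len(variants)])
        position += 1  # the extra state carried from word to word
    return out


# Invented toy codebook: each plaintext word has several written variants.
TOY_CODEBOOK = {
    "herba": ["daiin", "dain", "daiiin"],
    "aqua": ["chedy", "chey", "cheedy"],
}

plain = ["herba", "aqua", "herba", "herba"]
stateless = encode_stateless(plain, TOY_CODEBOOK)
stateful = encode_stateful(plain, TOY_CODEBOOK)
```

    In the stateless output every “herba” comes out as the same word, so repetitions in the plaintext stay visible. In the stateful output the same word comes out as several near-identical variants, which is the sort of effect the number of word variants might suggest.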

    One thing I haven’t seen discussed enough when talking about the VM is that the people we suggest as possible authors are often people who, due to their vocations, would have greater artistic ability than we see in the VM. Fontana, Alberti, Averlino were all architects. I do not see the hand of a 15th-century architect in the VM castle.

    Given this, could the VM be the work of an apprentice? Or a copy by an apprentice? Going off on a tangent here, I still think that Fontana is the right person at the right time to create the VM from a technical point of view. It’s also of note that he is the only one to publish books in code during the 15th century, although I think the titles and such were in Latin.

  37. Diane O’Donovan September 25, 2013 12:55 pm

    “A manuscript is something between a gadget and a personality”.

    - Robert Pierce Casey

    had to share that one :)

    cited from recent post in

    hmmlorientaliadotwordpressdotcom
