26 Jun 2013

Montemurro Voynich paper and the “genuine message”…

The news rattling the bars of the Voynich research cage loudest right now is surely the publication of a paper by Marcelo Montemurro and Damián H. Zanette called “Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis”, deftly summarized in New Scientist as “New signs of language surface in mystery Voynich text”.

M&Z’s abstract brings out a lot of what they were trying to do – and also points exactly to their mistakes.

“Here we analyse the long-range structure of the manuscript using methods from information theory. We show that the Voynich manuscript presents a complex organization in the distribution of words that is compatible with those found in real language sequences. We are also able to extract some of the most significant semantic word-networks in the text. These results together with some previously known statistical features of the Voynich manuscript, give support to the presence of a genuine message inside the book.”

Central Assumption: the authors implicitly hypothesize that they can get meaningful results for long-range comparisons because Voynichese is homogeneous across all its sections.

…The Problem: this assumption is false (or very nearly so), because there are significant macro-level differences in the way the language in different sections works (Currier A, Currier B, labels) as well as many mid-level differences (Herbal-A, Q13-ese, etc).

Central Conclusion: the authors believe that their language-centric statistical machinery has identified “The thirty most informative words in the Voynich manuscript”.

…The Problem: I’m pretty sure that the authors have in fact identified what are arguably the thirty least informative words in the Voynich Manuscript. (That may be an independently useful result, but it’s probably not really what they were hoping for.)

I’ll explain.

Voynichese is extremely predictable at a letter-level: it has many rigid letter-level adjacency rules (‘4’ is almost always followed by ‘o’, etc) and position rules (4o- is consistently word-initial, -89 is consistently word-final, etc) and a high level of letter-context predictability.
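One rough way to put a number on this kind of letter-level predictability is the conditional entropy of each letter given the letter before it: the more rigid the adjacency rules, the closer the figure gets to zero. The following is only a minimal sketch; the function name and the toy word list are illustrative assumptions, not a real transcription or anyone's published method.

```python
import math
from collections import Counter

def conditional_entropy(words):
    """Entropy in bits of the next letter given the current letter, within words."""
    pair_counts = Counter()   # counts of (letter, following letter)
    first_counts = Counter()  # counts of each letter when it has a follower
    for word in words:
        for a, b in zip(word, word[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for (a, b), n in pair_counts.items():
        p_pair = n / total                  # P(a, b)
        p_next_given_a = n / first_counts[a]  # P(b | a)
        h -= p_pair * math.log2(p_next_given_a)
    return h

# Hypothetical toy sample, NOT real Voynichese data.
sample = ["chedy", "shedy", "qokedy", "qokeey", "daiin", "aiin"]
print(round(conditional_entropy(sample), 3))
```

Running the same function over a transcription and over a comparison text in a known language would give a crude side-by-side measure, bearing in mind that the result depends heavily on which transcription alphabet is used.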

Yet at the same time, it also has a very large dictionary relative to its text size. I often criticize Gordon Rugg for suggesting historically incorrect Cardan grille-like tables (i.e. they’re a century too late for the Voynich’s construction dating) and for inappropriately back-projecting his modern CompSci mindset onto the early Renaissance (i.e. it’s 500+ years too early for the kind of table-driven hackery he proposes). However, he is absolutely right that a reconstructed Voynichese “dictionary” would, to a modern computer scientist’s eyes, look very much as if it had been generated or permuted by some means.

The paradox is therefore that these two apparently opposite aspects of Voynichese are able to coexist: how on earth can we reconcile its letter rigidity & predictability with its wild word variability?

I think the key to resolving this is to grasp that there is some kind of generative or confounding principle at work within a rigidly predictable framework. That is, that even though there are lots of rules, these rules act as a kind of “container” for semantic or cryptographic variability to exist within.

Hence I believe that Montemurro’s statistical machinery is identifying “words” that fall within the container layer rather than in the confounded content layer, which would make these arguably the thirty least informative words in the Voynich Manuscript.

It’s a hard point to understand, let alone accept: the confounding trick (some kind of transposition cipher? some kind of paper cipher machinery? some kind of cipher wheel?) driving Voynichese’s inherent variability remains as profoundly unreachable now as it has been for over 500 years.

My apologies to Montemurro and Zanette, but the central challenge we face isn’t to find new language-based statistical tests to apply to the Voynichese corpus, however clever they may be. Rather, it is to find ways of resolving the Voynich Manuscript’s central paradox: how is it that Voynichese is both letter-rigid and word-variable at the same time?

Incidentally, M & Z conclude in their paper that their results point to a semantic link between the Recipe and Astro sections, and between the Herbal and Pharma sections. Actually, had they been more aware of the codicology analyses that have been done, they would have seen that their results are consistent with the writing phase order.

In fact, there are many indications that what I call Voynichese’s ‘container’ layer above evolved during the writing, with the most obvious evolution being between Currier A and Currier B. I suspect that what their statistical machinery has imperfectly captured is therefore simply a snapshot into the evolution of the container layer, and not anything ‘semantic’ as such.

In short, the aspect of Voynichese that is most nearly homogeneous across all its sections is its “container” layer: so what Montemurro and Zanette have done is make long-range comparisons between evolutions of the container layer. Currently, my best guess is that these are likely to be almost entirely composed of cipher system meta-tokens (shorthand tokens, transposition cipher placeholders, etc) rather than the semantic contents, which appear instead to have been confounded by some means.

So, rather than finding a “genuine message” (as New Scientist put it), perhaps they have instead found a “genuine container” for the message? This may prove to be a very useful result in its own right, but it’s probably not the smoking gun linguistic proof they were hoping to use to discredit Rugg’s tables.

Posted in: Voynich Manuscript

42 thoughts on “Montemurro Voynich paper and the “genuine message”…”

Warlock Asylum on June 26, 2013 at 4:59 pm said:

Very informative article. I looked into the manuscript several years ago, but my time was consumed by trying to understand the Vasuh language, which is a numerical language. It may be the same for the Voynich manuscript.

Stay Blessed
Knox on June 27, 2013 at 2:30 pm said:

Nick, you are banging all around the “paradox”. Let’s get rid of it in one sentence. Word variability in Voynichese compared to known text is achieved by more relaxed rules for nGram sequence within a word and by the short edit distances of the words. The latter is partially affected by the former. There may be additional variability caused by scribal errors.
Pingback: Voynich-Manuskript: Wo die Forschung ansetzen muss (Teil 3) – Klausis Krypto Kolumne
nickpelling on June 27, 2013 at 6:31 pm said:

Knox: …but that doesn’t resolve the paradox.

Unless something is actively confounding the word generation process, we would expect to see exactly the kind of long-range word matches that Montemurro and Zanette see, only far far stronger. So the paradox remains: the ngram stats are merely symptoms of whatever the underlying process is, not the process itself.
Knox on June 27, 2013 at 9:01 pm said:

OK. I think I get the point. Now bear with me while I convince myself by analogy that there is no meaning in the VMs.
There was a book of 3-digit numbers. On the chance that The Book might have meaning, curious people had a closer look. Someone discovered that the hundreds digits were all odd, the tens digits were all even, and the units digits were both even and odd. Another person wrote single digit numbers on scraps of paper. He placed a number of paper scraps with odd digits in a box (Box 1), papers with even digits in Box 2, and papers with both even and odd digits in Box 3. The pieces of paper are drawn one at a time from the boxes in turn.
What does the drawing tell us about the original numbers?
Hmmmmm … ummm… hmf..
Help me with this, will you?
nickpelling on June 27, 2013 at 9:57 pm said:

Knox: that’s more to do with distribution stats. What gives Voynichese its low entropy is its letter-to-letter adjacency rules. But these rules are then finessed by yet more section-specific rules, such as the way word-initial y+gallows appears disproportionately often (as I recall) in the label section.
Knox on June 28, 2013 at 1:24 am said:

Yes (for h2). But look again; I was picking on Gordon Rugg in the second message. Have to give him credit for a lot of thought and work and he was looking for the same thing we all are. I should say all _were_, until some found _it_. I regret the word “hoax” in the name of his theory. We cannot know the intent of the creator and it is not necessary to what he proposes. “Hoax” will continue to be misused.

Interesting question by Klaus (the pingback). I’ll have to give my impression since I can’t read German. Can Rugg-style generated text show the several characteristics of language found in the VMs, especially by someone who didn’t know those characteristics existed? Back to me: That question is valid for ciphers other than simple substitution and transposition. I think we should make a list of the language-like properties and try to answer the question.

A word transposition that was done in the same manner as letters in a railfence cipher would retain many language characteristics and negate others. Now we have to explain the abnormal edit distance of the lexicon as recently described by Job. It’s substitution. What is the nature of the substitution?
Diane on June 28, 2013 at 5:18 am said:

Nick – I might mention that when I linked to Klaus’ site through the pingback on your page it was ok, but when I tried to go directly, I received a MAJOR alert from my browser, saying it is an ‘attack’ site.

Klaus’ is a blog I’ve visited fairly often, so this is a new development.
nickpelling on June 28, 2013 at 11:31 am said:

Knox: there’s only evidence of rail-fence transposition from the mid-16th century onwards. So that’s probably not what we are looking at here.

When people talk about transposition ciphers in the context of the fifteenth century, it’s normally either syllable reversing (as in the “Da Vinci Cipher” chapter of my book) or “strewing” (as alluded to by Alberti in his little book, that I also describe in my book), which means moving letters around the page.

The example I normally give is of “Neal Keys”: one working hypothesis posits these as blocks of meaning-heavy letters removed from the paragraph or page and replaced with gallows characters. For this, a gallows character isn’t a letter itself but is instead a token that means “transpose the (next?) removed letter back from the appropriate Neal Key”.
Marcelo Montemurro on June 28, 2013 at 12:51 pm said:

Dear Nick,

These are interesting ideas and probably can be tested somehow. The Voynich manuscript problem will not be closed any time soon, that’s quite clear.

One thing that we stress in the paper is that the presence of Zipf’s law needs to be considered in any model or explanation of the Voynich manuscript. Apart from its presence in language, Zipf’s law emerges in a number of phenomena underpinned by certain classes of stochastic processes, for instance: city-size distributions, internet links, surname distribution, etc. The attempts to explain specifically how Zipf’s distribution emerges in those phenomena led to insights into the underlying dynamics driving them. Thus, the presence of Zipf’s law cannot be overlooked in any theory or serious speculation on the manuscript’s origins, since trying to understand why or how it’s there can help to narrow down the possibilities further.
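For readers who want to check a Zipf-like profile on a text of their own, one simple (and admittedly crude) approach is to fit a straight line to log frequency against log rank; a slope near -1 is the classic Zipf signature. This is a minimal sketch under that assumption, with a hypothetical function name and toy data, and it is not the procedure used in the paper.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Toy corpus built so that frequency = 12 / rank (an idealised Zipf profile).
toy = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
print(zipf_slope(toy))  # approximately -1
```

A plain least-squares fit like this is sensitive to the long tail of words that occur only once, so it is better treated as a quick sanity check than as a test with real discriminatory power.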

Any non-trivial cipher applied to natural language text will destroy the frequency profile of the resulting ciphertext words and severely affect Zipf’s distribution. For instance, a simple poly-alphabetical cipher (this type of cipher may be a bit too early to have been used in the Voynich, but it’s useful as a proof of concept) will typically map the same n-gram of plaintext into different n-grams of ciphertext, which will lead to a flattening of the word frequency profile.
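The flattening effect described here is easy to see on a toy example. The sketch below uses a Vigenère-style running key that keeps advancing across word boundaries, so repeated plaintext words usually land on different key positions. The function name, the key and the sample sentence are all illustrative assumptions, and the input is assumed to be lowercase a-z only.

```python
from collections import Counter
import string

def vigenere_stream(words, key):
    """Encipher words with a repeating key that advances across word boundaries."""
    alphabet = string.ascii_lowercase
    out, i = [], 0
    for word in words:
        enc = []
        for ch in word:
            shift = alphabet.index(key[i % len(key)])
            enc.append(alphabet[(alphabet.index(ch) + shift) % 26])
            i += 1  # key position carries over into the next word
        out.append("".join(enc))
    return out

plain = "the cat and the dog and the bird and the fish".split()
cipher = vigenere_stream(plain, "lemon")

# Compare vocabulary size and the count of the most frequent word.
print(len(set(plain)), Counter(plain).most_common(1))
print(len(set(cipher)), Counter(cipher).most_common(1))
```

In the ciphertext the vocabulary grows and the most frequent word becomes less frequent, which is the flattening of the word frequency profile in miniature.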

Thus, how is it that Zipf’s law is there? If we rule out the simplest hypothesis (that it is there because of the original linguistic structure), then we necessarily need to address the issue of how it emerged, since it is not any trivial process that leads to Zipf’s law.

Then, on top of that we have all the other linguistically consistent statistical features of the text, which, whether we like it or not, are there and need to be taken into account together with the presence of Zipf’s law.

Among other quantitative facts in the paper, the ‘semantic’ networks that we obtained are another ingredient in the puzzle. These structures are highly significant and neatly link words with similar morphology. These words tend to co-occur in exactly the same way as one would expect for semantically related words. Again, most useful ciphers will destroy any sort of order of this kind. Interestingly, and more on the speculative side, the structure of the Voynichese’s ‘semantic’ networks is not incompatible with Friedman’s final verdict on the Voynich manuscript that it was a very early attempt to create an artificial or universal language of the a-priori type. As was the case of John Wilkins’ ‘philosophical’ language in 17th century, in these schemes semantically related words can have similar forms.

Now, leaving speculations aside, the collected statistical evidence of structure at different levels needs to be part of any theory for the text’s origins. In its modest role, it can be used as a falsification tool: if a model or theory proves inconsistent with any of the empirical statistical facts, then it has to be abandoned.

All the best,

Marcelo
nickpelling on June 28, 2013 at 1:38 pm said:

Marcelo: thanks for responding so quickly! 🙂

I’m very wary of using Zipf’s Law as a starting point, precisely because it has been shown to be a product both of natural and artificial processes. I suspect we need something with a little bit more discriminatory power. 😉

Statistics-flattening polyalphabetic ciphers were first introduced in 1467 by Alberti: with the high amount of low-level structuring in the Voynich, we are clearly not looking at something like that. But if not that, then what? That’s the mystery (or perhaps the paradox)!

For me, I think the historical conclusion is that Voynichese can only be a neat assembly of small tricks, well arranged. But we are defeated by its overall architecture, not by any single one of its component mechanisms.

I shall try to think of interesting follow-up tests for you, perhaps together we can find something that will prove even more revealing…
Diane on June 29, 2013 at 6:03 am said:

May I ask a question that has bothered me for a while – sorry if it’s off the point.

How would the stats look if you took a piece of music written for four or five voices (say) and using the normal five bars, wrote the notes using letters, and in one horizontal line, breaking the ‘words’ where the bar-lines would be.

For the sake of the exercise, suppose all notes of equal length, to start with.

Would that create a similar pattern of letter rigidity – confined by the rules of harmony, but ‘word’ variability?

I could phrase the same question in terms of many systems of element-and-pattern, but music is nicely gender neutral.
Joachim Dathe on June 29, 2013 at 7:49 am said:

Diane: Any data that can be converted to a text code (e.g.Ansi) can be evaluated by (my) N-gram tool.
Word- and phrase structure is created automatically, so you should avoid such a requirement (bar line) initially.
Joachim.
Job on June 29, 2013 at 8:32 am said:

From M&Z’s paper:

Taking into account that the clusters were built on the basis of word co-occurrence, and assuming that co-occurrence reveals semantic affinity, we conclude that a strong connection between form and meaning characterizes the most informative words of “Voynichese”.

How would this apparent association between form and meaning extend to similar variants outside of the top-30 list? For example, if we establish that “chedy” and “shedy” have some semantic affinity, what can we infer about the meaning of “chody”?

This is something that i’ve been wondering about. An analysis of single-character variants in the VMS indicates that, well, there are lots of them.

The top-30 word “chedy” has 49 distinct single-character variants in the VMS. The table below lists these variants. The second column lists the number of single-character variants that, in turn, each such variant has. The third column indicates a similar quantity, but excludes values that are also single-character variants of “chedy”.

For example, “schedy” itself has 22 single-character variants, 10 of which are not single-character variants of “chedy”.

schedy 22 10
chey 54 44
cphedy 9 7
dchedy 26 15
chady 12 5
chody 42 35
lchedy 29 18
chedyl 6 4
cthedy 10 8
chefy 13 5
csedy 7 7
chkedy 10 7
chedyr 5 4
chdedy 7 3
ched 33 28
chydy 14 8
cheody 38 33
qchedy 16 5
chedey 12 10
chery 15 7
cheey 55 45
ychedy 27 16
cheds 15 11
rchedy 19 8
chedo 13 9
shedy 29 28
chedl 16 11
chesy 18 10
cheky 27 18
chddy 12 5
echedy 16 5
fchedy 23 12
cheoy 30 21
pchedy 28 17
tchedy 27 16
cheady 11 7
chsdy 14 8
chtedy 10 7
chdy 33 26
chedky 6 4
chety 22 13
cheda 13 9
chetdy 9 5
cheedy 27 20
ckhedy 11 9
chepy 15 7
chpdy 9 3
ochedy 34 23
kchedy 25 14
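For anyone wanting to build a table like the one above from their own transcription, here is a minimal sketch. It assumes that “single-character variant” means exactly one insertion, deletion or substitution, and that the third column counts a variant’s own variants that are not also variants of the target word (so the target word itself is counted there). Both assumptions, the function names and the tiny vocabulary are illustrative guesses rather than a description of the code actually used.

```python
def is_single_char_variant(a, b):
    """True if b differs from a by exactly one insertion, deletion or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    # Try deleting each character of the longer word in turn.
    return any(long_[:k] + long_[k + 1:] == short for k in range(len(long_)))

def variants_of(word, vocab):
    """All words in vocab that are single-character variants of word."""
    return {v for v in vocab if is_single_char_variant(word, v)}

def variant_table(vocab, target):
    """Rows of (variant, its variant count, count not shared with target)."""
    direct = variants_of(target, vocab)
    rows = []
    for v in sorted(direct):
        own = variants_of(v, vocab)
        rows.append((v, len(own), len(own - direct)))
    return rows

# Hypothetical miniature vocabulary, not the full VMS word list.
vocab = {"chedy", "shedy", "chey", "chody", "shody", "schedy"}
for row in variant_table(vocab, "chedy"):
    print(*row)
```

The pairwise comparison is quadratic in the vocabulary size, which is fine for a word list of a few thousand entries but would need a smarter index for anything much larger.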

This noticeable presence of word variance yields quite a dense graph, when we plot the VMS’s single-character variants:
http://voynichms.appspot.com/images/vm-word-graph.png

As i understand it, in M&Z’s paper the implied association between form and meaning applies only to the words in the top-30 list – otherwise the whole text might well be some form of early thesaurus. My question then is, what do we make of all the word variance in the VMS?

If we exclude typos and transcription errors as the cause – it’s unlikely that it could account for the numbers we see – then we have two possibilities:
1. Word variability corresponds to similar variability in the underlying plain text.
2. Word variability is a property of the cipher text alone.

Without some understanding of the contents of the text, it’s difficult to rule out the first option. However it seems implausible that the clear text could have the same level of word variability – there’s just too much of it.

The second option interests me the most. If we imagine that “chedy” and “chepy” encode completely different words, such as “acorn” and “linen” (however semantically related they might be), then the unique data among the two words (e.g. “p” in this case, excluding context) does not seem sufficient to encode the plain text without either data loss or the use of an additional data source, such as a code book or other context (e.g. the illustrations, surrounding words, etc) – certainly not when we account for all the variants.

I believe this topic is worthy of a close analysis, it may be possible to show that the VMS is the result of a context-aware, or “extra-stateful” cipher.
nickpelling on June 29, 2013 at 8:57 am said:

Job: thank you for your detailed and thoughtful comment. What you are describing is precisely the “wild word variability” I brought up in the post.

I should perhaps add that what I call the paradox about tightly-coupled letters and wildly-variable words coexisting is only really a paradox if you assume that Voynichese is an untransformed language (your case #1).

If you have an additional variation-adding encipherment step (your case #2, that I usually call “confounding”) in combination with a verbose cipher back-end, then I think the paradox disappears.

However, the problem is that the first well-known confounding mechanism is Alberti’s cipher wheel, and even that seems to have arrived fractionally too late for the Voynich.

All the same, what Alberti saw his cipher wheel as replacing was complicated “strewing” (in-page transposition hacks). Even if we don’t currently have any documented examples of these, Alberti was clearly a witness to their existence.

So for me, (1) the variability stats show that some mechanism is confounding the language patterns in the text; (2) the dating and the stats rule out Alberti’s cipher wheel; (3) just about the only hack left standing is in-page transposition.

Of course, proving all this is another matter entirely… 🙂
Diane on June 29, 2013 at 11:01 am said:

Joachim.
I’m told that the result would be, in effect, a transposition cipher which might present as a verbose cipher.

Compressing items originally disposed over five lines is no more than the reverse of one method for transposition, it seems, and the effect of fitting five lines (or so) into one would or could present as verbose.

Hope I’ve got that right.
It was a general question; I’ve no intention of trying to tackle Voynichese. 😀

Diane
Novil on June 29, 2013 at 11:31 am said:

Nick, I believe one should not look at the time when certain ciphers have been published the first time when trying to break the Voynich. If it’s not something that definitely requires a computer, I think one should assume that the creator of the Voynich had that idea already, even if it’s 200 years “too early”. Inventing a cipher isn’t too hard compared to other fields of science. Basically, you just have to look at a piece of text and ask yourself: “How can I shuffle these characters/words around?”

It’s my strong belief that a stateful cipher is an essential part of the Voynich’s encryption.
nickpelling on June 29, 2013 at 12:12 pm said:

Novil: I’m delighted that this post is bringing out so many fans of stateful ciphers… but the question remains (a) what that state controls or alters (and how), and (b) what controls or alters the current state (and how).

In that respect, even relaxing the historical constraints by two or even three centuries may not help much, because pretty much the only stateful ciphers that emerged for a very long time were Alberti and Vigenere (broadly speaking), and the Voynich’s polyalphabeticity doesn’t seem to match either of these even slightly.

It is a curious thing (that I mentioned in “Curse”, but probably not as loudly as I should) that in his book Alberti appears to relate a debate he had had with a proponent of transposition ciphers. So even though he bears witness in 1467 to people having previously used ridiculously clever in-page transposition systems, we appear not to have a single example of such a system being used for real.

Unfortunately, it seems that Alberti’s small book and Cicco Simonetta’s notes are all the 15th century has to offer us on ciphers. But perhaps Alberti tells us all we need to know!
Knox on June 29, 2013 at 8:56 pm said:

No one would dispute that finding a description of the VMs cipher (if it is a cipher) in Alberti’s works, or in documents by anyone else, would be helpful. Until then Novil’s down-to-earth statements can be incorporated in an attempt to explore the text. Additionally, we are not locked into the 15th century. Whatever probabilities we individually assign to the date of the creation of the VMs, they are only a guide to one’s personal research.
Menno Knul on June 29, 2013 at 11:33 pm said:

Dear Job,

I wrote that words ending in -edy share a common feature: they do not occur in f1-f25, which is roughly a third to a half of the herbal section. Words this frequent would normally be spread out over the whole text of the MS. This peculiarity can mean a change of the language, a change of the code or a change of the contents of the texts.
Secondly there seems to be a regularity in the ‘endings’ -aiiin, -aiin, -ain and -am and -aiiir, -aiir, -air and -ar, similarly in words with -eee-, -ee-, -e- or -ooo-, -oo-, -o- or ooiin, oin, om. The question here is whether we should not eliminate the double ‘vowels’ for study purposes, as they tend to indicate semantic cohesion.

Menno Knul
bdid1dr on June 30, 2013 at 4:30 pm said:

Nick and friends: After reading many many posts regarding the “4” character, I feel “com-pelled” to reiterate my reading or translation of, and use of it in the Voynich:

It is consistently used throughout the document, and is the sound of “qu”. What seems to be the confounding factor is that any vowel can precede that character; a-e-i-o-and maybe u (but seldom). aqua, equal, enquire …. and sometimes a vowel-consonant combination to form “osquash” (the vegetable). The sibilant which looks like the hand-sickle you see in many medieval grain-mowing paintings is “S” as in esquire, squire, squirrel…..squash..

One can only tell the difference between the sibilant “s” and the “rolled R” figure by comparing the straight handle on the “S” with the curved handle on the “R”. So, a very good example of the those two alphabet letters used in the same words would be “square” “squire” “esquire” “require”…..
bdid1dr on June 30, 2013 at 4:41 pm said:

So, people, what kind of frequency readings would you get if you “run the numbers” on my alpha-substitutions, at least in the botanical and balneological pages of the Voynich?
Job on July 1, 2013 at 6:26 am said:

Menno, thanks for bringing that to my attention.

I can confirm that the suffix “edy” does not occur in quite a few folios – most notably in f1 through f25, but in other folios as well.

Here’s a data plot indicating the occurrence of the suffix “edy” across all folios – the percentage value is relative to the number of words in the folio:
http://voynichms.appspot.com/images/suffix-occurrence-by-folio-edy.png
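The per-folio percentage behind a plot like this can be sketched in a few lines, assuming the transcription has already been split into a mapping from folio label to word list. The function name and the two sample folios are hypothetical; the words are placeholders and not the actual content of those pages.

```python
def suffix_share_by_folio(folios, suffix):
    """Map each folio label to the percentage of its words ending in suffix."""
    shares = {}
    for label, words in folios.items():
        if not words:
            shares[label] = 0.0  # avoid dividing by zero on empty folios
            continue
        hits = sum(1 for w in words if w.endswith(suffix))
        shares[label] = 100.0 * hits / len(words)
    return shares

# Placeholder data only: NOT the real words of these folios.
folios = {
    "f1r": ["daiin", "chol", "shol", "chor"],
    "f75r": ["chedy", "qokedy", "ol", "shedy"],
}
for label, pct in suffix_share_by_folio(folios, "edy").items():
    print(label, round(pct, 1))
```

Swapping the `endswith` test for a substring test (`suffix in w`) would give the corresponding figure for a fragment such as “ed” anywhere in the word.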

The absence of “edy” seems to result from the fact that the characters “e” and “d” very rarely occur in sequence in folios f1 through f25. Here’s a plot of the occurrence of “ed”:
http://voynichms.appspot.com/images/fragment-occurrence-by-folio-ed.png

Overall, usage of the character “e” seems to increase in later folios, starting out fairly small:
http://voynichms.appspot.com/images/letter-occurrence-by-folio-e.png

Also interesting is that “e” (the second most frequently occurring letter in the text) does not occur at all in f36r (62 words):
http://brbl-zoom.library.yale.edu/viewer/1006144

Finally, relating to your second suggestion, i transformed the contents of the VMS to collapse consecutive occurrences of the same character (e.g. “ii” and “iii” are replaced with “i”).
In the resulting text, word variability is not really impacted (it’s actually slightly higher overall).

While character repetition such as “i”, “ii” and “iii” contribute to the number of single-character variants, this does not seem to be the underlying cause.
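The collapsing transformation described above can be reproduced with a single regular expression. This is a minimal sketch with a hypothetical function name and a toy word list; comparing the number of distinct words before and after only shows how many forms merge, which is a cruder measure than the single-character variant counts discussed earlier.

```python
import re

def collapse_repeats(word):
    """Replace any run of the same character with a single occurrence."""
    return re.sub(r"(.)\1+", r"\1", word)

# Toy word list for illustration, not a real transcription.
words = ["daiin", "dain", "aiiin", "qokeey", "qokey", "chedy"]
collapsed = [collapse_repeats(w) for w in words]

print(collapsed)
print(len(set(words)), "distinct forms before,", len(set(collapsed)), "after")
```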
Menno Knul on July 1, 2013 at 12:47 pm said:

Job, you made very interesting graphs, which clearly show the absence of the -edy ‘affix’ in the first 50 pages, which is not the case for the -ody affix.

With regard to the aiii-aii-ai I went a step further to link this to the -am, -ar affix. I got the idea that there may be some misunderstanding between -ain and -am, which look very similar in the written text. I could not yet decide whether to read -ain as -am or -am as -ain, and similarly -air as -ar or -ar as -air. I’d like to know your opinion about that.
What you noticed about the absence of -e- on f36r is interesting indeed. As it pertains to one single page only it could be by accident.

I would like to make a comment on the special signs, which are related to the alchemistic signs and not to alphabetical or numerical signs. I can hardly believe that most paragraphs start by accident with words beginning with P or T, unless what we have here is the name of a plant, just as many names in other herbaria start with the word Herba. This could mean that the initial P and T should be interpreted as ‘markers’ and do not belong to the text itself. I hope you understand me.
Knox on July 1, 2013 at 8:06 pm said:

EVA-edy is dense in B language sections. EVA-eody is the same with a big exception; it is rare in Quire 13 but frequent in some A language sections.

I have observations about EVA-m and paragraph-initial words that might be helpful.
Job on July 2, 2013 at 9:33 am said:

Menno,

To my untrained eye, “ain” and “am” look similar, if not identical though i have really only glanced at the original text. I would hope that the available transcriptions are self consistent, even if they introduce extra characters.

Relating to your comment regarding the use of characters “p” and “t” at the start of paragraphs, it looks like 32.7% of all paragraphs with two or more words start with “p”. The character “o” is in second place with 19.4%, followed by “t” with 17.4% (“k” is next with 8%).

In my sample Dante text, “c” is the most popular letter at the start of a paragraph, with 16.5%, followed by “e” with 13.6%.

In my sample Pliny text, “a” is the most popular with 12.4%, followed by “c” with 9%.

Clearly the VMS stands out in this aspect. Here are some plots of the occurrence of “p”, “o” and “t” at the start of paragraphs, across all folios:
http://voynichms.appspot.com/images/letter-occurrence-paragraph-start-by-folio-p.png
http://voynichms.appspot.com/images/letter-occurrence-paragraph-start-by-folio-o.png
http://voynichms.appspot.com/images/letter-occurrence-paragraph-start-by-folio-t.png
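The paragraph-initial percentages quoted above can be computed along the following lines, assuming each paragraph is available as a list of non-empty words. The function name and the sample paragraphs are hypothetical, and the two-word minimum mirrors the “two or more words” filter mentioned above.

```python
from collections import Counter

def initial_letter_shares(paragraphs, min_words=2):
    """Percentage of eligible paragraphs that start with each letter."""
    eligible = [p for p in paragraphs if len(p) >= min_words]
    counts = Counter(p[0][0] for p in eligible)
    return {letter: 100.0 * n / len(eligible) for letter, n in counts.most_common()}

# Made-up paragraphs for illustration only.
paragraphs = [
    ["pchedy", "ol"],
    ["tchedy", "daiin"],
    ["pol", "chedy"],
    ["okeey"],            # skipped: fewer than two words
    ["pshedy", "qol"],
]
print(initial_letter_shares(paragraphs))
```

Running the same function over a Dante or Pliny sample, split into paragraphs the same way, would give the comparison figures.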

On a separate topic, i’ve scanned the VMS in an attempt to find common word sequences. Since there aren’t that many, of significant size, i ended up searching for common word sequences containing words that are similar.

Here are the similar sequences consisting of six words:

f107v Par 15 Word 12
[aiin, chey, qol, aiin, al, chedy]
f77v Par 10 Word 27
[aiiin, chedy, qol, daiin, sal, chedy]

f81v Par 2 Word 76
[ol, cheky, ol, shedy, qokedy, qokedy]
f75v Par 18 Word 3
[or, chesy, sol, shey, qokeey, qotedy]

f75r Par 1 Word 40
[r, ain, ol, ol, sheedy, qokeey]
f103r Par 17 Word 11
[lr, ain, l, ol, sheed, qokeey]

I guess there aren’t that many of these either.
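A search of this kind can be sketched as a sliding-window comparison in which two windows match when every pair of words at the same position is identical or within one edit of each other. That matching rule is an assumption inferred from the three examples above, not a statement of the criterion actually used, and the function names are hypothetical.

```python
def within_one_edit(a, b):
    """True if a and b are equal or differ by one insertion, deletion or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:k] + long_[k + 1:] == short for k in range(len(long_)))

def similar_windows(words_a, words_b, size):
    """Start positions (i, j) where size-word windows match position by position."""
    matches = []
    for i in range(len(words_a) - size + 1):
        for j in range(len(words_b) - size + 1):
            window_a = words_a[i:i + size]
            window_b = words_b[j:j + size]
            if all(within_one_edit(x, y) for x, y in zip(window_a, window_b)):
                matches.append((i, j))
    return matches

# The first pair of sequences listed above.
seq_a = ["aiin", "chey", "qol", "aiin", "al", "chedy"]
seq_b = ["aiiin", "chedy", "qol", "daiin", "sal", "chedy"]
print(similar_windows(seq_a, seq_b, 6))
```

Comparing every window against every other window is slow on a full transcription, so a real run would probably want to index windows by some coarse key first.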
Joachim Dathe on July 2, 2013 at 10:33 am said:

@ Job

A lot of the things with which you torture yourself are already available, see here:
http://archive.org/details/Voynich_Manuscript_Lexemes_List

http://archive.org/details/eva27sim

Joachim.
Job on July 2, 2013 at 9:10 pm said:

Joachim,

That’s actually the type of raw data dump i’d rather avoid, especially in PDF format which is not easily processed – it would be far easier for me to write a Java query that yields the same information.

I hope you don’t think i’m scanning the VMS manually – that would be torturous.
Joachim Dathe on July 2, 2013 at 11:22 pm said:

Job,

Maybe you didn’t realize the different formats available there, including “full text” which could be processed easily.

BTW: no one who processes the EVA alphabet as a derivative of the original code, and in that way builds an additional encryption layer, has any chance of making progress in deciphering.

Joachim.
Menno Knul on July 3, 2013 at 4:18 pm said:

My latest comment on Joachim Dathe got lost. The idea was, that the signs P and T as first letter of the first word of paragraphs should be interpreted as markers to indicate a category like Herba or Erba.

Menno
bdid1dr on July 5, 2013 at 1:43 am said:

Menno: The elaborate “P” which is seen at the beginning of any Voynich botanical discussion is the entire two-letter combination of S-p for the entire word “Specie” — that is for the botanical discussions.

When you see that elaborate “P” being stretched and/or followed by vowels you are looking at Latin words for “prescription”, “proscribe”, “portion”. I re-iterate one example I’ve discussed before: The word prescription can be condensed into three letters-in-one — which can be found on just about any doctor’s prescription pad and/or drugstore symbols: If it looks like a capital “R” with a slash mark across the protruding leg, you are looking at the contracted word beginning of presc-rip-tion.
T Anderson on July 6, 2013 at 1:57 pm said:

Nick, i was going to email you, but this box is so convenient.

My general thoughts on the VM align with Nick’s; I’m not sure if this is due to a background in computer programming and history or not.

My opinion, first off, is that we have to look in all earnest at the period right after the carbon dates given for production of the skins. Next we have to come to terms with what our creator could and could not do during this era. (assuming the VM isn’t a copy)

I’m going to narrow in here and say it’s likely Italian, and that the person who made it is possibly the person Alberti mentions, and probably knows http://it.wikipedia.org/wiki/Giovanni_Fontana_(scienziato) and his speculum http://upload.wikimedia.org/wikipedia/it/6/6b/Giovanni_fontana_speculum.jpg which i’m sure is where Alberti got the idea from.

I don’t think Fontana was concerned enough with the natural world to be our author, but i haven’t researched him enough to say for sure. What i can say is he came up with the right ideas that someone could have used to create the VM (he created the speculum and wrote using an invented system). He apparently came up with many so-called memory devices other than the speculum but i haven’t looked into them yet.
nickpelling on July 6, 2013 at 2:23 pm said:

T A: overlook Fontana at your peril! The only reason I didn’t consider him more when I wrote “Curse” was that I believed (wrongly) that he was long dead by 1450 (whereas in fact he died after 1454). Moreover:

* He was a doctor
* He was the only other Renaissance author of a book-length cipher we know about
* He wrote an encyclopaedia
* His rotating memory devices bear more than a passing resemblance to Alberti’s cipher disk.
* The Italian Wikipedia article on him talks about his experiments in natural science etc

Even though I ended up pinning my flag to Filarete’s mast back in 2006, I should say that Fontana is my #2 candidate (and I don’t really have a #3 I’m particularly happy with).

If you’re looking for someone who knows everything Fontana knows, why not Fontana himself? 😉
Job on July 6, 2013 at 8:25 pm said:

Fontana seems like a really good candidate.

The use of a mnemonic device is plausible and could account for many of the properties of the VMS, such as:
1. Evidence of word-centric encoding.
2. Evidence of structure within words.
3. Evidence of patterns generally associated with natural languages (resulting from 1 above).
4. Possibility of extra state having been used in the encoding process, suggested by the number of word variants.

A mnemonic device could act as a sort of dynamic codebook.

Moreover, his association with witchcraft and varied interests are also compatible with the manuscript’s contents – in particular the suggestion that, quoting from a roughly translated Wikipedia article:

His works can be deduced or speculate travel to Bologna, Ravenna, Rome , but also in distant countries (some perhaps in his imagination).

If the author was associated with the practice of witchcraft, then it’s possible that the unrecognized illustrations of plants would have been compositions assembled from various elements, not so much to depict known species but to formulate the existence and possible usage of new types of plants.

For example, if root X and leaf Y of two different species are understood to have certain effects, then what would a single plant combining both X and Y be used for?

Perhaps the author was simply depicting and describing the properties of fantastic plants “to look for”.
nickpelling on July 6, 2013 at 8:41 pm said:

Job: the idea that Fontana may have used his circular mnemonic device to generate some set of Voynich words seems at first interesting. But this is stateless, and would surely preserve the semantic structure of sentences.

All the same, I do like Fontana – he’s #2 for me for a whole load of good reasons! 🙂
xplor on July 6, 2013 at 11:49 pm said:

For # 3
You should have considered Francesco (Cicco) Simonetta, 1410-1480. Then you could have read

Rules for Decrypting Enciphered Documents Without a Key,
Then you would know where LHR is.
T Anderson on July 7, 2013 at 2:54 am said:

I’ve known about the VM for years, and looked through it, browsed books etc here and there without ever participating in the conversation, so thanks for the warm reception.

This is the reason I’m not saying it has to be Fontana, although I think he either designed or inspired the cryptography. If the Italian Wikipedia article’s mention that he designed many memory machines is correct, then i wonder what survives of his work?

The “speculum” on its own would be stateless, but you could imagine an easy jump to statefulness.
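To make the stateless/stateful distinction concrete, here is a minimal sketch. It is purely illustrative: the codebook, the placeholder plaintext words and the output tokens are all invented, and nothing here is claimed to be Fontana's actual device. A stateless lookup always maps the same plaintext word to the same output word, so repetition patterns survive intact. A stateful one, where the disk is imagined to advance one step per word, spreads each plaintext word over several variants.

```python
# Hypothetical codebook: every name and token here is invented for illustration.
CODEBOOK = {
    "aqua": ["A1", "A2", "A3"],
    "herba": ["B1", "B2", "B3"],
    "radix": ["C1", "C2", "C3"],
}


def encode_stateless(words):
    """Always pick the first variant: same input word, same output word."""
    return [CODEBOOK[w][0] for w in words]


def encode_stateful(words, start=0):
    """Assume the disk advances one step per word, so the chosen variant
    depends on position as well as on the word itself."""
    position = start
    encoded = []
    for w in words:
        variants = CODEBOOK[w]
        encoded.append(variants[position % len(variants)])
        position += 1  # the extra state: one rotation per encoded word
    return encoded


message = ["aqua", "herba", "aqua"]
print(encode_stateless(message))
print(encode_stateful(message))
```

In the stateless case the repeated word comes out identically both times, which is the structure-preserving behaviour Nick objects to. In the stateful case it comes out as two different tokens, which is one way the large number of word variants could arise.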

One thing i haven’t seen discussed enough when talking about the VM is that the people we suggest as possible authors are often people who due to their vocations would have greater artistic ability than we see in the VM. Fontana, Alberti, Averlino were all architects. I do not see the hand of a 15th century architect in the VM castle.

Given this, could the VM be the work of an apprentice? Or a copy by an apprentice? Going off on a tangent here, I still think that Fontana is the right person at the right time to create the VM from a technical point of view. It’s also of note that he is the only one to publish books in code during the 15th century, although i think the titles and such were latin.
Diane O'Donovan on September 25, 2013 at 12:55 pm said:

“A manuscript is something between a gadget and a personality”.

– Robert Pierce Casey

had to share that one 🙂

cited from recent post in

hmmlorientaliadotwordpressdotcom
D.N.O'Donovan on May 15, 2023 at 8:06 am said:

Montemurro may, or may not be pleased to find that his is the only researcher’s name mentioned in a recent article contributed to the Encyclopaedia Britannica about the Voynich manuscript.

The article is so incredibly bad – still quoting Newbold from 1921 as if gospel, filled with silly phrases such as “the Voynich codex”, wrongly describing the quires as of ‘parchment’ rather than, as they are, of rough non-uterine vellum, and re-making just about every error and erroneous assumption from history, art-history, logic, cryptography and anything else relevant which have been addressed and modified or corrected over the past two decades.
And it’s marked “fact-checked” god love ’em.

Maybe I’ll finally post something on academia.edu – a reply. Anyone care to make it a joint protest/response?
Josef Zlatoděj Prof. on May 15, 2023 at 8:28 pm said:

Montemurro and Zanette? Everything is wrong. But I see how we got away very well. Year 2013.
Rene Zandbergen on May 16, 2023 at 8:24 am said:

What’s wrong with it in my opinion?
– The spelling of Beinecke: this is completely unacceptable
– That Montemurro and Zanette would be evidence of meaning
– That Voynich might have faked it is even mentioned at all. There are far more serious bad theories than that one.
D.N.O'Donovan on May 16, 2023 at 1:38 pm said:

When you see an article in the Britannica about a medieval manuscript, you expect to find that article has been written by an eminent professional in the field of medieval studies, and manuscript studies and so forth.
In this case, you’d also expect input from people with a wide knowledge of how (if at all) the manuscript’s study has proceeded since the 1970s.

That Britannica article does not read like an article of that kind. It doesn’t read as if it’s an article about a medieval manuscript. It’s hardly more than a cut-and-paste using the same old stuff you see in the slick potted histories which hardly look deeper than the Beinecke catalogue entry (1960s) and d’Imperio’s book (1970s). Really – it’s 2023!!

It repeats that exploded notion that Newbold’s impressionistic ‘six divisions’ can be treated as valid, when those which have been put to the test have been found invalid. And in any case, a manuscript isn’t divided “by illustrations” but by quires.
And to conflate the artefact with the contents might be excused in casual conversation, but in a Britannica article?

The person who put that article up hasn’t only made the Britannica look amateurish and ignorant, but has effectively made the E.B. an active disseminator of things which are mere speculations, or which have been superseded by research since the 1990s.

I don’t mind that amateurs want to write about a medieval manuscript, but I do mind if they can’t be bothered doing enough work to do the subject justice by looking into what is, and is not, called fact.