A paper came out a few days ago on arXiv.org, called “Probing the statistical properties of unknown texts: application to the Voynich Manuscript” written by three Brazilian academics (with assist from two German academics).
The authors grouped Voynichese (i.e. Voynich text) hypotheses into three broad categories:
“(i) A sequence of words without a meaningful message;
(ii) a meaningful text written originally in an existing language which was coded (and possibly encrypted) in the Voynich alphabet; and
(iii) a meaningful text written in an unknown (possibly constructed) language.”
After developing a whole load of word-occurrence-based statistical machinery (defining “intermittency”, etc) and applying them both to real text corpora and to Voynichese, they conclude that the word structure of Voynichese is incompatible with shuffled texts (which is how they model (i)-class hypotheses), and “mostly compatible with natural languages” (the (ii)- and (iii)-class hypotheses). They end up by using their statistical machinery to suggest Voynichese “keywords” – words that, according to their statistical measures, stand out from the text.
Their suggested English keywords (generated from the New Testament) are:-
* begat Pilates talents loaves Herod tares vineyard shall boat demons ve pay sabbath hear whosoever
Their suggested Voynichese keywords (generated from an EVA transcription, though they don’t say which, so possibly Takahashi’s?):-
* cthy qokeedy shedy qokain chor lkaiin qol lchedy sho qokaiin olkeedy qokal qotain dchor otedy
OK, but… what do I think? First off, I’m pleased to see that their results seem incompatible with “shuffled texts” or randomized texts, because that is what nearly all of the various Voynich “hoax” hypotheses rely on. Intuitively, just about anyone who has worked with Voynichese for any period of time is struck by its intense internal structuring on many levels: so it is nice to see the same result coming out from a different angle.
Secondly, what they mean by “mostly compatible” is that while Voynichese passes many of their proposed tests comfortably, it actually fails some of them (and only passes others by the slimmest of whiskers). To me, that implies either (a) an exotically- (and non-obviously-)structured language or constructed lanaguge, or (b) an obfuscated language (e.g. a ciphertext or shorthand): conversely, it seems to imply that Voynichese isn’t a one-to-one-map of any mainstream language (which is what cryptographers such as Elizebeth Friedman have been saying for years). Yet the earliest constructed language we currently know of was devised at least a century after the Voynich’s vellum dating (and about a century after its earliest marginalia), so we can almost certainly rule that possibility out.
I don’t know: while it’s always good to see people approaching the Voynich Manuscript from a new angle, I can’t help but feel that in just about every instance the Voynich’s author remains at least three or four steps ahead of them. The key paradox of Voynichese revolves around the fact that even though it so resembles a natural language, the way its words work as semantic units fails to do so in quite the same way. So for me, the important thing here is to try to understand the tests that failed, and see what they tell us about how Voynich words don’t work… but that will doubtless take a little time.
As for the suggested keywords: personally, I’d be rather more convinced by their statistical machinery if it had automagically suggested the word “Jesus” rather than “boat” or “vineyard” for the New Testament, so I have to say I’m far from persuaded that their list of Voynich cribs will help us unlock its secrets at all… but you never know, so perhaps let’s give them the benefit of the doubt on this one!