Brazilian academics: Voynichese “incompatible with random texts”…

A paper came out a few days ago on arXiv.org, called “Probing the statistical properties of unknown texts: application to the Voynich Manuscript”, written by three Brazilian academics (with assistance from two German academics).

The authors grouped Voynichese (i.e. Voynich text) hypotheses into three broad categories:

“(i) A sequence of words without a meaningful message;
(ii) a meaningful text written originally in an existing language which was coded (and possibly encrypted) in the Voynich alphabet; and
(iii) a meaningful text written in an unknown (possibly constructed) language.”

After developing a whole load of word-occurrence-based statistical machinery (defining “intermittency”, etc.) and applying it both to real text corpora and to Voynichese, they conclude that the word structure of Voynichese is incompatible with shuffled texts (which is how they model (i)-class hypotheses), and “mostly compatible with natural languages” (the (ii)- and (iii)-class hypotheses). They end by using their statistical machinery to suggest Voynichese “keywords”: words that, according to their statistical measures, stand out from the text.
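As a rough illustration of what "incompatible with shuffled texts" can mean in practice, here is a minimal Python sketch (not the authors' code). It assumes that "intermittency" can be approximated as the standard deviation of the gaps between successive occurrences of a word, divided by the mean gap; the paper's exact definition may differ. The function names and the toy token list are made up for this example.

```python
import random
from statistics import mean, pstdev

def intermittency(tokens, word):
    """Std/mean of the gaps between successive occurrences of `word`.

    Assumed definition for this sketch; returns None if the word
    occurs too rarely to give at least two gaps.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    return pstdev(gaps) / mean(gaps)

def shuffled_baseline(tokens, word, trials=200, seed=0):
    """Average intermittency of `word` over randomly shuffled copies."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        copy = tokens[:]
        rng.shuffle(copy)
        value = intermittency(copy, word)
        if value is not None:
            values.append(value)
    return mean(values)

# Toy text: "the" recurs at perfectly regular intervals, while "boat"
# is mostly packed into one short passage (hypothetical data).
tokens = ["the" if i % 3 == 0 else f"w{i}" for i in range(300)]
for i in [10, 100, 101, 103, 104, 106, 107, 109, 110, 290]:
    tokens[i] = "boat"

for word in ("the", "boat"):
    print(word, intermittency(tokens, word), shuffled_baseline(tokens, word))
```

In this toy text the clustered word "boat" scores well above its shuffled baseline, while the evenly spread "the" scores below it. Words that stand out from their shuffled baseline in this way are the sort of thing a keyword list could be built from, though whether that matches the paper's procedure is an assumption here.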

Their suggested English keywords (generated from the New Testament) are:-
* begat Pilates talents loaves Herod tares vineyard shall boat demons ve pay sabbath hear whosoever

Their suggested Voynichese keywords (generated from an EVA transcription, though they don’t say which, so possibly Takahashi’s?):-
* cthy qokeedy shedy qokain chor lkaiin qol lchedy sho qokaiin olkeedy qokal qotain dchor otedy

OK, but… what do I think? First off, I’m pleased to see that their results seem incompatible with “shuffled texts” or randomized texts, because that is what nearly all of the various Voynich “hoax” hypotheses rely on. Intuitively, just about anyone who has worked with Voynichese for any period of time is struck by its intense internal structuring on many levels: so it is nice to see the same result coming out from a different angle.

Secondly, what they mean by “mostly compatible” is that while Voynichese passes many of their proposed tests comfortably, it actually fails some of them (and only passes others by the slimmest of whiskers). To me, that implies either (a) an exotically- (and non-obviously-)structured language or constructed language, or (b) an obfuscated language (e.g. a ciphertext or shorthand): conversely, it seems to imply that Voynichese isn’t a one-to-one mapping of any mainstream language (which is what cryptographers such as Elizebeth Friedman have been saying for years). Yet the earliest constructed language we currently know of was devised at least a century after the Voynich’s vellum dating (and about a century after its earliest marginalia), so we can almost certainly rule that possibility out.

I don’t know: while it’s always good to see people approaching the Voynich Manuscript from a new angle, I can’t help but feel that in just about every instance the Voynich’s author remains at least three or four steps ahead of them. The key paradox of Voynichese revolves around the fact that even though it so resembles a natural language, the way its words work as semantic units fails to do so in quite the same way. So for me, the important thing here is to try to understand the tests that failed, and see what they tell us about how Voynich words don’t work… but that will doubtless take a little time.

As for the suggested keywords: personally, I’d be rather more convinced by their statistical machinery if it had automagically suggested the word “Jesus” rather than “boat” or “vineyard” for the New Testament, so I have to say I’m far from persuaded that their list of Voynich cribs will help us unlock its secrets at all… but you never know, so perhaps let’s give them the benefit of the doubt on this one!

19 Comments

  1. Ruby March 7, 2013 12:13 pm

    If these researchers had found the key to the manuscript, I think it would make us more sad than happy, because in that case all our efforts, as Voynich fans, would be cut short.
    I think the script could be written in at least two or three languages. In addition, it is possible that one of these languages is not very widely studied.
    If the manuscript is a translation of a document by European explorers in South America, for example, the Quechua language is very different from English. First, it was not written, so maybe the author sought to invent an alphabet to transcribe it?
    This language does not have the sounds “b” and “g”, for example. The modern alphabet has the letters “k”, “p”, “q” and “t” in triplicate.
    Would this language be a better candidate for the Voynich?

    http://readingvoynich.wordpress.com/

  2. Diane March 7, 2013 1:05 pm

    I’ve often wondered how many of these problems are due to a failure to recognise, and distinguish individual glyphs. I’d like to see retina-recognition technology applied to the script. I expect it would be so precise that every single glyph would be considered unique. Perhaps someone who makes bank-notes, or a forger of same is what we need here?

    http://voynichimagery.wordpress.com/

  3. Diane March 7, 2013 1:12 pm

    Ps I think they might have included a fourth category, though perhaps no-one’s theory has required it as yet.

    a meaningful text written originally in an existing language, which was coded (and possibly encrypted) in another language, attempting to employ the same script as the original, this imitation being known as “the Voynich alphabet”.
    :D

    http://voynichimagery.wordpress.com/

  4. nickpelling March 7, 2013 1:29 pm

    Ruby and Diane: yes, there’s a category (iv) the researchers missed – a (language A text) transcribed using a mismatched (language B alphabet). But if that’s right, we haven’t had much luck identifying either A or B so far. :-(

    And in fact, there may be twenty more categories that are similarly possible (if you really put your mind to working them out)!

    http://www.nickpelling.com/

  5. bdid1dr March 7, 2013 7:35 pm

    Nick, Diane, & Ruby (and ThomS, if he should be following this latest discussion):

    You can be burying my posts and letter-by-letter/word-for-word translations in the depths of “That Which Brings…..”, but I am not discouraged. I will soon be completing my notes and fully understandable reading of the fascinating Beinecke manuscript 408.

    So, I’ll let you continue your decryptologizing in peace. I notice that my spell-checker doesn’t recognize that long word I just typed; maybe I’ve invented a new system of cryptology?

    Gotta find my O 2 it B4 it gets buried in my toppling pile of notes!

    Hows that 4 a cryptic farewell? :)

  6. Diane March 7, 2013 9:47 pm

    I based category (iv) on an ambiguity of Baresch’s letter. Come to think of it, the ambiguity might not exist in the Latin, but the English can surely be read in two ways.

    “… acquired the treasures of Egyptian medicine …brought them back with him and buried them in this book in the same script.”
    quoting Neal’s translation of the 1637 letter from B to K

    http://voynichimagery.wordpress.com/

  7. Diane March 8, 2013 1:06 pm
  8. Jody March 8, 2013 8:04 pm

    And so we go on… not knowing… already found out

  9. bdid1dr March 9, 2013 5:55 pm

    An “O2-it” is a disk which has printed upon it “TUIT”. So, when someone has a big pile of chores, errands, correspondence to write, etc. (and is being nagged), the response is often “When I get around to it!” The person doing the nagging hands the naggee the “TUIT”.

    Sometimes the disc looks like this: :)

    But only if a yellow “smiley” appears here!

  10. Ruby March 15, 2013 2:15 pm

    bdid1dr
    If I understand correctly, you do not have a website or personal blog, so to follow your work we must search through hundreds of posts on different sites? This is regrettable; I’d love to read your translation in full.
    Ruby

    http://readingvoynich.wordpress.com/

  11. tricia March 24, 2013 1:56 pm

    The next version will be even better. A Greek and Canadian helping now.

  12. Diane March 24, 2013 2:07 pm

    I always have difficulty with the definition of ‘natural language’ since it seems to be based on an assumption of prose, or at the very best prose and formal poetry. (Is that right?) A mathematical text, or bills of lading, ledger entries and Ptolemy’s Tables would all fail by that criterion, surely?

    Wouldn’t they?

    http://voynichimagery.wordpress.com/

  13. Diane March 30, 2013 5:42 pm

    the earliest constructed language we currently know of was devised at least a century after the Voynich’s vellum dating.

    Nick
    Just wondering whether you mean that sixteenth-century mainland Europe is our first example of any constructed language? Would you call something like Hisperica famina a constructed language? (not that I think Voynichese is like H.f, but wonder that human beings had never before thought to construct one).

  14. Joachim Dathe June 22, 2013 9:19 am
  15. nickpelling June 22, 2013 9:31 am

    Joachim: if Voynichese is a natural language, then what these results (and numerous others) are saying is that it’s a natural language that has less information in a typical word (or indeed a typical letter instance) than other natural languages – that is, it is more predictable at a letter and word level than other natural languages.

    In the cipher world, we have a very old type of cipher that has many similar attributes: verbose cipher. The key thing about verbose cipher is that it disrupts the way we visually parse the text: we don’t see ABC, but “otolal” (i.e. “ot-ol-al”).

    If Voynichese had been parsed differently, would these academics’ results have been the same? I suspect not.

    http://www.nickpelling.com/
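The verbose-cipher point in comment 15 can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the substitution table below is invented for the example (apart from A, B, C mapping to "ot", "ol", "al", as in the comment), the plaintext is a toy string, and plain unigram entropy stands in for the more elaborate measures in the paper.

```python
import math
from collections import Counter

# Hypothetical verbose-cipher table: each plaintext letter becomes a
# two-glyph group. Only a/b/c -> ot/ol/al comes from the comment above;
# every other entry is made up for this sketch.
TABLE = {
    "a": "ot", "b": "ol", "c": "al", "d": "ar",
    "e": "or", "h": "on", "m": "at", "n": "an",
    "o": "et", "r": "el", "s": "er", "t": "en",
}

def encode(plaintext):
    """Replace every plaintext letter with its two-glyph group."""
    return "".join(TABLE[ch] for ch in plaintext)

def entropy(symbols):
    """Unigram Shannon entropy, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

plaintext = "thecatsatonthematandtheratatetheoat"  # toy text, no spaces
ciphertext = encode(plaintext)

# Two ways of parsing the same ciphertext.
glyphs = list(ciphertext)
pairs = [ciphertext[i:i + 2] for i in range(0, len(ciphertext), 2)]

h_plain = entropy(plaintext)
h_glyph = entropy(glyphs)
h_pairs = entropy(pairs)

print(f"plaintext letters : {h_plain:.3f} bits/symbol")
print(f"cipher, by glyph  : {h_glyph:.3f} bits/symbol")
print(f"cipher, by pair   : {h_pairs:.3f} bits/symbol")
```

Read glyph by glyph, the ciphertext carries less information per symbol than the plaintext does per letter, and it is twice as long. Read in two-glyph groups, it has exactly the plaintext's statistics. With a real text, of course, nobody hands you the correct way to parse it.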

  16. Joachim Dathe June 22, 2013 9:54 am

    Nick: On this subject, I will reveal something here in advance:
    We are working on a dictionary, Voynich-Arabic-English, a small excerpt here (not finalised):
    27 > .lTaih. (27*7) لذيح on the way
    28 > .ltan. (27*6) لتان tongue
    29 > .marh. (27*6) مرح fun
    30 > .oltarh. (27*8) الطرح a question
    31 > .oqwey. (27*7) اقوي stronger
    32 > .taly. (27*6) التالي next

    http://voynich2arabic.wordpress.com

  17. Diane June 22, 2013 10:57 am

    not a natural outgrowth, then?

  18. nickpelling June 26, 2013 1:36 pm
  19. Šuruppag November 16, 2013 11:13 pm

    “Yet the earliest constructed language we currently know of was devised at least a century after the Voynich’s vellum dating, so we can almost certainly rule that possibility out.”

    What about Hildegard of Bingen’s Lingua Ignota, dating to the 12th century?

    It would seem very odd if VM was a constructed language, but I think we should still try to rule out the possibility. Has anyone tested to see if VM has linguistic patterns similar to highly artificial constructed languages such as Lojban?
