Knight & Reddy on the Voynich, & the limits of statistical analysis…

USC’s irrepressible Kevin Knight and Dartmouth College Neukom Fellow Sravana Reddy will be giving a talk at Stanford on 13th March 2013 entitled “What We Know About the Voynich Manuscript”. Errm… which does sound uncannily like the (2010/2011) paper by the same two people called, errrm, let me see now, ah yes, “What We Know About the Voynich Manuscript”.

Obviously, it’s a title they like. :-)

As I said to Klaus Schmeh at the Voynich pub meet (more on that another time), what really annoys me when statisticians apply their box of analytical tricks to the Voynich is that they almost always assume that whatever transcription they have to hand will be good enough. However, I strongly believe that the biggest problem we face precedes cryptanalysis – in short, we can’t yet parse what we’re seeing well enough to run genuinely useful statistical tests. That is, not only am I doubtful of the transcriptions themselves, I’m also very doubtful about how people sequentially step through them, assuming that the order they see in the transcription of the ciphertext is precisely the same order used in the plaintext.

So, it’s not even as if I’m particularly critical of the fact that Knight and Reddy are relying on an unbelievably outdated and clunky transcription (which they certainly were in 2010/2011), because my point would still stand regardless of whichever transcription they were using.

In fact, I’d say that the single biggest wall of naivety I run into when trying to discuss Voynichese with people who really should know better, is that hardly anyone grasps that the presence of steganography in the cipher system mix would throw a spanner (if not a whole box of spanners) in pretty much any neatly-constructed analytical machinery. Mis-parsing the text, whether in the transcription (of the shapes) and/or in the serialization (of the order of the instances), is a mistake you may well not be able to subsequently undo, however smart you are. You’re kind of folding outer noise into the inner signal, irrevocably mixing the covertext into the ciphertext.

Doubtless plenty of clever people are reading this and thinking that they’re far too smart to fall into such a simple trap, and that the devious stats genies they’ve relied on their whole professional lives will be able to fix up any such problem. Well, perhaps if I listed a whole load of places where I’m pretty sure I can see this happening, you’ll see the extent of the challenge you face when trying to parse Voynichese. Here goes…

(1) Space transposition cipher

Knight and Reddy are far from the first people to try to analyze Voynichese word lengths. However, this assumes that all spaces are genuine – that we’re looking at what modern cryptogram solvers call an “aristocrat” cipher (i.e. with genuine word divisions) rather than a “patristocrat” (with no useful word divisions). But what if some spaces are genuine and some are not? I’ve presented a fair amount of evidence in the past that at least some Voynichese spaces are fake, and so I doubt the universal validity and usefulness of just about every aggregate word-size statistical test performed to date.

Moreover, even if most of them are genuine, how wide does a ciphertext space have to be to constitute a plaintext space? And how should you parse multiple-i blocks or multiple-e blocks, vis-a-vis word lengths? It’s a really contentious area; and so ‘just assuming’ that the transcription you have to hand will be good enough for your purposes is actually far too hopeful. Really, you need to be rather more skeptical about what you’re dealing with if you are to end up with valid results.
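To make the point concrete, here is a minimal Python sketch of how far a word-length histogram can move depending on how doubtful spaces are treated. It assumes a transcription convention in which '.' marks a definite space and ',' marks an uncertain one; the sample lines and function names are invented for illustration, and lengths are counted in raw EVA characters, which is itself a contestable choice.

```python
from collections import Counter


def word_lengths(line, uncertain_is_space):
    """Return word lengths for one transcribed line.

    Assumed convention: '.' is a definite space, ',' an uncertain one.
    """
    # Either promote uncertain spaces to real spaces, or drop them entirely.
    text = line.replace(",", "." if uncertain_is_space else "")
    return [len(word) for word in text.split(".") if word]


def length_histogram(lines, uncertain_is_space):
    """Aggregate word lengths over many lines into {length: count}."""
    counts = Counter()
    for line in lines:
        counts.update(word_lengths(line, uncertain_is_space))
    return dict(sorted(counts.items()))


# Invented sample lines, for illustration only.
sample = ["qokedy.qo,kedy.dal", "ol,chedy.daiin"]

print(length_histogram(sample, uncertain_is_space=True))
print(length_histogram(sample, uncertain_is_space=False))
```

The two histograms differ even on two short lines, so any aggregate word-length statistic arguably ought to be reported under both parsing hypotheses rather than just one.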

(2) Deceptive first letters / vertical Neal keys

At the Voynich pub meet, Philip Neal announced an extremely neat result that I hadn’t previously noticed or heard of: that Voynichese words where the second letter is EVA ‘y’ (i.e. ‘9’) predominantly appear as the first word of a line. EVA ‘y’ occurs very often word-final, reasonably often word-initial (most notably in labels), but only rarely in the middle of a word, which makes this a troublesome result to account for in terms of straightforward ciphers.

And yet it sits extremely comfortably with the idea that the first letter of a line may be serving some other purpose – perhaps a null character, or (as both Philip and I have speculated, though admittedly he remains far less convinced than I am) a ‘vertical key’, i.e. a set of letters transposed from elsewhere in the line, paragraph or page, and moved there to remove “tells” from inside the main flow of the text.
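For anyone wanting to check the second-letter ‘y’ observation against their own transcription, a rough Python sketch follows. It assumes one manuscript line per string with '.' as the word separator; the function name and the sample strings are made up for illustration, and it is not a reconstruction of how Philip Neal actually ran his count.

```python
def second_letter_y_counts(lines, sep="."):
    """Count words whose second character is 'y', split by line position.

    Returns (line_initial_count, elsewhere_count).
    """
    line_initial = 0
    elsewhere = 0
    for line in lines:
        words = [word for word in line.split(sep) if word]
        for position, word in enumerate(words):
            if len(word) >= 2 and word[1] == "y":
                if position == 0:
                    line_initial += 1
                else:
                    elsewhere += 1
    return line_initial, elsewhere


# Invented strings, for illustration only (not real Voynichese lines).
demo = ["dydy.daiin.oy", "shol.ty", "ky.dal"]
print(second_letter_y_counts(demo))
```

A fair test would also need the baseline share of all words that are line-initial, since only a large excess over that baseline would support the observation.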

(3) Horizontal Neal keys

Another very hard-to-explain observation that Philip Neal made some years ago is that many paragraphs contain a pair of matching gallows (typically single-leg gallows) about 2/3rds of the way across their topmost line: and that the Voynichese text between the pair often presents unusual patterns / characteristics. In fact, I’d suggest that “long” (stretched-out) single-leg gallows or “split” (extended) double-leg gallows could well be “cipher fossils”, other ways to delimit blocks of characters that were tried out in an early stage of the enciphering process, before the encipherer settled on the (far less visually obvious) trick of using pairs of single-leg gallows instead.

Incidentally, my strong suspicion remains that both horizontal and vertical Neal keys are the first “bundling-up” half of an on-page transposition cipher mechanism, and that the other “unbundling” half is formed by the double-leg gallows (EVA ‘t’ and ‘k’). That is to say, that tell-tale letters get moved from the text into horizontal and vertical key sequences, and replaced by EVA ‘t’ (probably horizontal key) or EVA ‘k’ (probably vertical key). I don’t claim to understand it 100%, but that would seem to be a pretty good stab at explaining at least some of the systematic oddness (such as “qokedy qokedy dal qokedy qokedy” etc) we do see.

Regardless of whether or not my hunch about this is right, transposition ciphers of precisely this kind of trickiness were loosely described by Alberti in his 1465 book (as part of his overall “literature review”), and I would argue that these ‘key’ sequences so closely resemble some kind of non-obvious transposition that you ignore them at your peril. Particularly if you’re running stats tests.

(4) Numbers hidden in aiv / aiiv / aiiiv scribal flourishes

This is a neat bit of Herbal-A steganography I noted in my 2006 book, which would require better scans to test properly (one day, one day). But if I’m right (and the actual value encoded in an ai[i][i]v group is entirely held in the scribal flourish of the ‘v’ (EVA ‘n’) at the end), then all the real content has been discarded during the transcription, and no amount of statistical processing will ever get that back, sorry. :-(

(5) Continuation punctuation at end of line

As I noted last year, the use of the double-hyphen as a continuation punctuation character at the end of a line predated Gutenberg, and in fact was in use in the 13th century in France and much earlier in Hebrew manuscripts. And so there would seem to be ample reason to at least suspect that the EVA ‘am’ group we see at line-ends may well encipher such a double-hyphen. Yet even so, people continue to feed these line-ending curios into their stats, as if they were just the same as any other character. Maybe they are, but… maybe they aren’t.

Incidentally, if you analyze the average length of words in both Voynichese and printed works relative to their position on the line, you’ll find (as Elmar Vogt did) that the first word in a line is often slightly longer than the others. There is a simple explanation for this in printed books: that short words can often be squeezed onto the end of the preceding line.
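The position-on-line measurement is easy to reproduce in outline. The Python sketch below is an assumption-laden illustration rather than Elmar Vogt’s actual method: it takes one line per string, splits on '.', and averages word length by position, using invented function names and toy data.

```python
from collections import defaultdict


def mean_length_by_position(lines, sep="."):
    """Return {word_position: mean_word_length}, positions counted from 0."""
    totals = defaultdict(lambda: [0, 0])  # position -> [sum of lengths, count]
    for line in lines:
        words = [word for word in line.split(sep) if word]
        for position, word in enumerate(words):
            totals[position][0] += len(word)
            totals[position][1] += 1
    return {pos: total / count for pos, (total, count) in sorted(totals.items())}


# Toy data, for illustration only.
toy = ["abcd.ab.abc", "abcdef.abcd"]
print(mean_length_by_position(toy))
```

Running the same function over a printed text with its original line breaks preserved gives the comparison baseline mentioned above.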

(6) Shorthand tokens – abbreviation, truncation

Personally, I’ve long suspected that several Voynichese glyphs encipher the equivalent of scribal shorthand marks: in particular, that mid-word ‘8’ enciphers contraction (‘contractio’) and word-final ‘9’ enciphers truncation (‘truncatio’) [though ‘8’ and ‘9’ in other positions very likely have other meanings]. I think it’s extraordinarily hard to account for the way that mid-word ‘8’ and word-final ‘9’ work in terms of normal letters: and so I believe the presence of shorthand to be a very pragmatic hypothesis to help explain what’s going on with these glyphs.

But if I’m even slightly right, this would be an entirely different category of plaintext from that which researchers such as Knight and Reddy have focused upon most… hence many of their working assumptions (as evidenced by the discussion in the 2010/2011 paper) would be just wrong.

(7) Verbose cipher

I’ve also long believed that many pairs of Voynichese letters (al / ol / ar / or / ee / eee / ch, plus also o+gallows and y-gallows pairs) encipher a single plaintext letter. This is a cipher hack that recurs in many 15th century ciphers I’ve seen (and so is completely in accord with the radiocarbon dating), but which would throw a very large spanner into both vowel-consonant search algorithms and Hidden Markov Models (HMMs), both of which almost always rely on a flat (and ‘stateful’) input text to produce meaningful results. If these kinds of assumptions fail to hold, the usefulness of many such clever analytical tools falls painfully close to zero.
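As a rough illustration of what treating letter groups as single tokens involves, the Python sketch below re-tokenizes EVA words with a greedy longest-match rule and then measures unigram entropy over the resulting tokens. The token list is just the simple groups named above (the gallows groups are left out for brevity), greedy matching is an assumption rather than a claim about how the text was actually built, and all names are hypothetical.

```python
import math
from collections import Counter

# Candidate composite tokens taken from the list above; purely a hypothesis.
COMPOSITES = ("eee", "ee", "ch", "al", "ol", "ar", "or")


def tokenize(word, composites=COMPOSITES):
    """Split an EVA word into tokens, preferring the longest composite match."""
    ordered = sorted(composites, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for candidate in ordered:
            if word.startswith(candidate, i):
                tokens.append(candidate)
                i += len(candidate)
                break
        else:
            # No composite starts here, so emit the single character.
            tokens.append(word[i])
            i += 1
    return tokens


def unigram_entropy(tokens):
    """Shannon entropy (bits per token) of a token sequence."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(tokenize("cheol"))
print(tokenize("daleee"))
```

Comparing `unigram_entropy` on raw characters against the same text after `tokenize` shows how strongly the headline figures depend on the chosen alphabet, before any cipher hypothesis is even tested.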

(8) Word-initial ‘4o’

Since writing my book, I’ve become reasonably convinced that the common ‘4o’ [EVA ‘qo’] pair may well be nothing more complex than a steganographic way of writing ‘lo’ (i.e. ‘the’ in Italian), and then concealing its (often cryptologically tell-tale) presence by eliding it with the start of the following word. Hence ‘qokedy’ would actually be an elided version of “qo kedy”.
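Testing the elision idea only needs a small preprocessing step before rerunning any word-level statistics. The Python sketch below un-elides word-initial ‘qo’ under that hypothesis; the function name is made up, and whether the split is justified is exactly the open question.

```python
def split_initial_qo(words):
    """Split word-initial 'qo' off as its own word, e.g. 'qokedy' -> 'qo', 'kedy'."""
    result = []
    for word in words:
        if word.startswith("qo") and len(word) > 2:
            result.extend(["qo", word[2:]])
        else:
            # Leave bare 'qo' and all other words untouched.
            result.append(word)
    return result


print(split_initial_qo(["qokedy", "qokedy", "dal", "qokedy", "qokedy"]))
```

Word-length and repetition counts taken before and after this step can then be compared side by side.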

Moreover, I’m pretty sure that the shape “4o” was used as a shorthand sign for “quaestio” in 14th century Italian legal documents, before being appropriated by a fair few 15th century northern Italian ciphers (a category into which I happen to believe the Voynich falls). If even some of this is right, then we’re facing not just substitution ciphers, but also a mix of steganography and space transposition ciphers, all of which serves to make modern pure statistical analysis far less fruitful a toolbox than it would otherwise be for straightforward ciphers.

* * * * * * *

Personally, when I give talks, I always genuinely like to get interesting questions from the audience (rather than “hey dude, do you, like, think aliens wrote the Voynich?”, yet again, *sigh*). So if anyone reading this is going along to Knight & Reddy’s talk at Stanford and feels the urge to heckle, errm, ask interesting questions that get to the heart of what they’ve been doing, you might consider asking them things along the general lines of:

* what transcription they are using, and how reliable they think it is?
* whether they consider spaces to be consistently reliable, and/or if they worry about how to parse half-spaces?
* whether they’ve tested different hypotheses for irregularities with the first word on each line?
* whether they believe there is any evidence for or against the presence of transposition within a page or a paragraph?
* whether they have compared it not just with abjad and vowel-less texts, but also with Quattrocento scribally abbreviated texts?
* whether they have looked for steganography, and have tried to adapt their tests around different steganographic hypotheses?
* whether they have tried to model common letter pairs as composite tokens?

I wonder how Knight and Reddy would respond if they were asked any of the above? Maybe we’ll get to find out… ;-)

Or you could just ask them if aliens wrote it, I’m sure they’ve got a good answer prepared for that by now. :-)

36 Comments

  1. Diane March 10, 2013 12:41 am

    Hi Nick
    I’ve been looking for stats on two things – if you or anyone here can offer a link, I’d be glad.

    (i) maximum line-length for the text and also if that’s possible, any evidence of standard margin-widths.

    (ii) details of variation in spaces between lines. I’m wondering if space left between each 5th and 6th line (or 6th and 7th) might be consistently wider or narrower than the rest.

    Diane

    http://voynichimagery.wordpress.com/

  2. Ivan Y March 10, 2013 7:06 am

    TL;DR version of your very thorough post: garbage in, garbage out.

  3. nickpelling March 10, 2013 2:29 pm

    Ivan: actually, the post was more about the opposite side of the coin – that if you don’t want garbage out, be very choosy about the garbage you bring in. :-)

    http://www.nickpelling.com/

  4. nickpelling March 10, 2013 2:46 pm

    Diane: maximum line length very much depends on how you join the EVA strokes together to make actual letters, so it’s a fairly subjective measure. And I don’t think anyone has measured things in a way that will give you the second datum you’re looking for – Philip Neal mentioned having added some additional metadata to the copy of the transcription he was working with, but I don’t recall those metadata including physical measurements.

    http://www.nickpelling.com/

  5. Diane March 10, 2013 2:58 pm

    It would be easier if a scale appeared in the Beinecke zoom, wouldn’t it?

    Yes, when I said line-length, I did mean in mm. or inches.

    Suppose I could try a letter to the Beinecke..

    Thanks.
    D.

    http://voynichimagery.wordpress.com/

  6. Rich SantaColoma March 10, 2013 4:45 pm

    I would go, but I am on the other side of the country, in New York. There are a couple of Voynicheros in California, though, so I’ll mention it elsewhere. Represent.

    http://proto57.wordpress.com/

  7. bdid1dr March 10, 2013 10:06 pm

    Nick, I wish I could attend Knight and Reddy’s event at Stanford (I live not so far away). Howsomever, because of my dependency on reading lips (and more often than not ending up at the back of the auditorium) I’m somewhat useless at live performances/lectures. I do have an acquaintance at UC Santa Cruz who might be interested in their talk. I’ll try to contact her by phone after I finish this note.

    As far as the use of various size “4” or “9” figures as cipher goes, here is my translation of at least those figures, as well as of the “&”, as I use them in translating whole discussions:

    8 = aes
    & = aes

    9 = g or k

    tiny 9 which loop crosses to back of the down-stroke = ex

    c = c

    smaller c (which sometimes has a bar attached) = e

    The elaborate large curlicued “P” represents “B” or “P”, capital B or P which can also imply full words such as “Pliny”, “Botticelli”, “Prescription”. One can also “tack onto” that large initial P various other curlicues which gives you such words as Especially, or “Especies”.

    The difference in the alphabet letters “R” and “S” is very small: If the letter looks like a question mark without the dot, it is the Volsci/Cyrillic sibilant capital “C”. If the letter looks like a backward capital S, it is the letter “r”.

    The latin letters “U” and “V” are represented by “eo” or “oe” in context to what word is being expressed:

    Ennyway, what I have discovered lately, is that Athanasius Kircher may have been somewhat “Latin-impaired” when he was identifying various elements of the Alban Hills and Lakes.

    Still fun. I hope you will “dwell upon” my alpha-linguistic-cipher notes for a little while at least — and compare them with your EVA and D’Imperio’s nonsense. Oh yes, there’s a word for you: “dwell”. Several weeks ago, I tried to demonstrate how one could stretch that “ell” glyph (you called it a “bracket”) and insert it into another word in that sentence.

    The “loopy double l” phoneme has manifold uses, in that other syllables can be tacked onto either side without having to add any other vowels. Same thing goes for the “tl”.

    Example: xmpl (“ex” is represented by that tiny “9”, “m” is represented by what looks like a fish-hook with two points).

    n e oea ave fun decoding y’all !

    :)

  8. bdid1dr March 10, 2013 10:36 pm

    In response to Diane’s query re line lengths and spacing:

    What I’ve found when it comes to the Water Lily folio and the crocus folio, so far, is that the discussion for each of the botanicals seems to first identify each specimen’s historical uses. Later discussion further down the “stem” appears to be brief references to historical or philosopher/mythologists works/legends.

    Folio 35r Crocus (saffron, and legend of Crocus & Smilax)

    Folio 11v : Possibly a mulberry fruit, which I am focusing on today because there are only 6 lines of discussion!

    Later!

    bd…..

  9. Diane March 11, 2013 1:35 am

    Rich
    Most people interested in this manuscript read Nick’s blog, I expect, and will share details here if they can and wish.

    Otherwise, I have friends close to the library who might spare some time if all else fails.
    D

    http://voynichimagery.wordpress.com/

  10. Karl March 11, 2013 3:32 am

    I’ve been drafting an email to send to Kevin (we know each other from grad school), and figured I’d share the current draft:

    Kevin,

    After having it in my pile of things to read, I finally had a chance to read through “What we know about the Voynich Manuscript” in detail, and thought I’d offer some friendly feedback/suggestions (basically, a cross between what I would have written as a reviewer and some info you may find of interest). Hope you find this useful. Any new findings in the talk you’re giving this week?

    • Choice of transcription alphabet. While I’m partial to Currier myself, decisions regarding whether Currier ‘X’ is a single character or some combination of ‘S’ and ‘F’, or that ‘G’ is a single character rather than ‘IE’ impact entropy calculations, word length distributions, HMM results, etc. While Currier justifies his choices in the paper you cite (http://www.voynich.nu/extra/curr_main.html), this is a point worth mentioning.
    • Transcriptions. While I’m not a huge fan of the European Voynich Alphabet (EVA) (http://www.voynich.nu/extra/eva.html), the interlinear transcription at http://www.ic.unicamp.br/~stolfi/voynich/Notes/062/L16+H-eva/text16e7.evt is worth looking at for several reasons: a) it integrates multiple transcriptions including (but not limited to) those behind currier.now, b) translation from EVA to Currier is straightforward using ‘sed’ or a similar tool, and c) the comment blocks contain accumulated data on suggested plant IDs, etc. from various sources. Another transcription worth looking at is Glen Claston’s (http://notakrian.pbworks.com/f/voyn_101.zip). When it comes to recording variants while transcribing an unknown script there is a tension between “splitters” and “groupers”, and Glen is a “splitter” (partly because he was pursuing Leonell Strong’s polyalphabetic theory, which involves changing alphabets as well as a cyclical shift sequence, and Glen believed that some of the variants signaled alphabet shifts). Also, IIRC, Glen’s was the first transcription done from Yale’s hi-res color scans.
    • With regard to “word” length distribution, a number of people (myself included) have suggested that some spaces are inserted according to some rule(s), at least partially explaining the correlation between word endings and beginnings pointed out by Currier. Slightly more than a fifth of the spaces in the Biological B pages in D’Imperio’s transcription, for instance, are a ‘9’ followed by a ‘4’, and there are a very small number of ‘94’ digraphs without spaces (possibly scribal errors). Whether inserting spaces between some specified set of letter pairs in a plaintext in this way produces something more like a binomial distribution, as seen in the Mss, is a testable hypothesis, but I haven’t tried it yet.
    • With regard to HMM models of the Voynich Biological B text, you should take a look at Mary D’Imperio’s paper “An Application of PTAH to the Voynich Manuscript” (http://www.nsa.gov/public_info/_files/tech_journals/Application_of_PTAH.pdf).
    • You cite Gabriel Landini’s paper in your references, but I didn’t see a discussion in Section 5.3 or elsewhere of his findings re: long-range correlations in the Mss. Also of interest in that regard is Mark Perakh’s paper applying the Letter Serial Correlation test to the Mss. (http://www.talkreason.org/Mark%27s%20sites/Mark%27s%20perakm%20site/members.cox.net/marperak/Texts/voynich2.htm).
    • An analytic tool I tried but didn’t write up (although I don’t know that it produced any deep insights) was applying algorithms for learning certain classes of regular languages from only positive instances to the vocabulary of Biological B “words”. In particular, I tried one of Angluin’s algorithms for learning a k-reversible regular language from positive examples, and Garcia & Vidal’s algorithm for learning the k-Testable Language in the Strict Sense family of regular languages (http://users.dsic.upv.es/grupos/tlcc/papers/fullpapers/GVO90.pdf). I’m not sure if better algorithms for learning REs from only positive examples have been developed since.
    • You say in Section 5.1, “Notably, the text has very few repeated word bigrams or trigrams, which is surprising given that the unigram word entropy is comparable to other languages.” There is some possibility that multiple letters in Currier’s alphabet are actually the same character. For instance, if there is a “word” in the Biological B vocabulary containing an ‘S’, with high probability the same word occurs with ‘Z’ substituted for the ‘S’, suggesting they might actually be the same letter. It has been suggested that some of the “gallows” characters may be interchangeable (f57v, for instance has both ‘B’ and ‘V’ in the same position in the repetitions of the key-like sequence). Failing to combine such interchangeable letters (if they are in fact interchangeable) will obviously reduce the number of longer repeats.
    • A cipher hypothesis you don’t mention is what folks on the Voynich mailing list generally refer to as a “verbose (monoalphabetic) cipher”, in which some plaintext letters correspond to combinations of multiple Voynich characters. For instance, if u = ‘S’, r = ‘C89’, o = ‘4OF’, b = ‘CC89’, and s = ‘CC9’, then “ZC89/4OFCC89/4OFC89/4OFCC9” on line 13 of f77r would correspond to the word “uroboros” (I’m not suggesting that as a serious crib, but as an illustration of the idea). This type of cipher would explain the low h1 and h2 values. If this is the type of cipher involved, the trick is figuring out what the breakdown into combinations is: the “words” 4OEAM, OEAM, and EAM all occur, so are they (4O-E-AM, OE-AM, and E-AM) or (4OE-AM, O-E-AM, and E-AM), etc? An attempt has been made to apply genetic search to this problem (http://voynichattacks.wordpress.com/tag/genetic-algorithm/).
    • Nick Pelling offers some observations on your paper at http://www.ciphermysteries.com/2013/03/09/this-week-a-talk-at-stanford-on-the-voynich-manuscript
    • His blog also mentions a new paper on arXiv.org called “Probing the statistical properties of unknown texts: application to the Voynich Manuscript” (which I haven’t had a chance to read yet).
    • Have you accumulated a list of on-line machine readable corpora of 15th cent. texts in various likely languages? I’ve found http://quod.lib.umich.edu/c/cme/ for Middle English, but haven’t found a similar resource for e.g., early 15th century Latin herbal or alchemical mss.
    Hope this is useful feedback/info — don’t decipher the Voynich before I do :->

    Karl

  11. Joachim Dathe March 11, 2013 2:19 pm

    It is somewhat simpler:
    Some weeks ago I published statistical reports based on EVA. For this I used my new N-gram software “ngraman”, where N can represent an arbitrarily high ordinal number.
    If this is applied to EVA (N=25), with spaces and other special characters ignored, you get a complete list of all lexical items (words, phrases, particles):
    http://goo.gl/iyqiY
    BTW. The differences between A and B Currier languages were determined similarly
    (N = 14):
    http://goo.gl/cUXq1

    And anyone who distrusts my tool should have a look at its application to modern literature (Joyce, Ulysses):
    http://goo.gl/sWCkS

    http://voynich2arabic.wordpress.com

  12. nickpelling March 11, 2013 2:37 pm

    Joachim: EVA was explicitly designed as a stroke-based (or, more accurately, a component-based) transcription, which means that (for example) “ch” and “sh” are each transcribed as two separate partial characters, despite their plainly being a single character when actually written.

    EVA’s authors did this not to encourage people to run statistical tests on the EVA corpus, but to encourage researchers to try out different hypotheses about what the correct alphabet should be (i.e. once you’d combined groups of strokes such as ‘ch’ into a single token).

    http://www.nickpelling.com/

  13. bdid1dr March 11, 2013 4:04 pm

    Addendum:

    When many lines of commentary end with the “8” and “9” characters (especially in the botanicals) you are looking at the sentence ending (or nomenclature ending) aes ceus. Some instances of the same phrase are represented by only the tiny “9” = “ex” or “excuse” or “exeus”.

    Another botanical I’m getting ready to solve/interpret is what even Beinecke tentatively identifies as an “artichoke”. Nope, never seen an artichoke with dried stem top. B-u-u-t – maybe a tree fruit – mulberry? Morus Alba? “Silkworm food” (the leaves, anyway)? We’ll see. I’ll keep you posted.

    One item of interest, which I didn’t find in the botanicals or pharmaceutical sections, is the fruit of the mandragore. See Beinecke 408, folio 83v. The script which appears beneath the globes tells us that the watered down fruit juice can ease pain.

    So, Nick, my reason for posting all of this is so that maybe you or a “Voynichero” may be able to observe the “Knight-Reddy” presentation at Stanford with a “newer point of view”. Too bad I can’t participate (I wouldn’t be able to get close enough to the podium to read the speaker’s lips).

  14. Joachim Dathe March 11, 2013 4:40 pm

    Nick, does that mean that you despise any statistical analysis based on EVA, not only mine?

    If one glyph is expressed by several roman letters, it means for my N-gram tool only that the N is to be incremented. (The internal comparison processes always remain the same). And the determined frequencies of di- and tri-grams may indicate then one single letter to choose from.

    http://voynich2arabic.wordpress.com

  15. nickpelling March 11, 2013 4:55 pm

    Joachim: “despise” is the wrong word, “despair” is far closer.

    It’s just that when I saw your list of n-grams headed up by “ch” (with “sh” not far behind), it did make me feel as though I hadn’t really managed to get my central point across in a 2000-word post. Which is this: that the whole point of statistically analysing an EVA stroke transcription is to work out what the best non-stroke transcription is.

    The practical problem is that almost all Voynich researchers seem to have lost sight of this. :-(

    http://www.nickpelling.com/

  16. nickpelling March 11, 2013 5:00 pm

    Karl: thanks for cc’ing that here, hopefully someone will attend the lecture and let us know KK’s & SR’s responses…

    http://www.nickpelling.com/

  17. bdid1dr March 11, 2013 8:06 pm

    Nick & friends,

    I’m hoping that “someone” will “consideringly” read my responses on this particular discussion page, and take a “laundry-list” of questions to be directed at Knight and Reddy (if K & R will even open a Q and A dialogue).

    I THINK I understand that you would like to keep the “Voynich Manuscript” a mystery for at least a few more years. So, my next adventure will be a visit to the historical museum in San Jose California. As part of my duties as Senior Records Clerk in the City Clerk’s Office, I indexed every item of the public records. I was also responsible for overseeing the yearly microfilming of those records (for safe storage). It is one reason why UC Berkeley has microfilm copies of the earliest missionary correspondence between “headquarters” in Spain and Portugal (and maybe Rome/Frascati).

    What is most aggravating to me, is that I never set eyes on either the manuscripts themselves, nor got to read the microfilm contents! Long story! But I am now considering contacting San Jose’s Historical Museum in order to schedule a viewing of the documents which are now in “climate controlled” preservation cases for scholarly review.

    I’ll do my best to see if there is any resemblance to Beinecke ms 408. (In the 1970s the City of San Jose contracted with a professional translator, who was unable to do a meaningful translation. His reason was that the San Jose documents were written in a “clerical hand” used by the clerks of the Royal Courts.)

    It may be months before I can write. Maybe this will give you “breathing room”?

    Cheers ! :)

  18. Joachim Dathe March 11, 2013 8:48 pm

    Nick: Partly I can understand you. You do not like ‘ch’ and ‘sh’ as an eye catcher. Therefore I have for you an N-gram analysis, starting with the longest possible terms (including word breaks):
    http://goo.gl/CqSyi

    For codebreakers a problem remains: where do we see the text scrambling that results from encryption, when the lexical items are ordered in such a natural-language-like way?

    http://voynich2arabic.wordpress.com

  19. nickpelling March 12, 2013 12:29 am

    Joachim: EVA is a stroke-based interim transcription, a precursor to a glyph-based transcription. So when I see frequent strings starting “h…” (i.e. partway through ch / sh / cfh / cth / ckh / cph), I feel extremely uncomfortable making any sort of inference from them.

    In my opinion, the two questions that need considering most in order to solve the Voynich Manuscript are:-

    (1) How did the original author group EVA strokes together to form complete letters? (And what evidence do we have to support whatever conclusion we draw about this in preference to other possible final transcriptions?)

    (2) What is the correct order / sequence for parsing letters in a word / line / paragraph / page? (And what evidence do we have to support our conclusions on this?)

    http://www.nickpelling.com/

  20. Joachim Dathe March 12, 2013 6:14 am

    bdid1dr:
    A question to K&R:
    What is their opinion (if any) about:
    “Discussion and conjectures” by Prof. J. Stolfi
    (which agree completely with my own investigations):
    http://www.dcc.unicamp.br/~stolfi/voynich/00-06-07-word-grammar/#s.disc

    “..severe constraints on cryptological explanations”

    “Semitic languages such as Arabic, Hebrew, or Ethiopian could perhaps be transliterated into Voynichese, but not by any straightforward mapping.”

    Not straightforward, I agree, as I found different meanings for EVA-k, single letters for “ii”, “ch”, indefinite word separation etc.
    I can certainly prove that the Voynich is not encrypted, by N-gram observed distribution of the lexical items.

    http://voynich2arabic.wordpress.com

  21. nickpelling March 12, 2013 8:05 am

    Joachim: if you think the text isn’t written straightforwardly, then surely you are saying that it *has* been encrypted (i.e. in the sense of “hidden” or “concealed”)?

    Your n-gram test results indicate (correctly) that a lot of structure is present, which is precisely the kind of thing more modern cipher systems (such as polyalphabetic substitution) disrupt and destroy.

    However, there are still a thousand other ways to hide text that predate all that kind of “data-flattening” mathematical trickery, so I can see no obvious reason to eliminate “old-fashioned” encipherment just yet. :-)
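
    As a rough illustration of that “data-flattening” effect, the sketch below compares the index of coincidence of a short English sample before and after a toy Vigenère (polyalphabetic) encipherment. The sample text, the key and the function names are assumptions chosen purely for illustration; nothing here is a test that has been run on the Voynich text.

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two letters drawn at random from text are the same."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def vigenere(plaintext, key):
    """Toy polyalphabetic substitution over a-z; non-letters are dropped."""
    letters = [ch for ch in plaintext.lower() if "a" <= ch <= "z"]
    out = []
    for i, ch in enumerate(letters):
        shift = ord(key[i % len(key)]) - ord("a")
        out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
    return "".join(out)

# Illustrative English sample (an assumption, not a Voynich transcription).
sample = (
    "the presence of structure in a text shows up as uneven letter "
    "frequencies and repeated groups of letters"
)

print("plain:     ", index_of_coincidence(sample))
print("enciphered:", index_of_coincidence(vigenere(sample, "lemon")))
```

    For ordinary language text the enciphered figure typically comes out lower, because the letter frequencies have been smeared across several alphabets. A text that keeps a high value and strong n-gram structure is unlikely to have been through this kind of cipher, though that says nothing about older ways of hiding text.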

    http://www.nickpelling.com/

  22. Joachim Dathe March 12, 2013 10:33 am

    Nick: Yes, that’s right, and in my own documentation (about Arabic) I talk of an encryption stage myself. And I’m still some way from deciphering the VMs completely. The content seems kind of more cryptic than the language itself :-)

    BTW. Did you see my pdf on language A and B (significant) differences?

    http://voynich2arabic.wordpress.com

  23. nickpelling March 12, 2013 1:34 pm

    Joachim: no, I’m sorry to say that I haven’t seen your PDF on A/B language differences. Can you pass on a link to it, please?

    http://www.nickpelling.com/

  24. Joachim Dathe March 12, 2013 2:59 pm

    Nick: The link to a pdf where A – B differences
    are shown:
    http://goo.gl/cUXq1

    (only the 14-grams, for A complete).

    http://voynich2arabic.wordpress.com

  25. Dave March 14, 2013 7:58 pm

    Here’s a downloadable mpeg4 version of the Stanford lecture video if you don’t want to mess around with the streaming:

    http://zodiackillerciphers.com/2013-03-13-Stanford_Voynich_Lecture.mp4

    http://zodiackillerciphers.com

  26. Dave March 14, 2013 7:59 pm

    (addendum: video file size is 233MB)

    http://zodiackillerciphers.com

  27. Joachim Dathe March 15, 2013 5:00 pm

    Nick: after reading your article more carefully, I see an approach to clarify some things by statistical analysis.
    I speak of n-grams, language-independent analysis: identification of lexical items by their observed frequencies. So, I now can submit an N-gram EVA evaluation that assesses all particles within an expression even in all lower N orders. Important to know: When reading EVA, all spaces and special characters are ignored.
    The result here:
    http://goo.gl/807jN
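
    For readers who want to see what that kind of count looks like, here is a minimal sketch of n-gram counting with all spaces and special characters ignored, as described above. The function name and the sample line are illustrative assumptions; this is not the code behind the linked results.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count letter n-grams after dropping everything that is not a-z."""
    letters = "".join(ch for ch in text.lower() if "a" <= ch <= "z")
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

# Illustrative EVA-style string only, not a quoted transcription line.
line = "daiin daiin qokedy daiin"
counts = ngram_counts(line, 3)

for gram, freq in counts.most_common(5):
    print(gram, freq)
```

    Note that once spaces are dropped, n-grams that straddle a word boundary (here "ind" and "nda") are counted in exactly the same way as those inside a word, so the result depends on whether the spaces in the transcription are meaningful.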

    http://voynich2arabic.wordpress.com

  28. bdid1dr March 16, 2013 11:01 pm

    Nick and Friends,

    While you’ve all been focusing on Knight and Reddy, I’ve been cruising “Carmina Brigiensia” and “Carmina Burana”. I have found a perfect match for that “berry” which appears on Beinecke 408’s folio 11v: Wikipedia has a very good discussion of:

    Carmina Burana : “The Forest” – is an elaborate illustration, of which one feature stands out from all the rest. See for yourself, and you may be able to understand my translation of why that “mulberry” signifies so prominently, yet just as obscurely, in our VM-ystery folio 11V. It is all about the leaves of the white mulberry tree, which were “pabulumox”/fodder/food for the silkworm larvae until they began to spin their cocoons. In this case I guess I can pun that the proof is in the “pabulumox” (pudding?) :)

  29. bdid1dr March 17, 2013 8:07 pm

    Happy St. Patrick’s Day! Though I am part Scots-Irish, I can still do the Irish Step Dances and reels (Google has it for their logo today).

    Ennyway, my take on Knight-Reddy’s presentation, as far as I was able to view it online: pretty bad! (I read lips and body posture to get at least some clues to what is being discussed.)

    So y’all will just have to limp along without me — ahem!

    Proceed!

    bdid1dr :)

  30. Diane April 6, 2013 5:42 pm

    If one supposed that the text’s line length = paragraph length, then occurrences of that

    ” pair of matching gallows (typically single-leg gallows) about 2/3rds of the way across their topmost line”

    might serve the same purpose as the cartouche does in hieratic.

    whether it enclosed the name of a deity, person, animal, thing or something having aspects of several (as stars do) might be an additional complication, of course.

    diane

  31. nickpelling April 6, 2013 5:52 pm

    Diane: the cartouche shape means “name” whereas I suspect Neal keys are more like a general-form enciphered medieval bracket pair. But apart from that, we’re singing from the same (antiphonal) songsheet. :-)

    http://www.nickpelling.com/

  32. Diane April 7, 2013 3:51 am

    Dear Nick,

    Thank you, I’m yet to be blessed with grandchildren.

    Diane

  33. Diane April 7, 2013 4:46 am

    Nick

    I’ve just done what I should have done *before* offering the ‘cartouche’ comment.

    i.e. googled “Voynich” AND “cartouche” …

    so enervating.

  34. Diane April 14, 2013 6:13 am

    I should put this into the forum, but replies there are rare.

    Can someone explain why curious groups such as dain, dain, qokedy dain (etc.) are treated as a function of encoding, rather than an indication of original language?

    I mean, why does discussion not focus on languages which regularly show a pattern of a, a, b, a, a?

    Similarly with char-groups that appear only in certain positions?

    On the other side of the coin, why is ’4o’ taken as a function of the language rather than a result of encipherment (is encipherment a word?), while the top-line gallows are treated reversely (I’m sure that’s not a word!)?

    Illustration: suppose the ’4o’ translates ‘and’. Sometimes it would appear in a word ‘candy’ but its appearance as initial ‘And…..’ would reflect a usage natural to some languages and not others. (In English it is found in that position regularly only in translation, and mostly of Biblical texts or renderings from Aramaic.)

    What if dain, dain, qokedy daiin was equivalent (in one or another language) to e.g. ‘Verily, verily, I-say-to-thee, with VerilyPlus…’ [that shall such and such occur]?

    So why does no-one seem to spend time on matching those patterns with various grammatical forms?

    This reminds me of an interesting paper written in 1995 by Clive Holes, ‘The Structure and Function of Parallelism and Repetition in Spoken Arabic: a sociolinguistic study’, Journal of Semitic Studies, Spring issue, pp.57-82 in my p/copy though the bibliography looks a bit short.
    On p.79 he said,

    I have tried to illustrate the variety of functions which patterned repetition – lexical, morphological, syntactic or combinations of them – can fulfill in the speech of non-literate Arabs… almost complete lack of the same features in speech of the younger generations in the same speech-communities.. some (older) speakers made such use of the repetitive devices.. and with such effect, that parts of the recording sound like a species of artistic performance.. discursive, paratactic, concrete, religious, committed. The ‘literate’ style … succinct, ratiocinative, abstract, distanced.

    But there are cases of interaction – I’ll cite my usual example of Majid’s treatise on navigational astronomy and method.

    Diane

    interesting paper about internal assonance and rhythm in spoken colloquial Arabic:

  35. voynichimagery August 8, 2013 2:33 am

    a substitution cipher from Spain which some thought might prove to be Voynichese.
    http://languagelog.ldc.upenn.edu/nll/?p=4337

  36. Diane August 13, 2013 1:23 pm

    Not my area – so no comment except that this paper by William Pourquet may interest people working on Voynichese
    http://gtalug.org/wiki/Meetings:2013-04

    (On that site, the paper is available in pdf)
