Darrell Huff’s (1954) “How to Lie With Statistics” is a twentieth century classic that’s well worth reading (I have a well-thumbed copy on my bookshelf that I bought back in the 1980s). It’s basically a breezy introduction to statistics that concentrates largely on how people get things wrong, in order to get across the general idea of how you might (possibly, hopefully) try to get things right in your own work.

A journalist rather than an academic statistician, Huff ended up selling more than 1.5 million copies of his book. You can hear echoes of his reversed-expectations presentation in numerous other book titles, such as Bill Hartston’s “How to Cheat at Chess”.

Sadly, The Truth Is Much, Much Worse

When I later did statistics modules at university, the awful truth slowly dawned on me: even though tools (such as Excel) make it easy to perform statistical procedures, stats really isn’t just a matter of “running the numbers”, cranking out an answer, and drawing some persuasive-looking graphs.

Even just conceiving a statistical experiment (e.g. something that’s based on good data, and that stands a chance of yielding meaningful results) is extraordinarily hard. Designing statistical experiments (e.g. understanding the sampling biases that are inevitably embedded in the data, and then working out how to work around them) is also hugely tricky. Executing them is no mean feat either: and then – finally – interpreting them is fraught with difficulty.

In general, my own experience of statistical experiments is that at least half are fatally misconceived; of the remainder, half are horribly misdesigned; of the remainder of that, at least half are sadly misexecuted; and of the remainder of that, at least half of the results are tragically misinterpreted. Note that the overall success rate (<5%) is for people who broadly know what they’re doing, never mind idiots playing with Excel.

A Story About Stats

Back when I was doing my MBA, one of the final marked pieces was for the statistics module. When I took a look at the data, it quickly became clear that while most of the columns were real, one in particular had been faked up. And so I wrote up my answer saying – in a meta kind of way – that because that (fake) column was basically synthetic, you couldn’t draw reliable conclusions from it. And so the best you could do in practice was to draw conclusions from the other non-synthetic columns.

I failed the module.

So, I made an appointment with the lecturer who marked it, who also happened to be the Dean of the Business School.

  • I said: Why did you fail this piece?
  • He said: Because you didn’t get the right answer.
  • I said: But the column for the ‘right’ answer is fake.
  • He said: I don’t think so.
  • I said: Well, look at this [and showed him exactly how it had been faked]
  • He said: Oh… OK. I didn’t know that. But… it doesn’t matter.
  • I said: errrm… sorry?
  • He said: you’ve got a Distinction anyway, so there’s no point me changing this mark

And so I still failed the statistics module.

The Voynich Manuscript and Stats

If you think Voynich Manuscript researchers who run statistical tests on Voynichese are somehow immune to these fundamental hazards, I don’t really think you’re paying enough attention.

Until you accept that the core problems inherent in Voynichese transcriptions – there are many, and they run deep – will inevitably permeate all your analyses, you really are just running the numbers for fun.

The main things that bother me (though doubtless there are others that I can’t think of right now) are:

  • Transcription assumptions
  • Transcription error rates
  • Running tests on the whole Voynich Manuscript, rather than on sections (e.g. Q13, Q20, Herbal-A)
  • How Voynichese should be parsed into tokens (this has bugged me for 20 years!)
  • Copying errors and Voynichese “weirdoes”
  • The bifolios being out of order
  • Whether there is a uniform ‘system’ underlying both Currier A and Currier B
  • The problems with top-line text
  • The problems with line-initial letters
  • The problems with line-final letters
  • etc

With so many parallel things to consider, I honestly think it should be no surprise that most attempts at Voynich analysis fail to achieve anything of value.

Voynich Theories

I have no doubt that researchers do their best to be rational and sensible, but many Voynich theories – or, perhaps more accurately, Voynich ‘approaches’ – are built upon a fundamentally flawed statistical ‘take’, e.g. that Voynichese is just a simple (but highly obscure) text.

Unpopularly, this seems to be true of just about all ‘Baxian’ Voynich linguistic analyses. Statistically, nothing supports the basic assumption of a ‘flat’ (but obscure) language. In fact, Voynichese is full of confounding, arbitrary, difficult, unlanguagelike behaviours (see the incomplete list above), all of which you have to compensate for to get your data to a point where you even begin to have something remotely language-like to work with. But hardly anybody ever does that, because it’s too tricky, and they’re not genuinely invested enough to do the ‘hard yards’.

It’s also true of Gordon Rugg’s table ‘take’; and of just about all simple ciphers; and – also unpopularly – of hoax theories (why should meaningless text be so confounded?) And so forth.

The sad reality is that most researchers seem to approach Voynichese with a pre-existing emotional answer in mind, which they then try to justify using imperfect statistical experiments. More broadly, this is how a lot of flawed statistical studies also work, particularly in economics.

In fact, statistics has become a tool that a lot of people use to try to support the lies they tell themselves, as well as the lies their paymasters want to be told. This is every bit as true of Big Oil and alt.right politics as of Voynichology. Perhaps it’s time for an even more ironic 21st century update to Darrell Huff’s book – “How To Lie To Yourself With Statistics”?

If you’ve previously looked at the final page of the Voynich Manuscript (f116v), you probably know exactly what I’m talking about here: the strange marginalia / pen-trials / whatever that aren’t Voynichese, but aren’t obviously anything else either. The middle two lines of “michitonese” are interspersed with crosses, making them resemble a prayer or protective charm (when reading these out, you’re supposed to make the sign of the cross when you see a ‘+’). The two michitonese words I’m going to focus on today look like “ahia maria“, which is what “Saint Mary” would sound like in Greek. But is that what they really say?

Parsing the page

To my mind, the initial challenge is working out not how to read the page (Lord knows we don’t need yet more speculative michitonese decipherments), but rather how to parse the page. Knowing that crosses are a structural feature of prayers and charms is a helpful start, but I suspect that differences in inks and letter shapes are also key things to bear in mind.

I discussed this page in The Curse of The Voynich (2006), and proposed that what we are often seeing here is a page whose writing had faded so much by the time that a later (in fact, probably much later) owner looked at it that they felt compelled to emend it before it disappeared completely from sight.

Along those same lines, I’ve previously wondered whether the first two words of the pair of lines (which have been immortalised as “michiton oladabas”) might originally have read “nichil obstat”, before fading almost to nothing and being emended into their current state. This isn’t just a matter of ink density, as some people have suggested: for example, the ‘l’ of oladabas looks like it was written by a 15th century quill pen, but the ‘n’ of michiton looks to me like it was written by a 16th century metal nib. And if you compare all three ‘a’s of oladabas, you’ll struggle to work out which is original and which is emended. (For what it’s worth, I believe the middle ‘a’ is original and the other two are emendations.)

So broadly speaking, I feel that the first thing we have to do is parse “ahia maria”, i.e. try to work out which parts of it are original and which are likely later emendations.

Multispectral scans

Though the only multispectral scans of f116v we have are the (frankly shocking quality) ones left on the Beinecke’s file server several years ago, these are vaguely helpful here. Here’s what I was able to salvage from “Voynich_116v_WBRBB47_108_F.jpg”:

Though this is hard to make out, it suggests that the final ‘a’ of (what looks like) “ahia” wasn’t emended, but the rest of the word was. The first “ma” of “ma+ria” also seems to have been worked on.

In general, it seems that we can probably trust that the “ma+ria” part was what was originally written, but we need a bit more insight before we can come to a conclusion about the “ahia” part.

The History of Hail Mary

Given the prayer-/charm-like context, it would seem very reasonable to hypothesise that the two words might originally have read Hail Mary (in some language). But when did people start saying Hail Mary?

The Greek form (which begins Θεοτόκε Παρθένε, χαῖρε, κεχαριτωμένη Μαρία, ὁ Κύριος μετὰ σοῦ. ) developed independently of the Latin form “Ave Maria, gratia plena“. The latter took shape around the turn of the first Millennium, with Thomas Aquinas noting that the biblical original (“Ave, gratia plena“, from the Gospel of Luke) had had the word “Maria” inserted. The Latin form first appeared in print in 1495 in Girolamo Savonarola’s “Esposizione sopra l’Ave Maria“.

In terms of what forms of Hail Mary we might reasonably be looking to find, then, the Greek version is distinctly unpromising, while the Latin is very much limited to Ave Maria.

Yet if you search for medieval Latin charms that mention Mary, you’ll find a good number that invoke her name (e.g. “+ sancta Maria peperit +” in CUL Additional 9308, folios 49r-50r, mentioned here, footnote 155), often when hoping to ensure a good outcome from childbirth.

Additionally, “Maria Maria” appears doubled in a particular branch of Gaudendum nobis est (see MSS A-GU 30, CH-EN 1003, CH-EN 102, and Stuttgart 95, mentioned here, Table 4-6, which are from Engelberg and Lambrecht). We’ll see why this is relevant in the next section…

The Other Maria(s)

We have to be careful that we don’t let people’s prior readings over-influence our attempt to make sense of the page. The “six + marix + morix +” text on the same line is visually suspicious to me, simply because both the marix and the morix look a lot like maria, which all matches up with the rest of the line:

That is, the line could easily, when taken as a whole, be saying something like “six + maria + maria + vix + ahia + ma+ria +”, i.e. very much less than you might at first think.

Multispectrally, the morix shows signs that its ‘o’ and ‘x’ were emended, but the rest left intact:

Of course, “six” isn’t actually a Latin word, so that too is an interpretative reading of the line. Moreover, its middle “i” is shaped completely unlike the ‘i’s in any of the Mari[ax] words, so was almost certainly emended. So: if some of the original text’s a’s were later misread (and emended) as ‘x’s, might this have originally been an abbreviation for “Sancta”, and so read more like “S[anct]a + Maria + Maria + vix + Ahia + Ma+ria +”?

And again, “vix” (= hardly, scarcely, barely, only just, with difficulty, reluctantly, etc) is perhaps not a Latin word you might expect to see in a charm or prayer: its middle “i” is shaped completely unlike the ‘i’s in any of the Mari[ax] words, so also looks like an emendation. So this too is a bit of an awkward fit.

Your thoughts, Nick?

Well, I wasn’t clear on this in 2006, and to be honest I’m not much further along in 2023. It seems to me that we’re looking at this page through the wrong kind of technical lens, because we’re not really able to answer even the simplest of questions about its codicology. Multispectral imaging was a bit of a bust for f116v, though to be fair this is exactly what a lot of the literature points out for studying pages with different ink layers (you really need Raman imaging to separate out different inks).

A different approach, which I’ve proposed for a while now, would be to map the surface of f116v to see if we can detect differences in the depth of pen indentations between different codicological layers, and then pick out the earliest layer for its meaning. And in fact, in a post on BBC Futures today, this is exactly what the Factum Foundation in Madrid has been doing with the Archiox Project for the Bodleian library.

Who knows what intrepid codicologists might find hidden in the surface of the Voynich Manuscript?

Like most people, I’ve read numerous articles about ChatGPT recently. Perhaps the most entertaining was where ChatGPT (sort of) played chess against the #1 chess engine Stockfish, with hilarious consequences (if you’re a chess player, that is).

However, few articles manage to get the point across that ChatGPT is not an all-knowing Oracle, but rather something that is trying to please you by providing answers that are as realistic-looking as it can make them (rather than actually realistic). It’s entertainment, people (and definitely not real).

Perhaps if I give some examples you’ll see what I’m talking about…

“Did Filarete have his own herbal manuscript?”

I asked ChatGPT this question, because it was something which I already knew the answer to (and had blogged about back in 2008, and also since). The right answer is that Filarete did have his own herbal manuscript (it was mentioned in fol.78r of MS Ashburnham 198, “Sed et Antonius Averlinus Philaretus lingua vernacula scripsit eleganter“), but as far as I know, nobody knows if it or a copy of it still exists.

ChatGPT started safely and correctly:

There is no known herbal manuscript that can be definitively attributed to Filarete, the fifteenth-century Italian architect and writer. Filarete is best known for his architectural treatise, the “Treatise on Architecture,” which he wrote in the early 1460s.

ChatGPT

However, our chatty digital friend then went a bit off-piste:

However, there are some herbal manuscripts from the same general period that are associated with Filarete or have been attributed to him. For example, there is a manuscript in the Bibliotheca Augusta in Wolfenbüttel, Germany, that is sometimes referred to as the “Herbal of Filarete.” This manuscript contains descriptions and illustrations of various plants and their medicinal uses, but it is not clear whether Filarete was actually the author of the manuscript or was simply associated with it in some way.

ChatGPT

Interesting, I thought, even if I didn’t believe a word of it. It went on:

There are also other herbal manuscripts from the fifteenth century that are similar in style and content to the “Herbal of Filarete” and may have been produced by the same workshop or circle of artists and writers. These manuscripts are sometimes referred to as the “Herbarium of Apuleius” or the “Herbarium of Pseudo-Apuleius” and contain similar descriptions and illustrations of plants and their uses.

ChatGPT

“What is the shelfmark of that “Herbal of Filarete” you mentioned?”

I still didn’t believe a word, so asked it: “What is the shelfmark of that “Herbal of Filarete” you mentioned?” The response I got back was a little unexpected:

I apologize for the confusion in my previous response. Upon further research, I could not find any evidence of a manuscript called the “Herbal of Filarete” in the Bibliotheca Augusta in Wolfenbüttel, Germany.

I believe there may have been a misunderstanding or confusion with other sources. While there are many herbal manuscripts from the medieval period, there is no known herbal manuscript that can be definitively attributed to Filarete.

I apologize for any confusion or misinformation my previous response may have caused. If you have any further questions, please feel free to ask.

ChatGPT

It’s certainly true that the Bibliotheca Augusta in Wolfenbüttel has at least one Pseudo-Apuleius manuscript, so it looked like there was just some confusion going on. Gigantic input dataset aside, ChatGPT really isn’t anything like Google, so it would be grossly unfair to imagine that it is doing anything like the same thing when it tries to answer a query.

So I decided to take a different tack, to see if I could flush anything out about what was going on.

“Are there any herbal manuscripts associated with Filarete?”

Maybe there’s a herbal manuscript out there that’s loosely associated with Filarete or his coterie that I don’t know about, I thought. So I tried to poke the bear again (but in a slightly gentler way than before).

Can you guess how ChatGPT responded? I think you can. As before, the first paragraph was fairly solid:

To my knowledge, there are no herbal manuscripts that can be definitively attributed to Filarete, the fifteenth-century Italian architect and writer. Filarete is best known for his architectural treatise, the “Treatise on Architecture,” which he wrote in the early 1460s.

ChatGPT

But you might find the next paragraph eerily familiar:

While Filarete is not known to have authored any herbal manuscripts, there are some medieval manuscripts that contain descriptions and illustrations of plants and their medicinal uses that have been associated with him or with his circle of artists and writers. One example is a manuscript in the Biblioteca Medicea Laurenziana in Florence, Italy, that is sometimes referred to as the “Herbarium of Filarete.” This manuscript contains descriptions and illustrations of various plants and their medicinal uses, but it is not clear whether Filarete was actually the author of the manuscript or was simply associated with it in some way.

ChatGPT

Once again, this seems like an historical hallucination on ChatGPT’s part. The odd part is that it reprised the exact shape of its previous hallucinatory answer, but referring to a different top-end library. So, I again asked for the shelfmark of this imaginary manuscript…

“What is the shelfmark of that manuscript in Florence?”

That’s what I asked: and, sadly, you already know what ChatGPT’s reply is going to look like, because it’s basically the same as before:

I apologize for the error in my previous response. After further research, I have found that there is no known “Herbarium of Filarete” manuscript in the Biblioteca Medicea Laurenziana in Florence, Italy.

While Filarete is not known to have authored any herbal manuscripts, there are some medieval manuscripts that contain descriptions and illustrations of plants and their medicinal uses that have been associated with him or with his circle of artists and writers. One example is a manuscript known as the “Herbarium of Apuleius,” which is sometimes referred to as the “Herbal of Pseudo-Apuleius.” This manuscript is kept at the Bibliotheca Augusta in Wolfenbüttel, Germany, and has the shelfmark Codex Guelf. 82.7 Aug. 2°.

ChatGPT

What to make of this?

ChatGPT is undoubtedly a brilliant technical achievement. But as far as I can see, for now it’s just part of the entertainment industry.

Here, I do wonder whether the format of the question I asked kind of loosely defined the format of the answer ChatGPT inferred I was hoping to receive back: and that it then hallucinated enough additional stuff to complete the gaps in the answer format to provide a well-formed-looking answer. So maybe the whole point of ChatGPT is that the shape of the question defines the shape of the answer, even if there aren’t any actual facts inside the answer. Something to think about, anyway!

Over at voynich.ninja, there’s an interesting recent thread on the (in-)homogeneity of Voynichese, i.e. how consistent (or inconsistent) with itself the Voynich Manuscript’s text is (either locally or globally). Given that I have been working on Q20 recently, I thought it might be interesting to take a brief look at that quire through this particular lens.

In Search of a Problem Statement

One intriguing side of Voynichese is that even though it exhibits high-level consistency (e.g. the continuous script, plus the well-known differences between Currier A pages and Currier B pages), medium-level consistency (e.g. thematic-looking sections such as Q13, Q20, Herbal-A, Herbal-B etc), and even bifolio-level consistency (more on this below), there are open questions about the apparent lack of low-level consistency.

In particular, Voynichese ‘words’ (which have been the subject of countless studies and analyses) present many apparent local inconsistencies. As Torsten Timm pointed out in the voynich.ninja thread referenced above, words that are extremely common on one page of a section can be completely absent from the next. And, awkwardly, this is sometimes even true for pages that are the recto and verso sides of the same folio.

Even though there are countless ways to airily explain away these kinds of inconsistencies (change of subject matter, change of source structure, change of underlying plaintext language, change of local cipher key, etc), all too often I think these are invoked more as a research excuse for not actually going down the rabbit hole. (And I for one am bored stiff of such research excuses.)

So, before we start reaching gleefully for such cop-out answers, we need to first properly lock down what the core low-level consistency problem actually is. Basically, what specific behaviours can we point to that indicate that Voynichese has a problem here?

Captain “ed”

It was WWII codebreaker Captain Prescott Currier himself who pointed out nearly fifty years ago that you could usually tell Currier A pages from Currier B pages simply by looking at the proportion of ‘ed’ glyph pairs on that page. (Currier A pages have almost none, Currier B pages normally have loads.)

Personally, I’d add some caveats, though:

  • Even though it might be tempting to think of “ed” as a bigram (i.e. a single token), it seems far more likely to be a contact boundary between an “e”-family token (i.e. e/ee/eee) and a “d” glyph.
  • To me, there often seems to be something funny going on with qokedy / qokeedy / etc that isn’t really captured by just looking for “ed”

Helpfully, you can use voynichese.com’s layer feature to bring to life the variation in Voynichese words containing “ed”, e.g. this query for lots of different subgroups of “ed” words. Even though Herbal A pages have basically no ed pairs at all, the ed pairs do nothing short of explode at the very start of Q13:

The first three pages of Q20 are very nearly as colourful:

For ed, it seems to be the case that recto and verso pages have a similar kind of ed-density: for example, if you compare f107r/f107v with f108r/f108v, you can see clearly that the two halves of each folio seem quite similar:

The f111r/f111v pair seems to buck this trend slightly, insofar as f111v (on the right) seems somewhat less ed-dense than its recto side f111r (on the left):

While I’m here, I’d note that f116r (the last proper Voynichese page of Q20) seems to have a structure break halfway down, which would be consistent with an explicit and/or a colophon placed at the end of a chapter / book:

There’s also the question of whether the two folios making up each bifolio appear ed-consistent. I’d say that this appears true for most Q20 bifolios (e.g. f103 and the top half of f116r, f104-f115, f105-f114, f106-f113) but certainly not for others (e.g. f107-f112 and f108-f111). It’s very hard to be definitive about this.

Finally, I’d also note that while Quire 8’s f58r/f58v (with their starred paragraphs) do have some ed-words, their ed-fractions are extremely low, which would make classifying them as “pure” Currier B difficult:
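
For anyone who wants to replay this kind of check themselves (rather than just eyeballing voynichese.com layers), here’s a minimal Python sketch of the per-page “ed” fraction idea. It assumes you already have each page’s EVA text as a plain dotted string; the two page strings below are made-up placeholders, not real transcription data:

def ed_fraction(text: str) -> float:
    """Count 'ed' glyph pairs (within words) as a fraction of all glyphs on a page."""
    words = [w for w in text.split(".") if w]
    total_glyphs = sum(len(w) for w in words)
    if total_glyphs == 0:
        return 0.0
    ed_pairs = sum(w.count("ed") for w in words)
    return ed_pairs / total_glyphs

pages = {
    "f107r": "qokeedy.qokedy.shedy.chedy.qokain",   # placeholder Currier B-ish words
    "f011r": "daiin.chol.chor.cthy.dain.okchy",     # placeholder Currier A-ish words
}
for name, text in pages.items():
    print(f"{name}: ed fraction = {ed_fraction(text):.2%}")

Run over a real transcription (with its metadata stripped out first), the pages with high ed fractions should essentially be the Currier B pages – which is just Currier’s original observation in numeric form.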

Torsten Timm’s “in”

Torsten Timm has similarly looked at what the usage of the Voynichese glyph pair “in” tells us. Of my own set of voynichese.com experiments, the one that seemed to me to be the most interesting was comparing “iin” with “[anything else]in”.

For example, even though iin dominates [^i]in for most of the Voynich Manuscript, the first folio of Q13 has almost no “iin”s in it at all:

Folio f111 is also a little bit odd, in that its verso side has many more [^i]in words:
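
If you’d rather make the same iin vs [^i]in comparison numerically than visually, a rough sketch might look like the following (again assuming plain EVA text with “.” word separators; the page strings here are placeholders rather than real data):

import re

def in_profile(text: str):
    """Count word-final -iin versus -in preceded by something other than i."""
    iin   = len(re.findall(r"iin\b", text))
    other = len(re.findall(r"(?<!i)in\b", text))   # the [^i]in cases, e.g. -ain, -oin
    return iin, other

pages = {
    "f076r": "qokaiin.shedy.okaiin.daiin.olkaiin",   # placeholder text
    "f111v": "okain.oin.dain.sain.qokaiin",          # placeholder text
}
for name, text in pages.items():
    iin, other = in_profile(text)
    print(f"{name}: iin={iin}, [^i]in={other}")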

“ho”-words Way

As with Currier’s “ed”, “ho” is very much a contact locus between two families of glyphs: on the left, you have ch/sh/ckh/cth/cph/cfh, while on the right you have or/ol/ok/ot/op/of etc. As such, it looks like a useful way of exploring for a group of glyph boundaries, but this does need to be carefully qualified.

If we visually group this ho-transition (via voynichese.com) in terms of the origin of the “h”, we get a query that looks like this. This reveals that most ho instances are in fact “cho” (dark blue). However, the f93r/f93v folio does look particularly unusual in this respect:

The final two paragraphs of f116r are also unusual, this time for their almost complete lack of ho-words:

If you try to classify ho-words in terms of what follows, you seem to get less predictability; see the sketch below.
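
For anyone wanting to tabulate the left-hand side of these ho-contacts directly from a transcription (rather than via voynichese.com), a quick-and-dirty sketch might look like this – the sample string is a placeholder, not real page text:

import re
from collections import Counter

def classify_ho(text: str) -> Counter:
    """Tally each 'ho' contact by the bench glyph on its left."""
    counts = Counter()
    for m in re.finditer(r"(cfh|cph|ckh|cth|sh|ch)o", text):   # longest forms first
        counts[m.group(1) + "o"] += 1
    return counts

sample = "chol.shol.ckhol.chor.shoky.qokchol"   # placeholder
print(classify_ho(sample))   # e.g. Counter({'cho': 3, 'sho': 2, 'ckho': 1})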

Putting ed / in / ho Together

From the preceding sections, I’d say that the overwhelming impression I get is that pages within a folio (and indeed pages within a bifolio, though to a slightly lesser extent) are actually reasonably consistent with each other, with relatively few counter-examples.

Unsurprisingly, this is also what we see if we simply merge the three ed / in / ho queries into a single voynichese.com query. Here, we can easily pick out the dishonourable exceptions, such as f111 (where f111r is dominated by “ed” [blue], yet where f111v is dominated by “in” [red]):

If we instead highlight cho and sho separately, what emerges is that, unlike the rest of Q20, the f106-f113 bifolio has a surprisingly high proportion of sho-words (in yellow):

I could go on, but I think my visual argument here has pretty much run its course.

Thoughts, Nick?

Even though Torsten Timm used ed / in / ho as part of his argument concluding that Voynichese pages are independent of each other, I’m not sure I fully accept his conclusions. (He’s certainly right about words, but the details and ramifications of that are for another post entirely.)

For me, the behaviour of ed / in / ho seems to suggest something arguably even more unsettling: which is that there seems to be consistency at the bifolio level.

And so it seems that we’re facing a BAAFU (“Bifolio As A Functional Unit”) scenario here. Which is arguably even more mysterious than Currier’s LAAFU (“Line As A Functional Unit”), wouldn’t you agree?

In a recent post, I started trying to list out Q20 order-related microtheories, but got somewhat sidetracked by the microtheory that f58 (with its ‘Scribe 3’ starred paragraphs) may have preceded Q20 (which, for the most part, is also made up of ‘Scribe 3’ starred paragraphs). However, this whole idea is closely linked to the whole issue of what happened to Q8 (which f58 is part of) and Q14: and even though I had covered this issue before, I thought it was well worth revisiting.

The “ij” Marginalia

Back in 2009, I floated the idea that the mark at the bottom of f57v might have been an “ij”, probably ineptly added by an early Voynich owner who believed (from the nesting order of the bifolios back then) that this marked the start of the second ‘book’ (or chapter). (“ij” is an entirely conventional early modern Latin way of handwriting “2”).

This was actually part of a larger discussion of Q8 which I tried to put together to make sense of Q8’s curious foliation – essentially, why was there such a large folio numbering gap between f58 and f65? My suggestion was that not only had Q8’s bifolios been flipped around (i.e. the original folio order was f65-f66-x-x-x-x-x-x-f57-f58), but the large wodge of pages missing from the middle could well have been the nine-rosette folio (but folded down).

And before you start complaining that the nine-rosette page doesn’t have folio numbers matching this, my idea was that the foldout page had already been pulled out and rebound along a different fold, leaving a damaged vellum residue at the centre of Q8 that the foliator numbered around (just in case it later got bound back into its correct place).

However, I now suspect that even that binding was an ‘imposition’ (in the literal sense of the word), and that the original nesting order had been (say) f65-f66-f57-f58, before the bifolios were reversed and the nine-rosette page (Scribes 2 & 4) was inserted into their centre. So I would date the “ij” marginalia to the mid-fifteenth century, prior to the bifolios being reversed, and also to before the nine-rosette foldout was (mis-)bound into Quire 8.

But because f57r (the flip side of f57v) is a Herbal-B page, and f57v still faces f58r (with the paragraph stars), it seems very likely to me that these two folios faced each other in Quire 8’s original (unbound) gathering.

This would make the first few pages of Q20:

  • f57v (with the circular diagram [possibly a nocturnal, e.g. here, here and here])
    • (oh, plus the “ij” marginalia at the bottom)
  • f58r (big paragraph stars, plus a missing initial capital)
  • f58v (more starred paragraphs)
  • Quire 20 (though in an as-yet unknown bifolio ordering).

Chicken Scratches (Again)

Intriguingly, if you insert the nine-rosette page into the centre of Q8 in its original folding style, you end up putting the two pages with “chicken scratches” on right next to each other, which would seem to strongly support this whole idea. Yet because the chicken scratches line up horizontally but not vertically, I think we can reasonably infer that the nine-rosette page was only loosely bound in at this stage. Finally, because the Q14 quire number is on the correct page relative to Q14’s rebinding (i.e. not its original binding), it would seem to imply (as I described for Q13 in Curse, 2006) that these quire numbers were added after several binding iterations.

Hence the Q8 timeline would appear to be something like this:

  • f65-f66-f57-f58 (original order)
  • “ij” marginalia added
  • Q8 Bifolios reversed, nine-rosette foldout inserted but only loosely bound in
  • Chicken scratch marginalia added
  • Nine-rosette foldout ripped along original fold, removed from Q8, but vellum residue left in place
  • Nine-rosette foldout refolded along different fold, quire 14 number added, bound further along
  • Foliation added, folio numbers 59-64 skipped to work around the nine-rosette vellum residue

f85r2 Contact Transfer (?)

There is also the matter of what seems to be a paint contact transfer on f85r2.

Given what seem to be the connections between Q8 and Q14, you’d loosely expect this to have come from a Q8 herbal page: but this doesn’t seem to be the case. I then wondered whether this had been transferred across from the red-brown paint at the bottom of f87r (a close neighbour in the final binding order, codicologically speaking). However, unless the bottom edge of f87r has been extremely heavily trimmed, the curve isn’t really right:

Looking elsewhere in the manuscript, this could possibly have instead come from the bottom outside corner of f43r (but, to be honest, this curve doesn’t seem to match either):

Or from the bottom outside corner of f39r (but, again, the curve doesn’t seem to quite match):

So… where did it come from? This remains a bit of a mystery to me (and I’ve marched through all the pages of my Yale facsimile several times), but perhaps someone else will have a better insight into what happened here. I hope so!

f86v6 Contact Transfer (?)

There’s also a curious mark (that looks like a contact transfer) near the top right of f86v6 which I also currently have no explanation for:

Has anybody got any idea about this?

I hope everyone who attended the Voynich Conference 2022 hosted online by the University of Malta enjoyed the presentations and the Q&As.

In Lisa Fagin Davis’ final presentation, she mentioned her recent theory that p/f were in fact ke/te, noting that she’d thought this up herself but had then found it on Cipher Mysteries. If you want to see the original page I put up in September 2020 suggesting this idea (along with her comment near the bottom), it’s right here, along with the August 2020 page where I started exploring the behaviour of single-leg gallows.

There’s an additional aspect to the set of gallows/e/ch groupings I discussed in 2020, which is that you can usefully compare the (parsed) ch:chch ratio in the text as a whole (which is 10616:18 (0.17%)) both to the (parsed) ratios of strikethrough gallows preceded by ch…

  • ckh:chckh = 634:242 = 38.17%
  • cth:chcth = 766:139 = 18.15%
  • cph:chcph = 185:27 = 14.59%
  • cfh:chcfh = 58:15 = 25.86%

…as well as to the (parsed) ratios of strikethrough gallows followed by ch:

  • ckh:ckhch = 871:5 = 0.57%
  • cth:cthch = 902:3 = 0.33%
  • cph:cphch = 211:1 = 0.47%
  • cfh:cfhch = 73:0 = 0%

This, too, is a strikingly asymmetric result; it would seem to suggest that the ch:chch ratio is practically identical to the c<gallows>h:c<gallows>hch ratios, yet completely unlike the c<gallows>h:chc<gallows>h ratios.

I would take this as reasonably good support for the idea that c<gallows>h is actually a visual proxy (and it doesn’t really matter whether this is for scribal, cryptographic, or steganographic reasons) for <gallows>ch, because Voynichese seems to want to avoid c<gallows>hch almost exactly as much as it wants to avoid chch.
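
(For anyone who wants to check my arithmetic, here’s a tiny Python snippet that simply reproduces the ratio calculations above from the raw parsed counts I quoted – the counts themselves come from the parsed transcription, not from this code.)

pairs = {
    "ch:chch":   (10616, 18),
    "ckh:chckh": (634, 242),
    "cth:chcth": (766, 139),
    "cph:chcph": (185, 27),
    "cfh:chcfh": (58, 15),
    "ckh:ckhch": (871, 5),
    "cth:cthch": (902, 3),
    "cph:cphch": (211, 1),
    "cfh:cfhch": (73, 0),
}
for name, (base, combo) in pairs.items():
    # e.g. "ckh:chckh = 634:242 = 38.17%"
    print(f"{name} = {base}:{combo} = {combo / base:.2%}")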

Perhaps combining this result with the pe/fe result (and other “forbidden” Voynichese combinations) might be the start of something really positive…

Like (hopefully a fair few) other Voynicheros, I’ve ponied up my 50 euros for the 2022 online Voynich Conference being hosted by the University of Malta in the next few days.

One of the fields in the application form asked for my university or institution: I put down “Cipher Mysteries”, on the grounds that it has ended up a bit of a cipher institution. 🤔 But mainly to make myself laugh. 😁

Will Malta reveal something incredible, awe-inspiring, unexpected, shocking, or amazing about the Voynich Manuscript, in the way academic conferences in novels and films have primed everyone to believe? Actually… maybe, sort of. But not in the way Dan Brown and his overexcited ilk like to portray.

20+ years ago, I remember trying really hard – with almost zero success, it has to be said – to persuade anyone that the Voynich Manuscript wasn’t some kooky fake cooked up by Dr John Dee (back then the fairly dominant opinion), but a genuine historical artifact worthy of close, careful study.

Well, from the 2022 conference’s participant list and programme of papers, it seems that that aspect of my struggle back then has at least borne fruit. It’s now a serious business.

But will there be The Big Breakthrough? You know, the introvert outsider’s slide that shyly reveals The Secret Cipher Key we’ve all long dreamed of? Cue clunks round the world as Voynichero jaws collectively hit the floor.

(*snort* Not a hope, sorry.) But with so many smart, insightful, observant researchers all trying to move forward in broadly the same way, who’s to say that something won’t emerge from it all?

Perhaps it won’t be something showy (or even immediately obvious), but even a tiny step forward would feel Heaven-sent. So let’s just pray a little, hein?

I’ve been busy up in the loft, having a somewhat-overdue tidying session there. But rather than give a load of books to the charity shop (my default response), I wondered if any of my Cipher Mysteries readers would like to have some?

Voynich Novels

As you may know, I maintained my big fat list of Voynich-themed novels up until about 2012, at which point I’d really had enough of reading them for one lifetime (and so basically threw in the towel).

Hence it should be no surprise that I have a ten-book-high pile of novels mentioning the Voynich Manuscript (to greater or lesser degrees) to give away, many of which I have reviewed here (e.g. Scarlett Thomas’ “PopCo”, Brad Kelln’s “In Tongues of the Dead”, A.W. Hill’s “Enoch’s Portal”, Steve Berry’s “The Charlemagne Pursuit”, Christopher Harris’ “Mappamundi”, and Brett King’s “The Radix”), as well as a fair few others:

So, if anyone in the UK reading would like their very own instant Voynich-themed novel shelf, please let me know in the comments section below, and I’ll send the whole darn pile to whomsoever I think most oddly worthy. Paypal-ing something towards the postage would be a kind gesture, but the whole point of this exercise is to make space rather than money. 🙂

Oh, and if anyone would like to submit reviews of other Voynich-themed novels to be published on Cipher Mysteries, I’d be more than happy to post them up. Just don’t ask me to read the actual book, nothankyou. 😉

Historical Cipher-Themed Novels

I also have a chunky little box of nineteen historical cipher-themed novels to give away, where it’s more sensible to talk about weight (6.9kg) than the total page count:

(Strictly speaking, James Cowan’s “A Mapmaker’s Dream” isn’t quite in the pacy-cipher-airport-novella genre that most of the rest is, but it’s in the box regardless.)

So again, if anyone in the UK reading would like a whole bunch of historical cipher novels, please let me know in the comments section below, and I’ll send the whole box-load to some deserving soul or other. I don’t really have the patience to package up individual books, so it’s an all-or-nothing kind of thing, I’m afraid.

Cheers!

Here’s a suggestion for a Voynich Manuscript paper that I think might well be revealing: taking raking IR images of f116v. But why would anyone want to do that?

Multispectral imaging

Since about 2006, I’ve been encouraging people to take multispectral images of the Voynich Manuscript, i.e. to capture images of the manuscript at a wide variety of wavelengths, so not just visible light.

My interest here is seeing if there are technical ways we can separate out the codicological layers that make up f116v. To my eyes, there seem to be two or three different hands at play there, so it would make sense if we could at least partially figure out what the original layer there looked like (before the other layer was placed on top, I guess at least a century later).

And in fact one group did attempt multispectral scanning, though with only a limited set of wavelengths, and without reaching any firm conclusions. (They seem not to have published their results, though I did once stumble across some of their test images lying around on the Beinecke’s webserver.)

The Zen of seeing nothing

Interestingly, one of that group’s images of f116v was taken at 940nm (“MB940IR”), which is an infrared wavelength (hence “IR”). This revealed… nothing. But in what I think is potentially an interesting way.

Here’s what it looks like (hopefully you remember the michitonese at the top of f116v):

[Image: “Main banks Transmissive” 940nm scan of f116v]

That’s right! At 940nm, the text is invisible. Which is, of course, totally useless for normal imaging. For why on earth would you want to image something at a wavelength where you can’t see any detail?

Raking Light

The interesting thing about this is that one kind of imaging where you’d want the text itself to be as invisible as possible is raking illumination, i.e. where you shine an illuminating light parallel to the surface. If you look really closely at the edges of penstrokes at high-ish magnification, you should be able to use this to see the shadows of the edges of the indentations left by the original quill pen.

And so I’ve long wondered whether it might be possible to use a 940nm filter (and a non-LED light source) and a microscope / camera on a stand to try to image the depth of the penstrokes in the words on f116v. (You’d also need to use an imaging device with the RG/GB Bayer filter flipped off the top of the image sensor; or a specialist b&w imaging sensor; or an old-fashioned film camera, horror of horrors!)

What this might tell us

Is this possible? I think it is. But might it really be able to help us separate out the two or more hands I believe are layered in f116v? Though I can’t prove it, I strongly suspect it might well be.

Why? Because vellum hardens over time. In the first few years or so after manufacture, I’m sure that vellum offers a lithe and supple writing support that would actually be quite nice to write on. However, fast forward from then to a century or so later, and that same piece of vellum is going to be harder, drier, more rigid, slippier, scrapier – in short, much less fun to write on.

And as a result, I strongly suspect that if there are two significantly time-separated codicological layers on f116v, then they should show very different writing indentation styles. And so my hope is that taking raking IR images might possibly help us visualise at least some of the layering that’s going on on f116v, because I reckon each of these 2+ hands should have its own indentation style.

Will this actually work? I’m quietly confident it will, but… even so, I’d have to admit that it’s a bit of a lottery. Yet it’s probably something that many should be able to test without a lot of fuss or expense. Does anyone want to give this a go? Sounds to me like there should be a good paper to be had there from learning from the experience, even if nothing solid emerges about the Voynich Manuscript.

Anyone who spends time looking at Voynichese should quickly see that, rare characters aside, its glyphs fall into several different “families” / patterns:

  • q[o]
  • e/ch/sh
  • aiin/aiir
  • ar/or/al/ol/am/om
  • d/y
  • …and the four “gallows characters” k/t/f/p.

The members of these families not only look alike, they often also function alike: it’s very much the case that glyphs within these families either group together (e.g. y/dy) or replace each other (e.g. e/ee/eee/ch/sh).

For me, one of the most enigmatic glyph pairs is the gallows pair EVA k and EVA t. Rather than be seduced by their similarities, my suggestion here is to use statistics to try to tease their two behaviours apart. It may sound trivial, but how do EVA k and EVA t differ; and what do those differences tell us?

The raw numbers

Putting strikethrough gallows (e.g. EVA ckh) to one side for the moment, the raw k/t instance frequencies for my preferred three subcorpora are:

  • Herbal A: (k 3.83%) (t 3.28%)
  • Q13: (k 5.38%) (t 2.27%)
  • Q20: (k 5.19%) (t 2.76%)

Clearly, the ratio of k:t is much higher on Currier B pages than on Currier A pages. Even if we discount the super-common Currier B words qokey, qokeey, qokedy, qokeedy, qokaiin, a large disparity between k and t still remains:

  • Q13: (k 4.3%) (t 2.46%)
  • Q20: (k 4.58%) (t 2.89%)

In fact, this k:t ratio only approaches (rough) parity with the Herbal A k:t ratio if we first discount every single word beginning with qok- in Currier B:

  • Q13: (k 2.71%) (t 2.41%)
  • Q20: (k 3.57%) (t 2.86%)

So there seems to be a hint here that removing all the qok- words may move Currier B’s statistics a lot closer to Currier A’s statistics. Note that the raw qok/qot ratios are quite different in Herbal A and in Q13/Q20 (qok is particularly strong in Q13). This suggests that “qok” in Herbal A has a ‘natural’ meaning, while “qok” in Q13/Q20 has a different, far more emphasised (and possibly special) meaning, reflecting the high instance counts for qok- words in Currier B pages:

  • Herbal A: (qok 0.79%) (qot 0.68%)
  • Q13: (qok 3.04%) (qot 0.74%)
  • Q20: (qok 1.84%) (qot 0.70%)
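
In case it helps anyone replicate this kind of counting, here’s a rough Python sketch of the basic idea: k and t glyph frequencies over a word list, recomputed after discarding qok- words. The word list is a placeholder, and a real run would first set aside strikethrough gallows (ckh/cth/cph/cfh), as above:

def kt_frequencies(words, drop_qok=False):
    """Return (k, t) glyph frequencies, optionally excluding qok- words."""
    if drop_qok:
        words = [w for w in words if not w.startswith("qok")]
    glyphs = "".join(words)
    if not glyphs:
        return 0.0, 0.0
    return glyphs.count("k") / len(glyphs), glyphs.count("t") / len(glyphs)

q20_words = ["qokeedy", "shedy", "qokain", "otedy", "chedy", "okar"]   # placeholder
for drop in (False, True):
    k, t = kt_frequencies(q20_words, drop_qok=drop)
    label = "without qok- words" if drop else "all words"
    print(f"{label}: k={k:.2%}, t={t:.2%}")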

Difference between ok/yk and ot/yt

If we put ckh, cth and all qok- words to one side, the numbers for ok/yk and ot/yt are also intriguing:

  • Herbal A: (ok 1.38%) (ot 1.31%) (yk 0.51%) (yt 0.48%)
  • Q13: (ok 1.07%) (ot 0.91%) (yk 0.17%) (yt 0.12%)
  • Q20: (ok 1.53%) (ot 1.47%) (yk 0.19%) (yt 0.14%)

What I find interesting here is that the ok:ot and yk:yt ratios are just about identical with the k:t ratios from Herbal A. Consequently, I suspect that whatever k and t are expressing in Currier A, they are – once you go past the qok-related stuff in Currier B – probably expressing the same thing in Currier B.

As always, there are many possible reasons why the k instance count and the t instance count should follow a single ratio: but I’m consciously trying not to get caught up in those kinds of details here. The fact that k-counts are consistently that little bit higher than t-counts in several different contexts is a good enough result to be starting from here.

Might something have been added here?

From the above, I can’t help but wonder whether EVA qok- words in Currier B pages might be part of a specific mechanism that was added to the basic Currier A system.

Specifically, I’m wondering whether EVA qok- might be the Currier B mechanism for signalling the start of a number or numeral? This isn’t a fully-formed theory yet, but I thought I’d float this idea regardless. Something to think about, certainly.

As a further speculation, might EVA qok- be the B addition for cardinal numbers (1, 2, 3, etc) and EVA qot- be the B addition for ordinal numbers (1st, 2nd, 3rd, etc)? It’s something I don’t remember seeing suggested anywhere. (Please correct me if I’m wrong!)

So: do I think there’s room for an interesting paper on EVA k/t? Yes I do!