New computer software and crowdsourcing may make centuries' worth of handwritten Latin documents available online.
In reality, the Vatican Secret Archives are simply the pope's private archives; "secret" comes from the Latin word "secretum," meaning private. In fact, since Pope Leo XIII opened the archives to researchers in 1881, they haven't even been private. Once Vatican documents are 75 years old, scholars are free to peruse them to their hearts' content.
In theory, everything from historical documents and acts promulgated by the Vatican to account books and papal correspondence, dating as far back as the 8th century, is available to researchers.
The only problem: the sheer volume of the archives makes them virtually inaccessible.
According to an article by Sam Kean in The Atlantic, of the 53 linear miles of shelving in the Vatican Secret Archives, only “a few millimeters’ worth” of pages have been scanned, transcribed, and made available for computer searches online.
Enter In Codice Ratio, a research project that is using artificial intelligence and optical character recognition (OCR) to automatically transcribe the contents of the Vatican archives.
As Kean points out in his article for The Atlantic, OCR works great on typeset documents, but it can’t handle handwritten text. The letters tend to run together and are not always “nice, clean examples” of the letters they are supposed to represent.
Here's where artificial intelligence comes in. The researchers recruited Italian high school students with no knowledge of medieval Latin to help. Shown examples of letters that the OCR software had identified, the students checked whether those letters were correct matches. All the students had to do was match visual patterns. The software recorded the students' corrections and learned from its mistakes.
When the project began, "the idea of involving high-school students was considered foolish," Paolo Merialdo, a scientist behind In Codice Ratio, told Kean. "But now the machine is learning thanks to their efforts. I like that a small and simple contribution by many people can indeed contribute to the solution of a complex problem."
Even so, transcribing these centuries-old manuscripts by computer has hardly been smooth sailing, and the results are mixed. One-third of the transcribed words contained typos. That makes for an annoying read, but it is still considered a major advance.
“Imperfect transcriptions can provide enough information and context about the manuscript at hand” to be useful, Merialdo told Kean.
What's more, the scientists behind the project expect the software to improve over time, since the more it learns, the better it should get.
Read Sam Kean's full article in The Atlantic.