Aleteia

Dartmouth uses 34 Bible versions to develop style translator algorithm



John Burger - published on 11/20/18

The next step for Google Translate?

Anyone who needs a quick translation from one language to another now has a few online tools, notably Google Translate. Machine translation is not always elegant, but in many cases it gets the sense across.

Researchers from Dartmouth College and Indiana University were working on a project to improve such computer-generated translations when they found a way to translate text within a single language, from one style to another. The Dartmouth-Indiana team developed a style-translation algorithm and used 34 English-language versions of the Bible to train and test the system.

Style is that elusive quality (in writing, music, art or personal conduct) that distinguishes one person from another. It’s the quality that lets a listener easily recognize that the singer now heard on the radio is, say, Neil Diamond, Bob Dylan or Frank Sinatra. It’s what tells a visitor to an art gallery that he is in the hall of the French Impressionists rather than that of the Cubists.

In writing, style has a lot to do with the length of sentences, the choice of vocabulary to describe things or feelings, and the turn of a phrase.

The goal of the Dartmouth-Indiana work was “to see if we could basically use the translation framework to do some form of style translation,” said Dartmouth math and computer science professor Dan Rockmore.

The researchers chose the Bible for the study because it is perhaps the most annotated and indexed literary text in existence, wrote Joel Shurkin at Inside Science:

It comprises 31,000 verses and enabled the construction of 1.5 million unique pairings of words from one version with words from other versions, which provided data needed for training the system. For instance, Genesis 1:1 in one Bible translation matches Gen. 1:1 in all the others. However, using the Bible as the texts—or corpus—with which to train and test the system may be a weakness in the study, said Shlomo Argamon, a forensic linguist and computer scientist at the Illinois Institute of Technology in Chicago. It has multiple styles, sometimes within one book. “You have the prophetic books, some of which are full of very high poetry, and stories about marrying prostitutes. You have Proverbs, which itself has multiple styles,” he said.
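The verse alignment Shurkin describes is what makes the Bible convenient as training data: a reference such as Genesis 1:1 identifies the same passage in every version, so matching references produces parallel pairs of sentences in different styles. The Python sketch below illustrates that idea under simple assumptions; it is not the researchers’ code. The function name `build_verse_pairs`, the dictionary layout and the "Plain" version text are hypothetical, and real data would also need handling for verses that different versions split or merge.

```python
from itertools import permutations

# Illustrative sample data: the KJV text of Genesis 1:1-2 plus a
# hypothetical plain-English rendering that lacks verse 1:2.
SAMPLE_VERSIONS = {
    "KJV": {
        "Gen 1:1": "In the beginning God created the heaven and the earth.",
        "Gen 1:2": "And the earth was without form, and void; and darkness "
                   "was upon the face of the deep. And the Spirit of God "
                   "moved upon the face of the waters.",
    },
    "Plain": {
        "Gen 1:1": "In the beginning God made the sky and the earth.",
    },
}


def build_verse_pairs(versions):
    """Align verses across Bible versions by their shared reference.

    versions maps a version name to {verse_reference: verse_text}.
    Returns (reference, source_version, target_version, source_text,
    target_text) tuples for every ordered pair of distinct versions,
    since style translation can run in either direction.
    """
    pairs = []
    for src_name, tgt_name in permutations(versions, 2):
        src, tgt = versions[src_name], versions[tgt_name]
        # Only references present in both versions can be paired.
        for ref in sorted(src.keys() & tgt.keys()):
            pairs.append((ref, src_name, tgt_name, src[ref], tgt[ref]))
    return pairs


if __name__ == "__main__":
    for pair in build_verse_pairs(SAMPLE_VERSIONS):
        print(pair)
```

Run as a script, this prints two pairs for Genesis 1:1 (KJV to Plain and Plain to KJV); Genesis 1:2 is skipped because only one version contains it. Because every ordered pair of versions contributes, the number of candidate pairs grows roughly with the square of the number of versions.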

The texts were fed into two systems: a statistical machine translation system called “Moses,” and “Seq2Seq,” a neural network framework commonly used in machine translation, Dartmouth explained in a press release.

Possible applications of these algorithms include making otherwise difficult texts readable, Shurkin suggested: for example, a program that translates a legal document into lay language, or one that makes a work of English literature understandable to someone learning English as a second language or to a reader too young to know many of the words.

The research is published in the journal Royal Society Open Science.
