The next step for Google Translate?
Anyone who needs a quick translation from one language to another now has several online tools to choose from, notably Google Translate. The computerized translation is not always elegant, but in many cases it gets the sense across.
Researchers from Dartmouth College and Indiana University were working on a project to improve such computer-generated translations when they discovered a way of translating words within a single language—from one style to another. The Dartmouth-Indiana team developed a style-translation algorithm and used 34 English language versions of the Bible to train and test the system.
Style is that elusive quality in writing, music, art or personal conduct that distinguishes one person from another. It’s the quality that lets a listener easily recognize that the singer on the radio is, say, Neil Diamond, Bob Dylan or Frank Sinatra. It’s what lets a visitor to an art gallery realize he is in the hall of the French Impressionists rather than the Cubists.
In writing, style has a lot to do with the length of sentences, the choice of vocabulary to describe things or feelings, and the turn of a phrase.
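To make those ingredients concrete, here is a minimal Python sketch that measures two of them, sentence length and vocabulary variety, for a piece of text. It is an illustration only and not the researchers' method: the function name `style_features` and the choice of these two measures are assumptions made for this example.

```python
import re

def style_features(text):
    """Return two rough style measures for a piece of English text.

    Hypothetical helper for illustration; not taken from the study.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        return {"mean_sentence_length": 0.0, "type_token_ratio": 0.0}
    return {
        # Average number of words per sentence.
        "mean_sentence_length": len(words) / len(sentences),
        # Share of distinct words: higher means more varied vocabulary.
        "type_token_ratio": len(set(words)) / len(words),
    }

features = style_features("The cat sat. The cat ran away quickly.")
print(features)
```

Two writers describing the same scene would typically produce different numbers here, which is one simple way to see style as something measurable.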
The goal of Dartmouth-Indiana’s work was “to see if we could basically use the translation framework to do some form of style translation,” said Dartmouth math and computer science professor Dan Rockmore.
The researchers chose the Bible for the study because it is perhaps the most annotated and indexed literary text in existence, wrote Joel Shurkin at Inside Science:
It comprises 31,000 verses and enabled the construction of 1.5 million unique pairings of words from one version with words from other versions, which provided data needed for training the system. For instance, Genesis 1:1 in one Bible translation matches Gen. 1:1 in all the others. However, using the Bible as the texts—or corpus—with which to train and test the system may be a weakness in the study, said Shlomo Argamon, a forensic linguist and computer scientist at the Illinois Institute of Technology in Chicago. It has multiple styles, sometimes within one book. “You have the prophetic books, some of which are full of very high poetry, and stories about marrying prostitutes. You have Proverbs, which itself has multiple styles,” he said.
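The verse-matching idea described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the team's actual pipeline: the `VERSIONS` data, the version labels and the function `build_verse_pairs` are hypothetical, and the "plain" verses are paraphrases written for this example rather than text from a published Bible.

```python
from itertools import permutations

# Toy data keyed by verse reference. The "kjv" entries use the King James
# wording; the "plain" entries are paraphrases invented for this example.
VERSIONS = {
    "kjv": {
        "Gen 1:1": "In the beginning God created the heaven and the earth.",
        "Gen 1:3": "And God said, Let there be light: and there was light.",
    },
    "plain": {
        "Gen 1:1": "At the start, God made the sky and the land.",
        "Gen 1:3": "God said there should be light, and light appeared.",
        "Gen 1:4": "God saw that the light was good.",  # no partner in "kjv"
    },
}

def build_verse_pairs(versions):
    """Pair each verse in one version with the same verse in every other.

    Returns (source_version, target_version, reference, source_text,
    target_text) tuples, one per ordered pair of versions and shared verse.
    """
    pairs = []
    for src, tgt in permutations(versions, 2):
        # Only verses present in both versions can be aligned.
        shared = versions[src].keys() & versions[tgt].keys()
        for ref in sorted(shared):
            pairs.append((src, tgt, ref, versions[src][ref], versions[tgt][ref]))
    return pairs

pairs = build_verse_pairs(VERSIONS)
print(len(pairs))  # 4: two shared verses, in each of two directions
```

Because every version shares the same chapter-and-verse numbering, no alignment step has to guess which sentences correspond. Pairs like these are the kind of input a translation system can learn from, with one version's style treated as the "source language" and another's as the "target."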
The texts were fed into two algorithms—a statistical machine translation system called “Moses” and a neural network framework commonly used in machine translation, “Seq2Seq,” Dartmouth explained in a press release.
Possible applications include making hard-to-read texts accessible: a program might translate a legal document into lay language, or make a work of English literature understandable to someone who is learning English as a second language or is too young to know many of the words, Shurkin suggested.
The research is published in the journal Royal Society Open Science.