As a lover of language and languages, I was intrigued but bothered by the opening lines of an article I read this week at The Hot Word (dictionary.com’s blog). “Back in the 1940s, mathematician Warren Weaver made an audacious suggestion: what if translation was not a feat of literary theory and linguistics, but one of cryptography?” The rest of the article indicates that Weaver was on the right track, as evidenced both by Google Translate and by the recent success of some cryptographers in decoding the Copiale Cipher.
I think computers are great tools, and it wouldn’t surprise me if eventually they could be programmed to understand and use human languages fairly well. But to do it with the tools of mathematics rather than linguistics? Besides, even humans often do a poor job of translation (Charles Berlitz gives some very amusing examples in his book Native Tongues), so how could a computer possibly do better?
I decided to check out Google Translate. I took a sentence from the article I had just been reading, and pasted it into Google Translate. It didn’t matter much which language I translated it into, since my aim was to re-translate it to English and see how this compared with the original. I chose Russian. The result was not perfect, but better than I expected.
Original sentence: By making a machine-readable version of the text, a team of computational linguistics were able to run the characters through a software program that found patterns in the text, which were otherwise inscrutable.
Russian translation: Делая машиночитаемой версии текста, команда компьютерной лингвистики смогли запустить персонажей через программное обеспечение, которое обнаружили закономерности в тексте, которые в противном случае неисповедимы.
Back to English: Making the machine-readable version of the text, Computational Linguistics team were able to run characters through software, which found a pattern in the text, which otherwise inscrutable.
By way of comparison, Babel Fish produced this when translating the same sentence to Russian and then back to English: “With way to make machine-readable the version from the text, the command of computational linguistics could break into a run natures to the program of software which it found the pictures in the text, which were otherwise inscrutable.” Yes, definitely inscrutable.
I had always assumed that Google Translate worked the same way any other translation program does, by using vocabulary lists and rules of grammar that had been programmed into the computer. Until a few years ago, that was how Google Translate worked, but then they came up with a new approach.
As best I can tell from this article, Google Translate now depends on the fact that it can use, as examples of good translation, millions of documents that are available online, where the same text is provided in two or more languages. If a particular phrase that appears in many documents in one language is generally translated a certain way into another language, the program can record the two phrases as translations of each other. Any time a translation request in either language uses that phrase, it is ready with the corresponding phrase in the other language. And it doesn’t have to “know” what a single word means, or learn any rules of grammar.
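For readers curious what “recording” a phrase might look like, here is a toy sketch in Python. It is not how Google Translate is actually built. It assumes the hard part has already been done (lining up which phrase in one document corresponds to which phrase in its translation), and the German-English phrase pairs are made-up examples. The hypothetical function build_phrase_table counts how often each German phrase was paired with each English phrase and keeps the most common pairing; the hypothetical translate function then walks through a sentence, using the longest phrase it knows at each point. Nowhere does it learn what a word means or a single rule of grammar.

```python
from collections import Counter, defaultdict

# Toy, made-up data. ASSUMPTION: some earlier step has already lined up which
# phrase in each document matches which phrase in its translation.
ALIGNED_PAIRS = [
    ("guten morgen", "good morning"),
    ("guten morgen", "good morning"),
    ("guten morgen", "morning"),
    ("mein freund", "my friend"),
    ("mein freund", "my friend"),
    ("mein freund", "my boyfriend"),
]


def build_phrase_table(aligned_pairs):
    """Map each source phrase to the target phrase it was paired with most often."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return {source: c.most_common(1)[0][0] for source, c in counts.items()}


def translate(sentence, table, max_len=4):
    """Greedy left-to-right lookup, preferring the longest phrase the table knows.

    No grammar and no word reordering: just phrase substitution.
    """
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            # No known phrase starts here: pass the word through untranslated.
            out.append(words[i])
            i += 1
    return " ".join(out)


table = build_phrase_table(ALIGNED_PAIRS)
print(translate("Guten Morgen mein Freund", table))  # good morning my friend
```

A word the table has never seen, such as “hallo”, simply passes through untranslated, which also illustrates why the results can only be as good as the collection of translated documents behind them.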
As the article points out, that doesn’t fit with the way we think about language learning and translation at all. But that’s because we think in terms of how we learned foreign languages in school – lots of vocabulary lists, grammar rules to memorize, and practice saying and writing the words and phrases we were trying to learn. That’s how I learned both French and Spanish, plus some German and Esperanto.
That’s also how I tried to teach French and Spanish, because it was the only way I was familiar with. I read books advising me to use only the target language in the classroom, rather than explaining things in English, but I had enough trouble controlling a classroom full of unenthusiastic learners as it was. I could only imagine chaos resulting if I insisted on speaking only Spanish (or French, which I don’t know quite as well anyway), and the students got frustrated because they couldn’t understand me.
The method used by Google Translate, however, is actually somewhat analogous to how young children learn language. Children absorb their many examples from the conversations they hear rather than from digital documents, of course, but the principle isn’t that different. The child hears a lot of sounds, and through much repetition connects a certain sequence of sounds with a certain object, action, or other identifiable meaning. Only later do children add to their vocabulary and learn to create more complex sentences by intentional learning of words and grammar rules.
The computer is finding patterns in documents in a similar way, though those patterns correspond not to things in the “real world” but to other patterns in documents in another language. Just as a child will learn to speak only as well (in the sense of proper grammar) as the people he hears speak, the computer program using this approach will produce results only as good as the documents it has available for comparison. That’s why Google Translate does much better with some languages than with others: it depends on how many good translated documents exist for each pair of languages.
I recently introduced my son Al to the idea of using a computer to translate words and phrases written in a language he didn’t understand. Back when I was in Europe in 1983, I purchased some Tintin books in Paris so I could enjoy the stories in the original French. I also saw Smurfs books in every country I visited. My fellow American students thought the Smurfs were just another example of the U.S. exporting lowbrow culture to the rest of the world, but I was convinced that they had originated somewhere in Europe. I just didn’t know which country.
As it turns out, Peyo, the creator of the Smurfs, was also Belgian (like Hergé, the creator of Tintin, and many other Belgian comics artists) and also wrote in French. Those little blue creatures were originally called not Smurfs but Schtroumpfs. At the time, though, since the word did not sound at all French to me, I wondered if it might have been adapted from German. I had seen Smurf books in German, and the word was Schlümpfe. (In Spanish they were pitufos, and puffi in Italian; I was sure neither of those was the original.)
Before I reached Germany, though, I spent an afternoon in Amsterdam (between arriving by train from Belgium and leaving by another train for Speyer, Germany). That gave me just enough time to take a bus ride around the city, visit an art museum, and buy a comic book I saw with the title Dat Moet Je Smurfen! I learned much later (when the internet made it possible to learn such things easily) that I was correct that Smurfs came to the U.S. by way of Dutch, but I was wrong in thinking that a French-speaking author would not come up with a ridiculous word like Schtroumpf. (Wikipedia tells how the word originated.)
I quickly discovered, on the train to Speyer, that while Dutch certainly is similar to German, I could understand only a very few words in the book. (Upon arriving in Germany, I discovered that I had forgotten too much German since ninth grade for it to be of much use to me.) But as with most comic books, the pictures tell most of the story. I also purchased some popular comic books in Spain, and I intended to use all of these to demonstrate to my future students how much you can figure out about the words by using the context, in this case the pictures.
Some of those books eventually ended up on my younger son Al’s bookshelf. Apparently he was able to enjoy the Tintin books without knowing a word of French, but the Smurf book posed more of a challenge. Recently he started bringing it to me while I was reading (in English) and asking me to translate. My Dutch is just as non-existent now as it was in 1983, so I explained as best I could based on the pictures. After a while, though, it occurred to me that we could probably find a Dutch-English dictionary at the library to assist us.
To my surprise, the only one listed in the online catalog was in the reference section, which meant it could only be used in the library. Just in case there was a mistake in the catalog (hardly a remote possibility), I located the shelves holding the foreign language dictionaries that could be checked out and taken home. No Dutch dictionary. But there was a German-English dictionary, and two Asterix books translated into German. (Al’s bookshelf also contains a used Asterix book I found somewhere, in the original French.) Al eagerly checked out the dictionary and the Asterix books.
I still remember a little German, and I had tried to give myself a refresher course back when my older son Zach was studying German in high school, but I quickly found that it wasn’t sufficient to get through a single panel of comics. Looking up words in the dictionary turned out to be a very tedious task, especially since I know enough about translation to realize that translating word by word rarely works, and that the best word to use in a given context is often not the first one listed.
Tired of that, I went to the computer and brought up an online translation program (neither Google Translate nor Babel Fish, though I don’t remember which one it was), and showed Al how to use it. It was easy enough for him: he just had to type in the German and get the English equivalent, or at least something sort of close to it. I don’t know how far he got in the books, but I know the German-English dictionary just sat unused until it was time to take it back to the library.
Of course, we tried using the same tool for the Smurf book. But translating a Smurf comic book is made more difficult by the frequent occurrence of the word smurf as a verb or common noun (i.e. referring to something other than a Smurf). And it’s hard to figure out the other words by using context when some important words can’t be translated. But it got enough of the meaning across that Al stopped asking me for translations. He even enjoyed the comics enough that he started telling other people about the amusing things he had read in the Smurf books (as he often does with the Calvin and Hobbes books).
Now, though, I think it might be worth going back through the book again, using Google Translate. Its translations make a lot more sense than what we got before. Even if (or because?) they are done using mathematics instead of linguistics…