We hear more and more claims about machine translation at "human parity" and "human quality". But these controversial claims apply only to the very top language pairs, like Spanish to English.
The reality remains that the quality for almost all language pairs, like German to French, Chinese to Arabic or even Spanish to Portuguese, is much worse. That's primarily because most machine translation systems are still translating via English.
Even human translation via a third language is bad. Machine translation via a third language is a quality disaster. It also means that you cannot use customization options like Google AutoML or Microsoft Custom Translator across the full pair.
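To see why pivoting loses information, here is a minimal sketch. The dictionaries are toy stand-ins, not a real MT system: two distinct German words collapse onto the same ambiguous English word, so the pivot step has no way to recover the original sense.

```python
# Toy demonstration of sense collapse through an English pivot.
# The dictionaries below are illustrative only, not real MT output.

# German "Grab" (a tomb) and "ernst" (serious) both map to English "grave".
de_to_en = {"Grab": "grave", "ernst": "grave"}

# The English-to-Spanish step sees only "grave" and must pick a sense blindly.
# Here it picks Spanish "grave" (= serious), the wrong sense for a tomb ("tumba").
en_to_es = {"grave": "grave"}

def pivot_translate(word, src_to_en, en_to_tgt):
    """Translate source -> English -> target, the indirect route most engines use."""
    return en_to_tgt[src_to_en[word]]

# The distinction between "Grab" and "ernst" is gone after the first hop.
print(pivot_translate("Grab", de_to_en, en_to_es))
print(pivot_translate("ernst", de_to_en, en_to_es))
```

A direct German-to-Spanish system never passes through the ambiguous English word, so the collapse never happens.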
The English hack is not exactly a secret, but it's fun to prove it to ourselves.
Can you guess the ambiguous English words that caused the bad translations? Hints: grave, bat, you...
Can you find other translations that expose this hack?
The English hack affects almost all translation pairs and all major engines today - Google Translate, Microsoft Translator and even DeepL.
| | de | en | es | fr | ... | ru | sv | zh |
|---|---|---|---|---|---|---|---|---|
| de | | en → de | es → de | fr → de | ... → de | ru → de | sv → de | zh → de |
| en | de → en | | es → en | fr → en | ... → en | ru → en | sv → en | zh → en |
| es | de → es | en → es | | fr → es | ... → es | ru → es | sv → es | zh → es |
| fr | de → fr | en → fr | es → fr | | ... → fr | ru → fr | sv → fr | zh → fr |
| ... | de → ... | en → ... | es → ... | fr → ... | ... → ... | ru → ... | sv → ... | zh → ... |
| ru | de → ru | en → ru | es → ru | fr → ru | ... → ru | | sv → ru | zh → ru |
| sv | de → sv | en → sv | es → sv | fr → sv | ... → sv | ru → sv | | zh → sv |
| zh-cn | de → zh-cn | en → zh-cn | es → zh-cn | fr → zh-cn | ... → zh-cn | ru → zh-cn | sv → zh-cn | zh → zh-cn |
| zh-tw | de → zh-tw | en → zh-tw | es → zh-tw | fr → zh-tw | ... → zh-tw | ru → zh-tw | sv → zh-tw | zh → zh-tw |
There are a few exceptions. For example, there are direct systems between very similar languages like Serbian, Croatian and Bosnian, and between variants like simplified Chinese and traditional Chinese. And these days, anyone with data and machines can take seq2seq and train a model for a pair.
Supporting all direct pairs among 100 languages would require training, launching and maintaining 100 × 99 systems. That's about 10,000 in total!
By using a pivot language or bridge language - almost always English - the number of systems can be reduced to about 200 - one for each direction for each language.
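The arithmetic behind that reduction is simple, as a quick sketch:

```python
def direct_systems(n):
    """One system per ordered pair of distinct languages: n * (n - 1)."""
    return n * (n - 1)

def pivot_systems(n):
    """One system to English and one from English per other language: 2 * (n - 1)."""
    return 2 * (n - 1)

n = 100
print(direct_systems(n))  # 9900 -- "about 10,000"
print(pivot_systems(n))   # 198  -- "about 200"
```

The direct-pair count grows quadratically with the number of languages, while the pivot count grows linearly, which is why the pivot wins on engineering cost even as it loses on quality.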
There is simply not a lot of parallel data for many pairs, compared to the data available for translation to and from English. It makes it hard to train a model, and also hard to evaluate.
Crawling, alignment, training, evaluation, deployment and maintenance for 10,000 pairs in production would create massive engineering costs. The amount of traffic for many obscure pairs simply does not justify the cost and effort.
Neural machine translation does not increase quality as much as it simplifies engineering. By lowering the barrier to entry, neural offers the potential to train and launch many more direct pairs. In fact, Google researchers are already experimenting with a universal model that handles all language pairs.
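The universal-model idea works at the input level: Google's published multilingual NMT research prepends an artificial target-language token to the source sentence, so one shared model learns every direction from that token alone. A minimal sketch of the input convention (the token format follows the published work; the sentence is illustrative):

```python
def to_multilingual_input(sentence, target_lang):
    """Prepend a target-language token, as in Google's multilingual NMT research.

    The single shared model reads the token to decide the output language,
    which also enables zero-shot directions never seen together in training.
    """
    return f"<2{target_lang}> {sentence}"

# German source, French target -- no English pivot needed at inference.
print(to_multilingual_input("Wie geht es dir?", "fr"))
```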
The demand for direct translation between non-English languages comes from outside English-speaking countries, so it would not be surprising if DeepL, Reverso, Yandex, Tencent or Baidu were the first to market with open-domain direct translation for major pairs like German-French or Spanish-Chinese.
ModelFront risk prediction is built to support direct and indirect pairs. We see significantly better quality for direct pairs.
Not sure whether to invest in training a direct pair? Talk to us about your language pairs, content types and quality goals, and we will advise you on your options.