What causes bad translations?

Machine translation error causes at the system, process and language levels

We often hear about the results of bad machine translation: translator post-editing effort, breakdowns in communication, lost conversions…

But what about the causes of bad machine translations?

Neural machine translation has improved most aspects of quality, but some quality problems seem to be timeless, and a few are new.

The causes can be split across three levels: the translation systems themselves, the translation processes that rely on them, and natural human language itself.


Systemic errors are on the machine translation provider side, like bad or missing training data, or faulty pre-processing or post-processing: anything that isn't fixable without re-training the machine translation model or re-engineering the machine translation training and serving infrastructure.

Bad training data

Bad noisification

Bad pre-processing

Bad post-processing
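Pre- and post-processing failures are easiest to see in code. Below is a minimal sketch, assuming a Python integration, of one common step: protecting placeholders before sending text to the engine and restoring them afterwards. All names here (protect, restore, and the translate stub) are hypothetical illustrations, not any provider's API.

```python
import re

# Protect placeholders like "{name}" so the MT system can't mangle them,
# then restore them after translation. The translate() stub below stands
# in for a real MT API call.

PLACEHOLDER = re.compile(r"\{[^{}]+\}")

def protect(text):
    """Replace each placeholder with a stable token; return text and mapping."""
    mapping = {}
    def repl(match):
        token = f"__PH{len(mapping)}__"
        mapping[token] = match.group(0)
        return token
    return PLACEHOLDER.sub(repl, text), mapping

def restore(text, mapping):
    """Put the original placeholders back after translation."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

def translate(text):
    # Stand-in for a real MT call; a careless system might otherwise
    # translate or corrupt "{name}" and "{order_id}".
    return text.upper()

source = "Hello {name}, your order {order_id} has shipped."
protected, mapping = protect(source)
translated = restore(translate(protected), mapping)
```

If the restore step is skipped or the tokens are corrupted in transit, the output ships with broken placeholders: a classic post-processing error.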


Process errors are on the client side or in the integration or agreement between the client and the machine translation provider. They’re the most common, and luckily also the easiest to fix.

Bad sentence segmentation

Wrong language

Wrong script
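The wrong-language and wrong-script cases can often be caught before a request is ever sent. Below is a hedged sketch, using only the Python standard library, of a cheap script check, for example to flag Cyrillic text routed to an engine configured for a Latin-script language. dominant_script is a hypothetical helper, not a real detection API; a production system would use a proper language identifier.

```python
import unicodedata

def dominant_script(text):
    """Return the most common script prefix among the letters in text.

    The first word of a Unicode character name is usually the script,
    e.g. "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER DE".
    """
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None

script = dominant_script("Привет, мир")  # Cyrillic input
```

A mismatch between the detected script and the engine's expected source language is a process error the client can catch and fix without touching the model.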

Natural human language is infinitely expressive. There are rules, there are exceptions, and there are exceptions to the exceptions. Different languages and cultures express meaning and ideas differently; they do not map 1:1. There are ambiguities that require intelligence and reasoning to resolve. Some errors can be solved with more data or more context; most can only be reduced, not eliminated.


Lexical ambiguity

Syntactic ambiguity

Long-distance dependencies


Style preferences

Language is constantly evolving. In most scenarios, we can't control language; we can only build and maintain systems and processes that handle it better or fail gracefully.