What causes bad translations?

Machine translation error causes at the system, process and language levels

We often hear about the results of bad machine translation: translator post-editing effort, breakdowns in communication, lost conversions…

But what about the causes of bad machine translations?

Neural machine translation has improved most aspects of quality, but some quality problems seem to be timeless, and a few are new.

The causes can be split across three levels: the translation systems themselves, the translation processes that rely on them, and natural human language itself.


Systemic errors are on the machine translation provider side, like bad or missing training data, or faulty pre-processing or post-processing: anything that isn't fixable without re-training the machine translation model or re-engineering the machine translation training and serving infrastructure.

Bad training data

Bad noisification

Bad pre-processing

Bad post-processing
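Pre- and post-processing failures are easiest to see in code. Below is a minimal sketch, assuming a Python integration, of one common step: protecting placeholders before sending text to the engine and restoring them afterwards. All names here (protect, restore, and the translate stub) are hypothetical illustrations, not any provider's API.

```python
import re

# Protect placeholders like "{name}" so the MT system can't mangle them,
# then restore them after translation. The translate() stub below stands
# in for a real MT API call.

PLACEHOLDER = re.compile(r"\{[^{}]+\}")

def protect(text):
    """Replace each placeholder with a stable token; return text and mapping."""
    mapping = {}
    def repl(match):
        token = f"__PH{len(mapping)}__"
        mapping[token] = match.group(0)
        return token
    return PLACEHOLDER.sub(repl, text), mapping

def restore(text, mapping):
    """Put the original placeholders back after translation."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

def translate(text):
    # Stand-in for a real MT call; a careless system might otherwise
    # translate or corrupt "{name}" and "{order_id}".
    return text.upper()

source = "Hello {name}, your order {order_id} has shipped."
protected, mapping = protect(source)
translated = restore(translate(protected), mapping)
```

If the restore step is skipped or the tokens are corrupted in transit, the output ships with broken placeholders: a classic post-processing error.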


Process errors are on the client side or in the integration or agreement between the client and the machine translation provider. They’re the most common, and luckily also the easiest to fix.

Bad sentence segmentation

Wrong language

Wrong script
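The wrong-language and wrong-script cases can often be caught before a request is ever sent. Below is a hedged sketch, using only the Python standard library, of a cheap script check, for example to flag Cyrillic text routed to an engine configured for a Latin-script language. dominant_script is a hypothetical helper, not a real detection API; a production system would use a proper language identifier.

```python
import unicodedata

def dominant_script(text):
    """Return the most common script prefix among the letters in text.

    The first word of a Unicode character name is usually the script,
    e.g. "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER DE".
    """
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None

script = dominant_script("Привет, мир")  # Cyrillic input
```

A mismatch between the detected script and the engine's expected source language is a process error the client can catch and fix without touching the model.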

Natural human language is infinitely expressive. There are rules, there are exceptions, and there are exceptions to the exceptions. Different languages and cultures express meaning and ideas differently; they do not map 1:1. There are ambiguities that require intelligence and reasoning to resolve. Some errors can be solved with more data or more context; most can only be reduced, not eliminated.


Lexical ambiguity

Syntactic ambiguity

Long-distance dependencies


Style preferences

Language is constantly evolving. In most scenarios, we can't control language; we can only build and maintain systems and processes that handle it better or fail gracefully.