The progress in machine learning is amazing. With enough data, machines can now beat humans at chess, image labelling, and even driving. Machine translation improves every year.
But human language is complex. Machine translation fails often and unpredictably. The results can be nonsense, misleading, offensive or even dangerous.
The basic idea of risk prediction is simple: we want to guess whether a translation is right or wrong.
With the ModelFront translation risk prediction API, client applications can decide how to react to risk for each translation, in a way that is right for their users and their speed, scale and quality goals.
Traditional quality evaluation metrics, like BLEU and METEOR, evaluate a machine translation system at the corpus level. They require human-quality reference translations to already exist, so they can't be used for incoming new content, or for real-time predictions on a stream of translations.
Because risk prediction can always easily be run on a large dataset, it also effectively provides quality evaluation - even for a dataset with no human reference translations - and parallel corpus filtering. ModelFront now also offers quality evaluation for machine translation researchers and developers.
Risk prediction is related to quality estimation, as it's known in the research world. Quality estimation usually works at the sentence level. Many internal production systems are based on feature engineering, and the state-of-the-art approaches still learn a separate model for each language pair, which makes it very hard to support hundreds of languages and tens of thousands of pairs.
The greatest problem with traditional quality estimation is that it is focused on HTER - post-editing effort - which has the same problem as BLEU: it doesn't separate stylistic preferences from painfully catastrophic translations.
There's no silver bullet. We've built on open research and advances in deep learning to productionize state-of-the-art approaches and make them accessible and useful to more players.
Translators, quality managers, machine translation clients and tool developers are using our translation risk prediction in creative ways. The results are so scalable and cost-effective that they open up whole new use cases.
It's no secret that human translation delivers much higher quality. On the other hand, machine translation is about 500 times less costly than human translation, and is also instant.
With translation quality risk prediction, you can instantly use the machine translations that are good, and send only the risky translations to human translators. Alternatively, you can sort the translations by risk, so that human translators work on the riskiest translations first.
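The routing described above can be sketched in a few lines of Python. This is a minimal illustration, not the ModelFront API: it assumes each translation already has a risk score between 0.0 and 1.0, and the names `ScoredTranslation`, `route_by_risk` and the default threshold of 0.5 are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ScoredTranslation(NamedTuple):
    source: str
    translation: str
    risk: float  # assumed scale: 0.0 (safe) to 1.0 (risky)


def route_by_risk(
    items: List[ScoredTranslation], threshold: float = 0.5
) -> Tuple[List[ScoredTranslation], List[ScoredTranslation]]:
    """Split translations into an auto-approved list and a human review queue."""
    # Translations below the threshold are used as they are.
    approved = [t for t in items if t.risk < threshold]
    # The rest go to human translators, riskiest first.
    review = sorted(
        (t for t in items if t.risk >= threshold),
        key=lambda t: t.risk,
        reverse=True,
    )
    return approved, review
```

The threshold is the knob that trades cost against quality: a lower value sends more translations to humans, a higher value uses more machine translations directly.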
Now you can get 99% quality at 10% of the price.
For example, if there are 100 reviews of a product, and 90 of them are translated well, we can just show those 90 to the user, and drop the 10 risky ones, or put them on the last page.
Simply measuring, graphing and monitoring aggregate quality is a big step forward. How many risky translations were served today? Is the quality better for descriptions or for reviews? How does it vary by domain, or by language pair?
By setting up alerts, you can also be sure to know if the translation risk for any slice changes.
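As a rough sketch of this kind of monitoring, the Python below computes the share of risky translations per slice and flags slices whose share has risen compared with a baseline. It assumes risk scores between 0.0 and 1.0 have already been collected, and the function names, the 0.5 threshold and the 0.05 alert margin are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def risky_share_by_slice(
    records: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> Dict[str, float]:
    """Return the fraction of risky translations for each slice.

    Each record is (slice_name, risk), where a slice might be a content
    type, a domain or a language pair.
    """
    totals: Dict[str, int] = defaultdict(int)
    risky: Dict[str, int] = defaultdict(int)
    for slice_name, risk in records:
        totals[slice_name] += 1
        if risk >= threshold:
            risky[slice_name] += 1
    return {name: risky[name] / totals[name] for name in totals}


def slices_to_alert(
    today: Dict[str, float], baseline: Dict[str, float], max_increase: float = 0.05
) -> List[str]:
    """List slices whose risky share rose by more than max_increase.

    A slice missing from the baseline is compared against 0.0.
    """
    return sorted(
        name
        for name, share in today.items()
        if share - baseline.get(name, 0.0) > max_increase
    )
```

The same aggregates can feed a dashboard, while the alert list can be wired to whatever notification system is already in place.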
Compare translations from multiple APIs to choose the best
If you have trained a custom engine like AutoML for Google Translate, you can compare the translations from your custom models with those of the default API, and choose the best one for each sentence. You can also look at aggregate quality to train better custom models.
If there are multiple versions of the original input text, for example with variations in spelling or casing, you can translate each version and use the highest-quality translation.
If the user knows multiple languages, you can even compare the translations into different languages, and show the best one.
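The comparisons above all reduce to the same step: score every candidate and keep the one with the lowest risk. A minimal Python sketch follows, assuming the risk scores (0.0 to 1.0) have already been obtained. The function name `pick_lowest_risk` and the candidate labels are hypothetical, and the candidates could equally be engines, input variants or target languages.

```python
from typing import Dict, Tuple


def pick_lowest_risk(
    candidates: Dict[str, Tuple[str, float]]
) -> Tuple[str, str, float]:
    """Choose the candidate with the lowest risk.

    candidates maps a label (for example an engine name) to a pair of
    (translation, risk). On a tie, the first label in the dict wins.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    best = min(candidates, key=lambda label: candidates[label][1])
    translation, risk = candidates[best]
    return best, translation, risk
```

For example, calling it with `{"default": ("Hallo Welt", 0.4), "custom": ("Hallo, Welt!", 0.1)}` selects the `custom` candidate.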
Add confidence to reading and composing with machine translation in business workflows. For example, when a team member is replying to a user or customer review or email with a machine translated text, the translation can be checked for risks before it is sent.
After working at Google Translate, using translation APIs, localizing products we built and watching friends and family work as human translators, we knew that there are many translations that require human intelligence.
But we also knew that there are many good machine translations, and too many machine translation errors that are detectable and preventable with the right implementation of today's technology.
ModelFront takes a fundamentally different approach to translation risk prediction, using deep learning and very large datasets. ModelFront is opening up many new use cases for balancing machine scale and human quality.
The ModelFront team is led by full-stack engineers with experience building highly-scaled data-centric APIs, marketplaces and SaaS platforms like Google Translate, Google Play, PubNative, Aarki and early-stage startups. We speak Spanish, German, Alemannic, Italian, French, Russian, Serbian, Croatian, Armenian and English.
ModelFront Inc is a founder-owned Delaware-registered C-Corp. We're very thankful to Google Cloud for Startups and Microsoft for Startups for their generous support.
You can catch our team at Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics (ACL), Workshop on Machine Translation (WMT), Applied Machine Learning Days (AMLD), Association for Machine Translation in the Americas (AMTA), European Association for Machine Translation (EAMT), LocWorld and TAUS.
For technical support please email [email protected].