Measuring quality and risk is fundamental to successful translation at scale. Both human and machine translation benefit from sentence-level and corpus-level metrics.
Metrics like BLEU are based on string distance to human reference translations, so they cannot score new incoming translations, nor the human reference translations themselves.
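To make concrete what "string distance to a reference" means, here is a minimal sketch of BLEU-style modified n-gram precision with a brevity penalty. This is a simplification for illustration only; real implementations like sacreBLEU add smoothing, standardized tokenization and multi-reference support.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: modified n-gram precision,
    clipped by reference counts, combined with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(c[g], r[g]) for g in c)  # clipped matches
        total = max(sum(c.values()), 1)
        if overlap == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        log_prec += math.log(overlap / total)
    # Penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_prec / max_n)
```

The key point: the score depends entirely on overlap with a fixed reference string, which is exactly why such metrics cannot evaluate translations that have no reference.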
What are the options if you want to build or buy services, tools or technology for measuring the quality and risk of new translations?
Professional human linguists and translators are the gold standard, whether that means an internal human evaluation in a spreadsheet, user-reported quality ratings, an analysis of translator post-editing productivity and effort, or full post-editing.
There is significant research on human evaluation methods. Quality frameworks like MQM-DQF, and quality management platforms like TAUS DQF and ContentQuo, standardize and manage human evaluations, while translators and language service providers offer quality reviews or continuous human labelling.
Translation tools like Memsource, Smartling and GlobalLink have features for automatically measuring quality bundled in their platforms. Memsource's feature is based on machine learning.
Xbench, Verifika and LexiQA directly apply exhaustive, hand-crafted linguistic rules, configurations and translation memories to catch common translation errors, especially human translation errors.
They are integrated into existing tools, and their outputs are predictable and interpretable. LexiQA is unique in its partnerships with web-based translation tools and its API.
ModelFront partners like GlobalDoc's LangXpert and translate5 integrate ModelFront technology as smart features in their translation systems.
If you have the data and the machine learning team and want to build your own system based on machine learning, there is a growing set of open-source options.
The most notable open-source quality estimation frameworks are OpenKiwi from Unbabel and deepQuest from the research group led by Lucía Specia. Zipporah from Hainan Xu and Philipp Koehn is the best-known library for parallel data filtering.
The owners of those repositories are also key contributors to and co-organizers of the WMT shared tasks on Quality Estimation and Parallel Corpus Filtering.
Massively multilingual libraries and pretrained models like LASER are a surprisingly effective unsupervised approach to parallel data filtering when combined with other techniques like language identification, regexes and round-trip translation.
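A sketch of how such a filtering pipeline can be combined: cheap heuristics (length ratio, junk patterns, language identification) run first, and a multilingual embedding similarity check runs last. The `embed` and `lang_id` parameters here are hypothetical stand-ins for a sentence encoder like LASER and a language identifier; the thresholds are illustrative, not tuned values.

```python
import re

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

def keep_pair(src, tgt, embed, lang_id, min_sim=0.8, max_len_ratio=2.0):
    """Decide whether to keep one candidate sentence pair.
    `embed` and `lang_id` are hypothetical callables standing in for
    a multilingual sentence encoder and a language identifier."""
    if not src.strip() or not tgt.strip():
        return False
    # Heuristic 1: implausible length ratio.
    ratio = max(len(src), len(tgt)) / max(min(len(src), len(tgt)), 1)
    if ratio > max_len_ratio:
        return False
    # Heuristic 2: segments that are only punctuation and digits.
    if re.fullmatch(r"[\W\d_]+", src) or re.fullmatch(r"[\W\d_]+", tgt):
        return False
    # Heuristic 3: same detected language suggests an untranslated copy.
    if lang_id(src) == lang_id(tgt):
        return False
    # Semantic check: a true translation pair should embed close together.
    return cosine(embed(src), embed(tgt)) >= min_sim
```

In practice the semantic check is the expensive step, so ordering the filters from cheapest to costliest matters at corpus scale.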
Unbabel, eBay, Microsoft, Amazon, Facebook and others invest in in-house quality estimation research and development for their own use, mainly for the content that flows through their platforms at scale.
The main goal is to use raw machine translation for as much as possible, whether in efficient hybrid translation workflows for localization or customer service, or just to limit catastrophes on user- and business-generated content that is machine translated by default.
Their approaches are based on machine learning.
ModelFront is the first and only API for translation risk prediction based on machine learning. With a few clicks or a few lines of code, you can access a production-strength system.
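As a sketch of what "a few lines of code" against a risk prediction API looks like: the request and response shapes below are illustrative assumptions, not ModelFront's actual contract — field names, endpoint and score semantics would come from the provider's documentation.

```python
import json

# Hypothetical request/response shapes for a translation risk
# prediction API. Field names here are assumptions for illustration.
def build_request(pairs, source_lang, target_lang):
    """Serialize (source, translation) pairs into a JSON request body."""
    return json.dumps({
        "source_language": source_lang,
        "target_language": target_lang,
        "rows": [{"original": s, "translation": t} for s, t in pairs],
    })

def risky_rows(response_json, threshold=0.5):
    """Return indices of rows whose predicted risk exceeds the threshold,
    e.g. to route them to human review instead of raw MT."""
    rows = json.loads(response_json)["rows"]
    return [i for i, row in enumerate(rows) if row["risk"] > threshold]
```

The typical workflow: translations above the risk threshold go to human post-editing, and the rest ship as raw machine translation.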
Our approach is developed fully in-house, extending ideas from the leading researchers in quality estimation and parallel data filtering, and from our own experience inside the leading machine translation provider.
We've productionized it and made it accessible and useful to more players: enterprise localization teams, language service providers, platform and tool developers and machine translation researchers.
We built in security, scalability, support for 100+ languages and 10K+ language pairs, handling of locales, encodings, formatting, tags and file formats, integrations with the top machine translation API providers, and automated customization.
We continuously invest in curated parallel datasets and manually-labeled datasets, and track emerging risk types as translation technology, use cases and languages evolve.