Automating premium translation at GlobalDoc

A case study on integrating machine translation and translation risk prediction

Enterprise content teams requiring high-quality translation services have traditionally insisted that agencies translate from scratch - no post-editing from machine translation.

GlobalDoc has translated for teams at enterprises like IBM, Tenneco, Xerox, Toshiba and other high-profile clients for decades and shared their concerns about machine translation.

At the same time, the massive growth in content and languages generates constant motivation to use advances in technology, and to automate more.

GlobalDoc, Inc. was founded in 1993 to translate information for Fortune 500 marketing and communications teams. The founder-led company grew from Atlanta to Europe and Asia as it developed its own cloud-based translation management system, LangXpert®, to automate ordering, translation process management, delivery and review for constantly evolving content types.

Challenges

As a technology-centric language service provider, GlobalDoc had followed innovations in machine translation over the decades, but could not confidently recommend to clients a machine-driven translation solution that would deliver both quality and immediate savings.

GlobalDoc needs to deliver perfect quality and be fair and transparent to both clients and translators as it introduces new technologies.

High-quality enterprise content

Off-the-shelf machine translation APIs that power consumer-focused translation apps do not work well on high-visibility and sensitive enterprise content like corporate communications, marketing materials, multimedia, web UI strings and technical manuals. Professional human translators have to spend significant effort post-editing the machine output to maintain the final quality within traditional translation workflows.

Effort estimation

GlobalDoc clients need to know in advance what the cost of translation will be, to optimize content and language coverage according to their budget. But machine translation quality can vary wildly. In order to share the efficiencies of machine translation with clients and translators predictably and transparently, the GlobalDoc operations team needs to accurately forecast how much machine translation will assist the translator’s work on each document.

Fragmentation

The total volume of orders varies across clients, business units within clients and project element types. But for integrating machine translation, proper setup - starting with know-how about machine translation providers, formatting, security, customization and quality control - is required for each client, project and language pair.

COVID-19

As the COVID-19 pandemic hit societies around the world in Q1 and Q2 2020, enterprises felt their businesses and their own operations severely disrupted. They faced pressure to quickly disseminate critical emergency information in many languages and to launch new businesses.

Meanwhile, many employees were suddenly working from home without access to corporate systems, and many translators were also struggling in their professional and personal lives.

“We were already aggressively looking for ways to offer more automation and cost efficiencies to our clients, and the pandemic accelerated the absolute necessity to succeed in doing this seemingly overnight.”

Michael Cooper, Founder and CEO, GlobalDoc

GlobalDoc was fortunate to be fully remote-capable and quickly shifted more resources to the development of working solutions that fulfilled these unprecedented requirements.


Read the full case study Integrating Machine Translation and Risk Prediction to Achieve Cost Savings (PDF) from GlobalDoc

A novel approach

Machine translation quality estimation is a topic of open research, including inside technology companies like Amazon, Google, Facebook and Microsoft that have formidable machine learning research teams.

Quality estimation:
Automatic methods for estimating the quality of neural machine translation output at run-time, without relying on reference translations 1

Deep-learning approaches, based on massive multilingual language models, are gaining ground for instantly predicting both segment-level quality metrics, like post-editing effort, and aggregate metrics, like document- or project-level evaluation. 2

GlobalDoc CEO Michael Cooper assessed the landscape and ranked ModelFront, with its translation risk prediction API and console, as the leading provider of production-strength solutions for quality estimation and evaluation.

Partnering with ModelFront

GlobalDoc and ModelFront partnered to integrate ModelFront technology into GlobalDoc’s translation management system, LangXpert, and share know-how on use cases and translation technology.

With GlobalDoc’s guidance, ModelFront tunes the accuracy for GlobalDoc use cases, develops support for the required project and document formats and integrates customizable machine translation from the most suitable providers.

The partners present the solution as an option to GlobalDoc clients, who are able to preview real results behind the scenes - actual machine translation and risk prediction output - on recent projects before they choose the post-editing option for future projects.


How it works

Let’s look at how it worked on a real project from Q3 2020 where the client, a Fortune 500 company in the automotive space, selected the option for full human post-editing from machine translation.

10 instruction manuals as Adobe InDesign® documents (IDML format)

2595 segments (14181 words)
English (United States) to German (Germany)
1095 exact translation memory (TM) matches and 1500 new segments

GlobalDoc’s workflow parses the documents into segments - meaningful units of text like titles or sentences - and invokes ModelFront for custom machine translation and translation risk prediction on the 1500 new segments for which there was no exact translation memory match from previous projects. The machine translation has been customized based on the translation memory.
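The routing step described above can be sketched as follows. This is a minimal illustration, not GlobalDoc's or ModelFront's actual implementation: the `Segment` class, the `split_by_tm` function and the dictionary-based translation memory are all hypothetical names, and the sample sentences are invented.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str  # source-language text of one segment


def split_by_tm(segments, translation_memory):
    """Separate exact translation memory matches from new segments.

    translation_memory is assumed to be a dict mapping source text
    to its previously approved translation.
    """
    matched, new = [], []
    for seg in segments:
        if seg.source in translation_memory:
            # Exact match: reuse the stored translation.
            matched.append((seg, translation_memory[seg.source]))
        else:
            # No exact match: candidate for machine translation
            # and risk prediction.
            new.append(seg)
    return matched, new


# Hypothetical data for illustration only.
tm = {"Open the hood.": "Öffnen Sie die Motorhaube."}
segments = [Segment("Open the hood."), Segment("Check the spring.")]
matched, new = split_by_tm(segments, tm)
```

Only the segments in `new` would then be sent on for machine translation and risk prediction, which in the project above corresponds to the 1500 new segments out of 2595.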

The distribution of the segment-level risks for this project is shown in the ModelFront console.

For this project, many of the segments have very low predicted risk, where risk is the estimated probability that the segment will need to be post-edited, even by one character.

The system factors in not just the quality of the machine translation, but also the inherent difficulty and quality of the source content. The aggregate score is length-weighted to account for actual post-editing effort.
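One plausible reading of a length-weighted aggregate is a weighted mean of segment risks. The sketch below assumes the weight is the source word count, which is an assumption and not something the case study specifies; the function name and sample values are hypothetical.

```python
def length_weighted_risk(segments):
    """Return the length-weighted mean risk.

    segments: list of (source_text, risk) pairs, risk in [0, 1].
    Assumption: each segment is weighted by its source word count.
    """
    total_words = sum(len(text.split()) for text, _ in segments)
    if total_words == 0:
        return 0.0
    weighted = sum(len(text.split()) * risk for text, risk in segments)
    return weighted / total_words


# Hypothetical values: a risky one-word segment and a safe seven-word one.
segments = [
    ("Warning", 0.9),
    ("Tighten the bolt until it is snug", 0.1),
]
aggregate = length_weighted_risk(segments)
```

With these invented values, the plain average of the two risks would be 0.5, while the length-weighted aggregate is about 0.2, because the longer low-risk segment accounts for most of the words a translator would actually review.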

The dashboard also has a preview where the segments are sorted by predicted risk and labeled by error type so that the project manager can drill down into the actual text in a targeted way.

The preview surfaces high-risk segments in the ModelFront console.
Without context, the English word Spring is ambiguous - it could refer to the season, a jump, a hardware part or a product or brand.
Here it is machine-translated to the season, and the human translator will work to understand the full context and to correct it to Feder - a hardware part.
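The risk-sorted preview can be approximated in a few lines. The record layout and the risk values here are invented for illustration and do not reflect the actual ModelFront console or API output.

```python
# Hypothetical predictions; the risk values are made up.
predictions = [
    {"source": "Spring", "risk": 0.92},
    {"source": "Read this manual before use.", "risk": 0.04},
    {"source": "Replace the filter every 12 months.", "risk": 0.31},
]

# Highest predicted risk first, so a project manager sees the
# likeliest problem segments at the top of the preview.
by_risk = sorted(predictions, key=lambda p: p["risk"], reverse=True)
top = by_risk[0]
```

In this sketch the ambiguous one-word segment ends up first, mirroring the Spring example above.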

With all this information, the GlobalDoc project manager knows that the professional human translator will be able to review and approve many segments - a significant speedup, which is passed on to the client as savings.

35% savings

The upfront quoted cost to the client was reduced from $1697 to $1102 for additional savings of 35%.
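The savings figure follows directly from the two quoted amounts. The variable names below are illustrative.

```python
baseline_quote = 1697   # quoted cost without post-editing, in USD
post_edit_quote = 1102  # quoted cost with post-editing, in USD

savings = baseline_quote - post_edit_quote
savings_pct = savings / baseline_quote * 100

print(f"${savings} saved ({savings_pct:.0f}%)")
```

The difference is $595, which is roughly 35% of the original quote.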

GlobalDoc provides a quote to the client and to the translators that reflects the actual difficulty tier before the project begins. After the project is delivered with the final post-edited translations, the post-editing data is used to evaluate the risk prediction system and continuously improve its accuracy for future projects.
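The case study does not say which metric is used to evaluate the predictions against post-editing data. As one common choice for probability forecasts, the sketch below computes a Brier score, treating a segment as edited if the final translation differs from the machine translation at all. The function name, record layout and sample values are assumptions.

```python
def brier_score(records):
    """Mean squared error between predicted risk and actual edits.

    records: list of (predicted_risk, machine_translation, final_translation).
    A segment counts as edited if the two strings differ at all,
    matching the "even by one character" definition of risk.
    Lower scores mean better-calibrated predictions.
    """
    if not records:
        raise ValueError("no records to evaluate")
    total = 0.0
    for risk, mt, final in records:
        edited = 1.0 if mt != final else 0.0
        total += (risk - edited) ** 2
    return total / len(records)


# Hypothetical post-editing data.
records = [
    (0.9, "Frühling", "Feder"),                     # high risk, edited
    (0.1, "Motor abstellen.", "Motor abstellen."),  # low risk, unchanged
]
score = brier_score(records)
```

A score near 0 indicates that high predicted risk lined up with segments that were actually edited, and low predicted risk with segments approved unchanged.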

This approach was successfully applied at GlobalDoc over Q3 and Q4 2020 to projects along the full spectrum of language pairs and content types, from film subtitles to a literary novel.

“The feedback from both our clients and our translators is consistently positive.”

David Jett, Vice President of Operations, GlobalDoc


Success factors

The success of integrating machine translation and risk prediction depends on multiple criteria.

High-quality content

GlobalDoc clients require excellent quality. Because they generally also provide their original documents in excellent quality and in a style consistent with their previous projects, customized machine translation performs better. It’s also important to segment documents in a way that’s optimal for machine translation.

Translation memories

GlobalDoc has maintained translation memories for clients over decades. Customization of machine translation mainly depends on the size and quality of the client’s translation memories.

A sensible, flexible approach to automation

Not all content is a good fit for post-editing from machine translation. It depends on the document type, the language pair and the data available for customization. Different machine translation options have different strengths and weaknesses. The number, features, and language and locale support of machine translation service providers are constantly expanding. 3

From the project- to the word-level, cutting-edge technology can provide valuable inputs to human experts who make the final decisions and maintain the final quality of the traditional translation workflow.


Learn more

You can read the full case study Integrating Machine Translation and Risk Prediction to Achieve Cost Savings (PDF) from GlobalDoc

You can contact GlobalDoc for more information or to order full human post-editing for your premium content.

GlobalDoc and ModelFront are actively exploring more ways to automate and to apply risk prediction to more use cases.


References
  1. Quality Estimation Task, Fifth Conference on Machine Translation, EMNLP 2020

  2. Seven Machine Translation Trends in 2020, Maxim Khalilov, TAUS

  3. An overview of the features and limitations of the major machine translation APIs, ModelFront