Evaluation documentation

Reference documentation for evaluations in the ModelFront console

Evaluation is an easy way to get risk predictions for a dataset right in the ModelFront console.

A good evaluation helps you assess the quality of translations, for example to estimate post-editing effort or to compare multiple translation APIs or custom models. You can also use evaluation to filter or check the quality of human translations, translation memories or parallel corpora.

It takes just a few clicks and doesn't require writing any code. Just like the API, an evaluation returns segment-level scores and can include translations if requested.

A single evaluation is for a single language pair, engine and model. To compare languages, engines or models on the same original input segments, just create and run an evaluation for each combination.

Your evaluation is private by default. To share it, download the results file and share that file.

Creating an evaluation

Name and note

You can always edit Name and Note while an evaluation is running or after it finishes.

Source and target language

The source and target language must not be the same, unless other (und) is selected.


By default, an evaluation requires a parallel data file.

If you only have source text, you can just select an engine to have segments translated by one of the major machine translation APIs like Google, Microsoft, DeepL or ModernMT.

Data file

The data file can contain parallel data - pairs of sentences or other segments in the source language and target language, similar to machine translation training data.

Or, if you don't have translations, you can upload a monolingual data file and have translations from one of the external engines filled in.

All files should be UTF-8 encoded.
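A quick way to check the encoding before uploading is to ask iconv to decode the file as UTF-8. This is only a local sketch: the function names is_utf8 and to_utf8 are hypothetical helpers, and the source encoding passed to to_utf8 is something you have to know about your own file.

```shell
# Succeeds (exit status 0) when the file decodes cleanly as UTF-8.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Convert a file from a known legacy encoding to UTF-8 on standard output.
# Usage: to_utf8 ISO-8859-1 legacy.tsv > upload.tsv
to_utf8() {
  iconv -f "$1" -t UTF-8 "$2"
}
```

For example, is_utf8 data.tsv || echo "not UTF-8" flags a file that needs converting first.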

Parallel formats


TMX is an open and standard XML format for importing and exporting translation memories. Segments can contain tags, line breaks and other control characters.

Only the selected language pair will be extracted. If the file includes multiple variants for that language pair, translations for all variants will be extracted.


A tab-separated-values (TSV) file is a plain-text file where columns are separated by the tab character. Applications like Excel can export a spreadsheet to a TSV.

The .tsv file must have exactly 2 columns. The control characters tab (\t), newline (\n) and carriage return (\r) can be included by escaping them with a preceding \. The literal character \ should also be escaped with \. Therefore the literal two-character sequences \t, \n and \r should be double escaped, that is, written as \\t, \\n and \\r.
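The two-column requirement can be checked locally before uploading. The following is a minimal sketch with awk; check_two_columns is a hypothetical helper name, not part of any ModelFront tooling.

```shell
# Print the line number and column count of every row that does not have
# exactly 2 tab-separated columns. Exit non-zero if any such row exists.
check_two_columns() {
  awk -F'\t' '
    NF != 2 { printf "line %d: %d columns\n", NR, NF; bad = 1 }
    END     { exit bad }
  ' "$1"
}
```

A row with an unescaped tab inside a segment shows up here as a row with 3 or more columns, and an empty line as a row with 0 columns.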

Monolingual formats

.txt, .text, .md, .markdown, .adoc, .se, .html, .xhtml, .align, .src, .trg, .srt

The monolingual file format option is only for evaluations requesting machine translation. The file should include only the original segments. The machine translations will be filled in by the engine you selected.

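The control characters tab (\t), newline (\n) and carriage return (\r) can be included by escaping them with a preceding \. The literal character \ should also be escaped with \. Therefore the literal two-character sequences \t, \n and \r should be double escaped, that is, written as \\t, \\n and \\r.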
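The escaping rule above can be applied to one raw segment at a time. This is a minimal sketch in awk, assuming the raw segment arrives on standard input; escape_segment is a hypothetical helper name. It escapes backslashes first, so a literal \t in the text comes out as \\t.

```shell
# Read one raw segment (possibly spanning several lines) on standard input
# and print it as a single escaped line.
# Limitation: a single trailing newline is treated as the end of the input,
# not as part of the segment.
escape_segment() {
  awk '{
    line = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\")      line = line "\\\\"  # literal backslash -> two backslashes
      else if (c == "\t") line = line "\\t"   # tab -> backslash, t
      else if (c == "\r") line = line "\\r"   # carriage return -> backslash, r
      else                line = line c
    }
    # Input lines after the first are joined with an escaped newline.
    printf "%s%s", (NR > 1 ? "\\n" : ""), line
  }
  END { printf "\n" }'
}
```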

File size

Evaluation supports very large files - there is no technical limit. You can evaluate files larger than 1GB with the Google Cloud Storage address option.

Depending on the segment length and our current load across all clients, it takes about 1 hour per million segments. Evaluations that include a request for machine translations take significantly longer, due to the latency of the external translation APIs.


By default, evaluations use our latest default generic model. You can also select a custom model from those that are available to your account.


When an evaluation is finished, ModelFront will send you a notification email, and you'll be able to preview, share and download a spreadsheet file with the full results.


The quality score is similar to human evaluation or BLEU score - an aggregate score for the whole set, 0 to 100, where higher is better. It's only meaningful for evaluations that are large and diverse enough to represent a statistically significant sample.

A ModelFront quality score is the complement of the average risk: 100 minus the average of the segment risks in percent, weighted by the length of the original source text. Length-weighting makes the score better reflect actual quality and post-editing effort.

So if the average risk is 10%, the score will be roughly 90.
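The calculation can be approximated locally from a downloaded results .tsv file. This is a sketch under stated assumptions, not ModelFront's implementation: it assumes the third column holds the risk in percent (0 to 100), and it measures length with awk's length(), which counts characters or bytes depending on the awk and locale. The exact length measure ModelFront uses is not stated here. The name quality_score is hypothetical.

```shell
# Print 100 minus the length-weighted average risk of a results .tsv file.
# Assumed columns: 1 = source, 2 = translation, 3 = risk in percent.
quality_score() {
  awk -F'\t' '
    { w = length($1); weighted += w * $3; total += w }
    END { if (total > 0) printf "%.1f\n", 100 - weighted / total }
  ' "$1"
}
```

With a 4-character source at risk 10 and a 2-character source at risk 40, the weighted average risk is (4 × 10 + 2 × 40) / 6 = 20, so the sketch prints 80.0.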


The chart is a histogram showing the distribution of translations by risk. High-quality translations are clustered along the left.

If there is a peak of risky translations on the right, that's a sign that there is a significant cluster of bad translations.

The chart can help you understand the effect of where you set a cutoff. How many translations will you keep? What final quality will you get?
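The same two questions can be answered numerically on a downloaded results .tsv file. The sketch below counts how many rows fall under a cutoff and computes a length-weighted score over the kept rows only. It assumes risk in percent in the third column and uses awk's length() as the weight. The name cutoff_summary is hypothetical.

```shell
# Usage: cutoff_summary results.tsv 50
# Reports how many rows have risk below the cutoff and the score of those rows.
cutoff_summary() {
  awk -F'\t' -v cutoff="$2" '
    { rows++ }
    $3 < cutoff { kept++; w = length($1); weighted += w * $3; total += w }
    END {
      score = (total > 0) ? 100 - weighted / total : 0
      printf "kept %d of %d, score %.1f\n", kept, rows, score
    }
  ' "$1"
}
```

Running it with a few different cutoffs shows the trade-off between how much data you keep and the quality of what remains.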


The preview shows the riskiest translations, with the risk score and labels. You can toggle a label filter on to filter out rows by a label.


The full results are available as a .tmx file or as a .tsv file. The .tsv file has an additional third column with the predicted risk.

Small and medium datasets can be filtered right in the console before downloading.

For working with very large datasets, we recommend downloading the results as a .tsv file. Guidance on common operations follows below.


The downloaded data file is encoded and escaped the same way as an uploaded data file in TSV format. You may want to unescape control characters when converting it to another format.
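Unescaping has to be done in a single left-to-right pass, so that an escaped backslash followed by t is not mistaken for an escaped tab. This is a minimal sketch in awk for one escaped segment; unescape_segment is a hypothetical helper name.

```shell
# Read one escaped segment on standard input and print the raw text.
unescape_segment() {
  awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\" && i < n) {
        # A backslash consumes the next character as an escape code.
        i++
        d = substr($0, i, 1)
        if (d == "t")      out = out "\t"
        else if (d == "n") out = out "\n"
        else if (d == "r") out = out "\r"
        else               out = out d   # \\ becomes a single backslash
      } else {
        out = out c
      }
    }
    printf "%s", out
  }'
}
```

To unescape a single column of a row, select it first, for example with cut -f1, and pipe one line at a time into the function.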

If you open a .tsv file in a spreadsheet application like Microsoft Excel, Apple Numbers or Google Sheets, make sure to change Text Qualifier from " to None, in case some of your segments contain ".

If you open a .tsv file in Microsoft Excel or in a Windows application, make sure to select UTF-8 as the file encoding.

You can also work with it on the command line, which is recommended for larger files.


To sort by risk in Bash:

sort -t$'\t' -k3,3n <file.tsv>

To reverse sort, add -r.


To filter while preserving the order in Bash, for example to get only those with risk below 50%:

awk -F "\t" '{ if($3 < 50) { print }}' <file.tsv>

To just peek at the top or bottom, add | head -n 100 or | tail -n 100.

To count the lines, add | wc -l. To write the output to a file, add > <newfile.tsv>.


To drop the third column with risk scores and keep only the filtered parallel data in Bash, use cut to get the first two columns:

cut -f1,2 <file.tsv>


To join multiple eval files with corresponding rows in Bash, use paste.
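As an illustration, two results files for the same source segments, for example from two engines, can be placed side by side. The sketch assumes both files are 3-column .tsv results with the same rows in the same order, and that neither has been sorted or filtered. The name join_evals and the file names are hypothetical.

```shell
# Usage: join_evals eval_a.tsv eval_b.tsv
# Output columns: source, translation A, risk A, translation B, risk B.
join_evals() {
  # paste yields 6 columns; column 4 repeats the source, so it is dropped.
  paste "$1" "$2" | cut -f1,2,3,5,6
}
```

Because paste matches rows purely by position, run it on the files as downloaded and apply any sorting or filtering to the joined output afterwards.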