Term of the Week: Bitext

What is it?

A collection (usually electronic) of texts in two languages that can be considered translations of each other and that are aligned at the sentence or paragraph level.

Why is it important?

A bitext is one of the most basic results of translation. It can be used in the language industry for training, revision, and quality control. Bitexts also serve as training data for statistical machine translation.

Why does a business professional need to know this?

In linguistics, the sentence is often considered a natural unit. In translation, a source sentence paired with its translation is therefore also a natural unit.

From a technical point of view, bitexts are a straightforward representation of the source text and the product of translation[bitext 1]. They can serve as an exchange or interface format between localization experts, system developers, and machines. Bitexts play a key role in training, evaluating, and improving localization technologies such as translation memories, terminology management tools, and machine translation engines. They can also serve as a basic format for proofreading and for interaction with customers, e.g., in the process of formal quality control. XLIFF is a standard format for representing bitexts in localization processes.
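To make the XLIFF remark concrete, here is a minimal sketch of how sentence pairs could be read from an XLIFF 1.2 document using Python's standard library. The sample document, its sentences, and the function name `read_bitext` are illustrative assumptions, not taken from a real project; real XLIFF files often contain inline markup and metadata that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2; other XLIFF versions use a different namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample document, invented for illustration.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Good morning.</source>
        <target>Guten Morgen.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>How are you?</source>
        <target>Wie geht es dir?</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_bitext(xliff_text):
    """Return a list of (source, target) pairs in document order."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        if source is None or target is None:
            continue  # skip units that have not been translated yet
        # itertext() flattens any inline elements to plain text
        pairs.append(("".join(source.itertext()).strip(),
                      "".join(target.itertext()).strip()))
    return pairs


if __name__ == "__main__":
    for src, tgt in read_bitext(SAMPLE):
        print(src, "|", tgt)
```

Each `trans-unit` element holds one aligned segment, so the resulting list of pairs is itself a simple in-memory bitext.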

If bitexts are used for training language technology applications, they must provide the application with all the information it needs for its intended functionality. To do this, they need to be of high quality, represent a sensible range of linguistic variation, and cover a large enough vocabulary. In general, it is best to use bitexts based on literal, uncreative translations when setting up translation engines.
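As a rough illustration of what checking vocabulary coverage might look like, the following sketch counts segments and distinct word forms on each side of a bitext. The function names, the sample pairs, and the crude regular-expression tokenization are assumptions made for this example; a production setup would use language-specific tokenizers and far more data.

```python
import re


def vocabulary(segments):
    """Collect the set of lowercased word forms found in the segments."""
    types = set()
    for segment in segments:
        # \w+ is a crude, language-agnostic stand-in for real tokenization
        types.update(re.findall(r"\w+", segment.lower()))
    return types


def bitext_stats(pairs):
    """Summarize a bitext given as a list of (source, target) pairs."""
    source_vocab = vocabulary(src for src, _ in pairs)
    target_vocab = vocabulary(tgt for _, tgt in pairs)
    return {
        "segments": len(pairs),
        "source_types": len(source_vocab),
        "target_types": len(target_vocab),
    }


if __name__ == "__main__":
    # Hypothetical two-segment bitext, invented for illustration.
    demo = [
        ("Good morning.", "Guten Morgen."),
        ("Good night.", "Gute Nacht."),
    ]
    print(bitext_stats(demo))
```

Counts like these say nothing about translation quality; they only give a first indication of whether a bitext is large and varied enough for the intended application.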

Bitexts usually present complete, ordered texts that are normally aligned at the sentence or paragraph level[bitext 2]. This makes it possible to study meaning beyond the single sentence, also known as discourse: how texts organize information, maintain coherence, and refer to topics both inside and outside the current text. Such analyses can be used to improve the quality of a translation memory and, in the case of machine translation, to train the system.

References

About Aljoscha Burchardt

Aljoscha Burchardt is lab manager at the Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI GmbH). He is an expert in artificial intelligence and language technology. His interests include the evaluation of (machine) translation quality and the inclusion of language professionals in the MT R&D workflow. Burchardt is co-developer of the MQM framework for measuring translation quality. He has a background in semantic language technology.

Term: Bitext

Email: aljoscha.burchardt@dfki.de

Website: dfki.de/~aburch/

Twitter: @albu

LinkedIn: linkedin.com/in/aljoschaburchardt/

Facebook: facebook.com/aljoscha.burchardt