What is it?
Phrase, sentence, paragraph, or sentence-like piece of text that represents a cognitive unit and is used when searching for a match in a translation memory (TM) database.
Why is it important?
The same segments of text recur across many documents. When a new source segment matches a TM segment that already has an approved translation, the software can supply that translation to the translator, which makes the translation process more efficient.
Why does a technical communicator need to know this?
Segments are pieces of text that will be translated as a cognitive unit during a translation workflow; they are typically phrases, sentences, or paragraphs, rather than individual words.
A translation memory (TM) leverages previous translations to avoid duplicating work. Textual content is divided into meaningful segments because it is more likely that a translation will already exist for a smaller segment than for an entire content set. As the TM grows, the number of potentially matching segments increases, reducing translation costs.
To prepare for translation, the software parses new source content into segments based on markup (e.g., XML tags) or a combination of punctuation and white space. Segmentation rules define the punctuation and white-space patterns at which the software breaks, or declines to break, the text during parsing.
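As an illustration of rule-based parsing, the following minimal Python sketch splits text on punctuation followed by white space, with one exception rule for abbreviations. The rule format (a break flag, a pattern before the position, a pattern after it) is loosely modeled on the idea behind SRX rules; the `RULES` list and the `segment` function are hypothetical names chosen for this example, not part of any real tool.

```python
import re

# Hypothetical rules: (is_break, pattern_before, pattern_after).
# Rules are checked in order; the first rule that matches at a position wins.
RULES = [
    # Do not break after common abbreviations, even if white space follows.
    (False, r"\b(?:e\.g|i\.e|etc|Dr|Mr|Ms)\.", r"\s"),
    # Break after sentence-final punctuation followed by white space.
    (True, r"[.!?]", r"\s+"),
]

def segment(text):
    """Split text into segments using the ordered break/no-break rules."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in RULES:
            before_ok = re.search(r"(?:%s)\Z" % before, text[:pos])
            after_ok = re.match(after, text[pos:])
            if before_ok and after_ok:
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule decides; skip the remaining rules
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

print(segment("Open the file, e.g. the log. Save it! Done?"))
# ['Open the file, e.g. the log.', 'Save it!', 'Done?']
```

Placing the exception rule before the general break rule is what keeps "e.g." from ending a segment; reversing the order would split the first sentence in two.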
The translation software then uses the new source segments to search the TM for matches. The translator receives the stored translations of the matching segments for review and editing. For best results, the new source content should be segmented at the same granularity as the segments in the TM.
The idea of a meaningful segment is important because a segment in one language could have multiple possible translations in another language. Computers are not reliable at discerning meaning in written content. Therefore, the comparison of source segments to segments in the TM typically relies on pattern matching.
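A minimal sketch of pattern-based lookup, assuming the TM is a simple mapping from source segments to approved translations. The `TM` contents, the `best_match` function, and the 0.75 threshold are hypothetical, and the similarity score from Python's `difflib.SequenceMatcher` stands in for whatever scoring a real translation tool applies.

```python
from difflib import SequenceMatcher

# Hypothetical TM: source segment -> previously approved translation.
TM = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Close the window.": "Fermez la fenêtre.",
}

def best_match(segment, tm, threshold=0.75):
    """Return (tm_source, translation, score) for the closest TM entry,
    or None if no entry scores at or above the threshold."""
    best = None
    for tm_source, translation in tm.items():
        # Character-level similarity in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, segment, tm_source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (tm_source, translation, score)
    return best

exact = best_match("Click the Save button.", TM)  # identical text, score 1.0
fuzzy = best_match("Click the Save icon.", TM)    # similar text, score below 1.0
```

The comparison looks only at characters, not at meaning, which is why a fuzzy match is offered to the translator for editing rather than accepted automatically.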
However, if three contiguous segments in the source content are 100% matches to three contiguous segments in the TM, the middle TM segment is considered an in-context match: its translation is treated as the correct variant for that meaning, even if other variants exist in the TM [SRX 2.0].
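The three-segment rule can be sketched as follows, assuming the TM keeps its units in their original document order so that neighboring segments can be compared. The `in_context_matches` function and the sample data are hypothetical and only illustrate the idea.

```python
def in_context_matches(new_segments, tm_units):
    """Map the index of each new segment to a TM translation when the segment
    and both of its neighbors exactly match three contiguous TM units.

    tm_units is a list of (source, translation) pairs in document order.
    """
    tm_sources = [source for source, _ in tm_units]
    results = {}
    for i in range(1, len(new_segments) - 1):
        window = new_segments[i - 1:i + 2]  # previous, current, next
        for j in range(1, len(tm_sources) - 1):
            if tm_sources[j - 1:j + 2] == window:
                results[i] = tm_units[j][1]  # translation of the middle unit
                break
    return results

# "Open" has two variants in this TM; the neighbors decide which one applies.
tm_units = [
    ("File", "Fichier"), ("Open", "Ouvrir"), ("Save", "Enregistrer"),
    ("Status", "État"), ("Open", "Ouvert"), ("Closed", "Fermé"),
]
print(in_context_matches(["Status", "Open", "Closed"], tm_units))
# {1: 'Ouvert'}
```

The first and last segments of the new content have only one neighbor, so this sketch never reports them as in-context matches.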
References
- [Wikipedia segmentation] Text Segmentation: Wikipedia article that describes how segmentation works.
- [SRX 2.0] SRX 2.0 Standard (GALA): Segmentation Rules eXchange (SRX) specification.