
Term of the Week: Term Extraction

What is it?

The analysis of a given text or corpus, with the goal of identifying relevant term candidates within their context. Also called term mining or term harvesting.

Why is it important?

Term extraction is the starting point of all terminology management tasks and is usually followed by the elimination of inconsistencies. Well-managed terminology improves quality, reduces costs, and shortens time to market.

Why does a technical communicator need to know this?

When you extract terms, you are not only working on terminology, you are also managing the organization-specific or industry-specific knowledge. Terminology promotes knowledge sharing between people working in the same business field.

If you aim to improve the quality and consistency of your publications, term extraction is probably the best place to start. When you begin extracting terms, you might find various synonyms and spelling variants for the same concept. For example, you might discover the terms electronic catalog, E-catalog, and eCatalog used as synonyms. Once you have identified synonyms and variants, you can determine which version of these terms should be used in all publications across all functional areas [TermCoord].
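As a rough illustration of how spelling variants might be surfaced, the following sketch groups term candidates that become identical once case, spaces, and hyphens are ignored. The function names `normalize` and `group_variants` are hypothetical and not taken from any particular tool. True synonyms such as electronic catalog do not share a normalized form and would still need review by a terminologist.

```python
import re
from collections import defaultdict


def normalize(term: str) -> str:
    """Lowercase the term and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


def group_variants(terms):
    """Group terms that share a normalized form (hypothetical helper).

    Returns only the groups that contain more than one surface form,
    because those are the likely spelling variants.
    """
    groups = defaultdict(set)
    for term in terms:
        groups[normalize(term)].add(term)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# The three terms from the example above.
candidates = ["electronic catalog", "E-catalog", "eCatalog"]
variants = group_variants(candidates)
print(variants)
```

Here, E-catalog and eCatalog end up in one group, while electronic catalog remains on its own. Deciding which form becomes the preferred term is still a human decision.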

To start a term extraction task, compile a corpus from which you can extract term candidates. These term candidates are then validated and recorded, either automatically or semi-automatically. Term extraction is usually either monolingual, which yields term candidates in one language, or bilingual, which yields term candidates together with their equivalents in the target language.

Several tools can help you automate term extraction. Each tool has strengths and weaknesses, so there is no one-size-fits-all solution. Before you decide on a term extraction tool, test and evaluate several candidates [Terminorgs].

In general, these tools use three main approaches:

  • Linguistic: the tool searches the corpus for word combinations that match a certain morphological or syntactical pattern, for example adjective+noun.
  • Statistical: the tool identifies repeated sequences of lexical items.
  • Hybrid: the tool combines the previous two approaches. This is the most frequently used approach.
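The three approaches can be sketched in a few lines of code. This is a minimal illustration under loud assumptions, not how any specific extraction tool works: the part-of-speech lexicon `TOY_POS` is a hand-written stand-in for a real tagger, all function names are hypothetical, and the "statistical" step is simple counting of repeated two-word sequences.

```python
import re
from collections import Counter

# Hypothetical toy part-of-speech lexicon; a real tool would use a tagger.
TOY_POS = {
    "electronic": "ADJ",
    "printed": "ADJ",
    "catalog": "NOUN",
    "manual": "NOUN",
    "part": "NOUN",
}


def tokenize(text):
    """Split text into lowercase word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def statistical_candidates(tokens, n=2, min_count=2):
    """Statistical: keep n-word sequences that occur at least min_count times."""
    ngrams = Counter(zip(*(tokens[i:] for i in range(n))))
    return {" ".join(gram): count for gram, count in ngrams.items() if count >= min_count}


def linguistic_candidates(tokens, pattern=("ADJ", "NOUN")):
    """Linguistic: keep word sequences whose tags match the pattern."""
    found = set()
    for i in range(len(tokens) - len(pattern) + 1):
        window = tokens[i:i + len(pattern)]
        if tuple(TOY_POS.get(word) for word in window) == pattern:
            found.add(" ".join(window))
    return found


def hybrid_candidates(tokens, min_count=2):
    """Hybrid: keep repeated sequences that also match the pattern."""
    repeated = statistical_candidates(tokens, n=2, min_count=min_count)
    matching = linguistic_candidates(tokens)
    return {term: count for term, count in repeated.items() if term in matching}


corpus = (
    "The electronic catalog lists every part. "
    "Open the electronic catalog to order a part. "
    "The printed manual is older."
)
tokens = tokenize(corpus)
print(statistical_candidates(tokens))  # includes noise such as "the electronic"
print(linguistic_candidates(tokens))   # includes the one-off "printed manual"
print(hybrid_candidates(tokens))       # keeps only "electronic catalog"
```

On this tiny corpus, counting alone also returns the non-term "the electronic", and the pattern alone also returns "printed manual", which occurs only once. Combining the two filters leaves a single candidate, which hints at why hybrid tools are popular. The results are still only candidates and need validation.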

References

  • [TermCoord] TermCoord: The EU’s website discussing terminology harmonization efforts and providing resources.
  • [Terminorgs] Terminology for Large Organizations: A consortium of terminologists who promote terminology management as an essential part of corporate identity, content development, content management, and global communications in large organizations.

About Stephanie Piehl


Given Stephanie Piehl's passion for languages and cultures, it was only natural that she graduated in Applied Linguistic and Cultural Studies at the renowned FTSK in Germersheim, Germany. She brings over 10 years of experience in localization. Having worked as a localization coordinator on both the vendor and the client side, as a freelance translator, and now as an in-house terminologist at Agilent Technologies, Stephanie has gained insight into the industry from every perspective.

Term: Term Extraction

Email: stephanie.piehl7@agilent.com

LinkedIn: linkedin.com/in/stephanie-piehl-5a7a7730/