Glossary of Linguistics Terms

Effective localization depends on linguistic competence in multiple languages. The terms in this glossary are provided to assist non-linguists in understanding concepts related to the language aspects of localization and translation.

The definitions of the alphabetic, logographic, segmental, and syllabic writing systems were adapted from the work of Erik Vogt, who wrote the essay for the term Script.

About the author: Madison Van Doren

Madison Van Doren is a recent graduate of Colorado State University with a Bachelor’s degree in English with a concentration in Language and a minor in Linguistics. She is pursuing an MA in Linguistics at Queen Mary University of London with research interests in historical and sociolinguistics.

Terms: Monolingual, Linguistics Terms

Email: madisonvandoren@gmail.com

LinkedIn: linkedin.com/in/madisonvandoren

Linguistics Terms


accent mark

A written mark added to a letter to distinguish it from a similar letter and to signal linguistic information, such as a change in pronunciation or verb conjugation. For example, in French, an accent mark distinguishes the past participle coupé from coupe, the first-person singular present tense form. Similarly, an accent mark can signal a different phoneme: the cedilla in the French word ça indicates that the “c” is pronounced with an /s/ sound instead of a /k/ sound.
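
In software, an accented letter can be stored either as a single code point or as a base letter followed by a combining mark, so two strings that look identical may not compare as equal. The following is a minimal Python sketch of how such text might be compared, using the standard unicodedata module. The helper names same_text and strip_diacritics are illustrative, not part of any localization tool.

```python
import unicodedata

composed = "coup\u00e9"      # "coupé" with é stored as a single code point
decomposed = "coupe\u0301"   # "coupé" stored as e + combining acute accent

def same_text(a, b):
    """Compare two strings after normalizing both to the composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def strip_diacritics(text):
    """Remove combining marks, for example for accent-insensitive search."""
    parts = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in parts if not unicodedata.combining(ch))

print(composed == decomposed)           # False: the code points differ
print(same_text(composed, decomposed))  # True
print(strip_diacritics("\u00c7a"))      # Ca
```

Stripping marks erases the difference between coupé and coupe, so it is better suited to search than to storing or displaying translated text.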


alphabetic writing system

A writing system in which there is a symbol for each consonant and vowel. Alphabets are also known as segmental systems. These systems combine a limited set of characters to represent spoken language. Letters are intended to indicate individual phonemes, although the correspondence is not always one-to-one. Alphabetic writing is common in European languages; however, it is also used in other regions, such as Korea.


case

Grammatical case refers to the role a word plays in the structure of a sentence, such as the subject of the clause or object of the verb. Some languages change the form of a word (inflect) to indicate case, while other languages rely on word order. It is important to understand the difference between part of speech (nouns, verbs, etc.) and case (subject, object, etc.) because these are the foundation of syntax. Understanding the meaning of sentences and larger texts depends on understanding case.


character

A general term for a symbol in a writing system. In the alphabetic writing system used for English, a character is a letter, such as a or s. In a syllabic writing system, such as Japanese hiragana, a character is one symbol that represents a syllable, typically a consonant-vowel pair. In a logographic writing system, such as Chinese hànzì, a character is a logogram representing a morpheme or a word.
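
As a rough illustration, the Unicode name of a character usually begins with the script it belongs to. The Python sketch below relies on that naming pattern. The function script_hint is a hypothetical helper, and reading the first word of the name is a heuristic rather than an official script lookup.

```python
import unicodedata

def script_hint(ch):
    """Guess a character's script from the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

# One character each from English, Japanese hiragana, Chinese hànzì, and Russian.
for ch in ["a", "\u3042", "\u6f22", "\u044f"]:
    print(ch, script_hint(ch))
```

For these four characters the hints are LATIN, HIRAGANA, CJK, and CYRILLIC.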


corpus

A collection of spoken or written linguistic texts used to observe greater patterns in language. Uses include generating lists of words that are commonly used together or identifying vocabulary popular in a specific genre.
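
The idea of finding words that are commonly used together can be sketched in a few lines of Python. The three-sentence corpus below is invented for illustration, and counting adjacent word pairs is only one simple way to look for such patterns.

```python
from collections import Counter
import re

# Hypothetical miniature corpus; real corpora are far larger.
corpus = [
    "The translation memory stores each source segment.",
    "A translation memory speeds up later projects.",
    "The source segment is matched against the translation memory.",
]

def bigram_counts(texts):
    """Count adjacent word pairs across all texts, ignoring case and punctuation."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(words, words[1:]))
    return counts

counts = bigram_counts(corpus)
print(counts.most_common(1))  # [(('translation', 'memory'), 3)]
```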


Cyrillic script

The characters used by many Slavic languages, including Russian, Bulgarian, and Serbian.


diacritical

The linguistic term for an accent mark. These marks indicate changes in pronunciation of a phoneme. The International Phonetic Alphabet (IPA) uses a standardized system of diacritical marks. The term diacritical is commonly used in academic literature.


gender

Grammatical gender differs from natural gender. Unlike English, which does not use grammatical gender, some languages assign nouns an arbitrary gender to categorize them. This affects how modifiers such as adjectives and articles are inflected. This is common in Romance languages. Natural gender, meaning the sex of individuals, sometimes, but not always, matches the grammatical gender assigned to words. For example, the word for girl in Italian, la ragazza, is feminine in both grammatical and natural gender. In German, by contrast, the word for girl, das Mädchen, is grammatically neuter.


hànzì

The writing system of Chinese. The Chinese writing system is logographic, meaning that each character represents a whole morpheme or word. The hànzì characters have been borrowed by other Asian languages, including Japanese and Korean.


kanji

The characters in the Japanese writing system that were borrowed from Chinese hànzì. These characters are used in combination with the other Japanese scripts, katakana and hiragana. Although these characters are borrowed from Chinese, they don’t always carry the same meanings in Japanese as they do in Chinese.


language family

A group of languages that share a common ancestor. Linguists group languages with common ancestry into language families based on the historical relationships between them. Examples include the Romance languages, such as French and Spanish, which have Latin as a common ancestor, and the Germanic languages, such as Dutch, English, and German, which are descended from Proto-Germanic. Knowing the classifications of languages helps to explain how they relate to each other.


language

A communication system used by a group of people that assigns agreed-upon meaning to arbitrary collections of sounds and symbols. To be a language, the system must be capable of communicating abstract concepts, such as emotions, and of being used reflexively to talk about the language itself. These constraints help explain why one person cannot invent a language alone: language depends on a communication exchange. Animal communication is also not considered language because its capacity to express abstract thought is limited.


Latin script

The characters common to most European languages, based on the Roman writing system used for Latin. It is descended from the Phoenician alphabet.


linguistics

The scientific study of language and its functions. Linguistics encompasses all aspects of human language from its history to the social implications. Research in linguistics provides modern information on language around the world and how best to understand complex topics, such as syntax or phonetics.


linguistic lead

The localization team member who is responsible for verifying the quality of the translation for a particular language. This person is typically more senior than the translators, has strong editing skills and domain knowledge, and has a clear picture of the translation goals and requirements. Linguistic leads provide guidance, answer questions, and help the project manager keep things on track.


logographic writing system

The oldest type of writing system. Logographic writing systems use symbols that represent a complete word or morpheme. Chinese is an excellent example of a logographic script, but most languages also include logograms, such as numbers and the ampersand. Logographic characters don’t indicate pronunciation. Therefore, multiple languages can use the same characters with different pronunciations. For example, Chinese, Japanese, and Korean share a large number of characters, but the pronunciation of most of these shared characters is different in each language.


matching penalty

Part of the algorithm used to determine whether or not a source segment is the same as one that already exists in the translation memory. To calculate the matching penalty, an algorithm can consider factors such as contextual cues, fuzzy matches, and whether a pivot language was used.
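
The following Python sketch shows one way a penalty could lower a match score. It is not the algorithm of any particular translation memory tool: the similarity measure (difflib's SequenceMatcher ratio), the flag names, and the penalty values are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical penalties in percentage points; real tools define their own.
PENALTIES = {"different_context": 1, "pivot_language": 5}

def match_score(source, tm_source, flags=()):
    """Similarity of a new segment to a stored one, as a percentage after penalties."""
    similarity = SequenceMatcher(None, source, tm_source).ratio() * 100
    penalty = sum(PENALTIES[flag] for flag in flags)
    return max(0.0, similarity - penalty)

stored = "Click the Save button."
print(match_score("Click the Save button.", stored))                      # 100.0
print(match_score("Click the Save button.", stored, ["pivot_language"]))  # 95.0
print(match_score("Click the Cancel button.", stored) < 100)              # True
```

In this sketch, an identical segment that was translated through a pivot language scores 95 rather than 100, so it would be treated as a fuzzy match and flagged for review instead of being reused automatically.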


morpheme

The smallest meaningful unit in a language. Morphemes can be thought of as the building blocks of meaning and can be a standalone word or an affix – a prefix or suffix – that carries lexical or grammatical meaning. For example, the word cat is a morpheme in English because it has lexical meaning on its own and cannot be broken down into smaller pieces. The word cats has two morphemes, the lexical morpheme cat and the suffix s, which is a grammatical morpheme meaning plural.


neural machine translation (NMT)

An approach that uses an artificial neural network to determine matching and context. NMT systems progressively improve – and all parts of the system learn jointly – as they are exposed to more examples and content. They use a fraction of the memory that traditional statistical methods use. Google and Microsoft have adopted neural MT as their preferred methodology for machine translation. The Harvard NLP (Natural Language Processing) group has also developed an open source system called OpenNMT.


pivot language

An intermediary (and usually more common) language used to facilitate translation between two or more other languages. The pivot language (also known as a bridge language) is often, though not always, English. This is because English is the most common second language and the de facto language of global business, which means that a larger pool of trained translators know the combination of English and another language than other language combinations. For example, more translators know both English and Farsi than know both Farsi and Cherokee.


phoneme

Any sound that is used in a meaningful way by a specific language. The International Phonetic Alphabet provides symbols for the sounds of human language, and humans are in principle capable of producing all of them. However, each language uses only a limited set of phonemes. Therefore, people studying a new language can struggle to produce sounds that are not phonemic in their first language. For example, /i/ is the vowel in the English word beet and /ɪ/ is the vowel in bit. Both of these vowels are phonemes in English because they make a meaningful difference in a word. However, in Spanish, these sounds are not both phonemic because replacing one with the other does not change meaning. In Spanish, they are regarded as slight variations of the same sound.


romaji

A system for writing Japanese based on the Latin alphabet. Romaji (often misspelled romanji) uses Latin letters, the same script used for English, to transcribe the Japanese language. Romaji is commonly used to teach Westerners Japanese or to give examples of pronunciation, but it still follows the conventions of Japanese, including rules for when and how it is used.


segmental writing system

A writing system, usually an alphabet, that uses relatively few symbols that combine to form a range of phonemes and morphemes, representing spoken language. The meaning of the term segmental in linguistics differs from the meaning of segments in translation memory.


simplified Chinese

The writing system used in the People’s Republic of China and other locales. Since the mid-20th century, the Chinese government has gradually simplified the Chinese characters, hànzì, reducing the number of characters commonly taught and used and altering the way the characters are written.


syllabic writing system

A writing system in which characters represent syllables and are combined to indicate morphemes. Most commonly, syllabic writing systems only allow vowel (V) or consonant-vowel (CV) syllable structure. The Japanese kana (both hiragana and katakana) are examples of syllabic writing systems. Devanagari is often grouped with them, although it is more precisely classified as an abugida, in which each consonant character carries an inherent vowel.


tense

The time to which a verb refers in a clause. Because language can communicate abstract thought, a verb can be inflected to indicate the time at which the action took place. This includes past, present, and future tenses, as well as some more complex tenses like the past anterior. Other properties of verbs, such as whether an action is complete or how long it lasts, are not included in tense.


traditional Chinese

The writing system used in Taiwan and other locales. It preserves the hànzì character forms that predate simplification. These forms derive from the clerical script of the Han dynasty and are used by both Mandarin and Cantonese speakers, as well as speakers of other Chinese dialects.


vocabulary

A collection of words used by a group or individual for a specific purpose. Vocabulary can be specific, such as a list of scientific animal names, or it can refer to the full working vocabulary of an individual. The nuances of this term are important for translation, as well as for formulating relevant lists of terminology.


writing system

The conventions for writing in a language. The writing system of a language can indicate pronunciation, stress, syllable timing, or just lexical meaning. These systems can be classified into three general types. See logographic, syllabic, and alphabetic writing systems for more details.