What is it?
A defined list of grouped symbols used for digital communication.
Why is it important?
Every piece of digital text is stored using a particular character set. Programs and platforms expect a specific character set so that they can correctly process and render each character of the text.
Why does a technical communicator need to know this?
In its simplest form, a character set is a mapping (table) between text characters and the binary numbers that a computer or other digital device understands. For example, the three letters A, B, and C are read as 01000001, 01000010, and 01000011 by a computer using the ASCII character set (one of the early character sets).
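The ASCII mapping described above can be reproduced in a few lines of Python. This is a minimal sketch for illustration; the helper name `to_binary` is hypothetical and not part of any library.

```python
def to_binary(text: str) -> list[str]:
    """Return the zero-padded 8-bit binary form of each ASCII byte in text."""
    # str.encode("ascii") looks up each character in the ASCII table
    # and yields its numeric value (65 for "A", 66 for "B", ...).
    return [format(byte, "08b") for byte in text.encode("ascii")]


for char, bits in zip("ABC", to_binary("ABC")):
    print(char, bits)
# A 01000001
# B 01000010
# C 01000011
```

Passing a character that ASCII does not define (for example, an accented letter) raises a `UnicodeEncodeError`, which is a reminder that a character set only covers the characters in its table.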
As the need for global software arose in the 1980s and 1990s, computer scientists devised digital character sets that could manage character complexity and the thousands of characters in languages such as Chinese. Some character sets assigned a single byte to characters and others used double or multiple bytes for each character. Vendor- and platform-specific character sets also became common and created situations where similar character sets had different values for the same character, which meant that characters would be rendered incorrectly if processed using the mapping for the wrong character set[IANA].
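To make the conflict between character sets concrete, the sketch below encodes the same character with several legacy character sets that ship with Python. The sample characters and the helper name `byte_values` are illustrative assumptions, not taken from the cited sources.

```python
import unicodedata


def byte_values(char: str, charsets: list[str]) -> dict[str, str]:
    """Map each character set name to the hex bytes it assigns to char."""
    return {name: char.encode(name).hex() for name in charsets}


# One accented letter in three single-byte character sets:
# each set stores it under a different value.
accented = byte_values("é", ["latin-1", "cp437", "mac_roman"])
print(accented)

# A Chinese character takes two bytes in both GB2312 and Big5,
# and the two sets assign it different values.
chinese = byte_values("中", ["gb2312", "big5"])
print(chinese)

# The same stored byte, read with two different mappings,
# comes out as two different characters.
raw = "é".encode("latin-1")
right = raw.decode("latin-1")
wrong = raw.decode("cp437")
print(unicodedata.name(right), "vs.", unicodedata.name(wrong))
```

The last line prints character names rather than the glyphs themselves, so the difference is visible even on a terminal that cannot display both characters.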
If an application supports a specific character set, the user’s device must recognize and support the same character set. Verifying this is part of the due diligence for publishing globally.
For this reason, software localization and development engineers must understand character sets[Zentgraf 2015]. Issues with character sets can be the bane of their lives, especially when character corruption occurs – for example, when translated software strings are moved across platforms that support different character sets or character encodings (e.g. from UNIX to Windows).
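The kind of character corruption described above can be simulated as follows. This is a rough sketch that assumes the string was saved as UTF-8 on one platform and then read as Windows-1252 on another; the function names `corrupt` and `repair` are hypothetical.

```python
def corrupt(text: str) -> str:
    """Simulate reading UTF-8 bytes with the Windows-1252 mapping."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the mistake by reversing the two steps, when that is possible."""
    return garbled.encode("cp1252").decode("utf-8")


original = "café"
garbled = corrupt(original)
print(garbled)          # cafÃ©
print(repair(garbled))  # café
```

The repair only works here because every byte involved happens to be defined in Windows-1252. In practice some bytes may be dropped or replaced along the way, so the safer approach is to declare and check the encoding at every hand-off.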
Today, more harmonization exists in this area with the proliferation of Unicode[Tero 2012] (which assigns a unique number, called a code point, to every character in nearly every written language) and its various character encodings, such as UTF-8 and UTF-16. A character set can have multiple character encodings, but each encoding can relate to only one character set[Open-Std].
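The relationship between one character set and several encodings can be seen by encoding a single Unicode character in different ways. The sketch below uses the euro sign as an arbitrary example; the helper name `encodings_of` is hypothetical.

```python
def encodings_of(char: str) -> dict[str, str]:
    """Return the hex bytes of char under three Unicode encodings."""
    names = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: char.encode(name).hex() for name in names}


euro = "\u20ac"  # the euro sign, Unicode code point U+20AC
print(f"U+{ord(euro):04X}")
print(encodings_of(euro))
# U+20AC
# {'utf-8': 'e282ac', 'utf-16-be': '20ac', 'utf-32-be': '000020ac'}
```

The code point stays the same in every case; only the byte sequence used to store it changes with the encoding.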
References
- [IANA] Official character set names on the Internet: IANA.
- [Zentgraf 2015] Programmer information on character sets and encoding: What every programmer absolutely needs to know: Zentgraf, David C.
- [Tero 2012] Unicode, UTF8 & Character Sets: The Ultimate Guide: Tero, Paul.
- [Open-Std] Universal Character Set Characters: Open-std.org. Open standard that lists the Universal Character Set characters.