Character Sets: ASCII and Unicode
Character Sets Overview
- A character set is the full list of characters, each with its binary code, that a computer can recognise & process.
- Each character (letter, digit, symbol, control code) is mapped to a unique numeric value that is stored as binary.
ASCII
- Uses 7 bits per character ⇒ 128 possible codes (0–127).
- Key groupings (sequential):
• Capital letters A–Z: 65–90
• Lowercase a–z: 97–122
• Digits 0–9: 48–57
- Example mappings:
• A = 65 = 1000001₂
• G = 71 = 1000111₂
• * = 42 = 0101010₂
- Codes are consecutive, so knowing one lets you calculate others (e.g. E = 65 + 4 = 69).
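The sequential-code trick above can be sketched in Python, whose built-in `ord` and `chr` functions map characters to and from their code values:

```python
# Sketch: ASCII codes run in sequence, so one known code
# lets you derive the others in its group.
base = ord('A')                   # 65
print(ord('E') == base + 4)       # 'E' is 4 letters after 'A' → True
print(chr(base + 6))              # 65 + 6 = 71 → 'G'
print(format(ord('*'), '07b'))    # 7-bit binary code for '*' → 0101010
```

The same arithmetic works for lowercase letters (base 97) and digits (base 48).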
- Limitations: only 128 characters; no support for accented letters, non-Latin scripts, emoji.
Unicode
- A universal character set covering virtually every written language & symbol.
- Original size: 16 bits (the Basic Multilingual Plane) ⇒ 65,536 codes. Modern Unicode defines 1,114,112 code points (U+0000–U+10FFFF), stored using encodings such as UTF-8, UTF-16 and UTF-32.
- First 128 codes mirror ASCII for compatibility.
- Supports scripts such as Greek, Mandarin, Japanese, plus emoji & technical symbols.
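A short Python sketch illustrates the two key Unicode facts above: the first 128 code points match ASCII exactly, while characters beyond that range need multi-byte UTF-8 sequences (the euro sign and emoji here are just illustrative examples):

```python
# The first 128 Unicode code points mirror ASCII.
print(ord('A'))                    # 65 — same value as in ASCII
print(len('A'.encode('utf-8')))    # 1 byte in UTF-8

# Higher code points need more bytes.
print(ord('€'))                    # 8364 — inside the 16-bit BMP
print(len('€'.encode('utf-8')))    # 3 bytes in UTF-8
print(ord('😀'))                   # 128512 — beyond the 16-bit BMP
```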
Advantages of Unicode over ASCII
- Far larger code space; accommodates multilingual text and pictographs.
- Consistent cross-platform standard.
- Essential for devices/apps that handle diverse languages or emoji (e.g., smartphones).
Exam Quick-Recall Points
- Character set = set of binary codes understood by hardware & software.
- ASCII: 7 bits, 128 characters; understand control vs printable ranges.
- Unicode: 16+ bits, over a million code points; required for global language support.
- Character codes are grouped & run in sequence—use this to convert or infer values.
- Practise conversions: denary ↔ binary for ASCII codes.
- Be ready to justify why Unicode is chosen over ASCII when additional symbols/languages are needed.
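The denary ↔ binary conversion practice mentioned above can be checked with a minimal Python sketch ('G' is an arbitrary example character):

```python
# Denary → 7-bit binary, and back, for an ASCII code.
code = ord('G')                  # character → denary: 71
bits = format(code, '07b')       # denary → binary string: '1000111'
print(bits)
print(int(bits, 2))              # binary string → denary: 71
print(chr(int('1000001', 2)))    # binary → character: 'A'
```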