Character Sets: ASCII and Unicode
Character Sets Overview
- A character set is the complete list of binary codes a computer can recognise & process.
- Each character (letter, digit, symbol, control code) is mapped to a unique numeric value that is stored as binary.
ASCII
- Uses 7 bits per character ⇒ 128 possible codes (0–127).
- Key groupings (sequential):
• Capital letters A–Z: 65–90
• Lowercase a–z: 97–122
• Digits 0–9: 48–57
- Example mappings:
• A = 65 = 1000001₂
• G = 71 = 01000111₂
• * = 42 = 00101010₂
- Codes are consecutive, so knowing one lets you calculate others (e.g. E = 65 + 4 = 69).
- Limitations: only 128 characters; no support for accented letters, non-Latin scripts, emoji.
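The ASCII mappings above can be checked with a short sketch using only Python built-ins (`ord` and `chr` convert between characters and their codes):

```python
# Each character maps to a 7-bit code; ord() and chr() convert both ways.
print(ord("A"))                 # 65
print(format(ord("A"), "07b"))  # 1000001 (7-bit binary)

# Codes run in sequence, so one known code lets you derive others:
# E is 4 letters after A, so E = 65 + 4 = 69.
print(chr(ord("A") + 4))        # E

# Lowercase letters sit 32 places above their capitals (97 - 65 = 32).
print(ord("a") - ord("A"))      # 32
```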
Unicode
- A universal character set covering virtually every written language & symbol.
- Original size: 16 bits (the Basic Multilingual Plane, BMP) ⇒ 65,536 codes. Modern UTF encodings (UTF-8, UTF-16, UTF-32) cover the full Unicode range of 1,114,112 code points — over a million characters.
- First 128 codes mirror ASCII for compatibility.
- Supports scripts such as Greek, Mandarin, Japanese, plus emoji & technical symbols.
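These Unicode properties can be demonstrated directly, since Python strings are Unicode natively (a minimal sketch; the specific sample characters are arbitrary choices):

```python
# The first 128 code points mirror ASCII exactly.
print(ord("A"))      # 65 — same value as in 7-bit ASCII

# Characters beyond ASCII's 128-code limit:
print(ord("Ω"))      # 937   (Greek capital omega)
print(ord("中"))     # 20013 (a Chinese character)
print(ord("😀"))     # 128512 — above 65535, so outside the 16-bit BMP

# chr() maps a code point back to its character.
print(chr(128512))   # 😀
```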
Advantages of Unicode over ASCII
- Far larger code space; accommodates multilingual text and pictographs.
- Consistent cross-platform standard.
- Essential for devices/apps that handle diverse languages or emoji (e.g., smartphones).
Exam Quick-Recall Points
- Character set = set of binary codes understood by hardware & software.
- ASCII: 7 bits, 128 characters; understand control vs printable ranges.
- Unicode: 16+ bits per character, over a million code points; required for global language support.
- Character codes are grouped & run in sequence—use this to convert or infer values.
- Practise conversions: denary ↔ binary for ASCII codes.
- Be ready to justify why Unicode is chosen over ASCII when additional symbols/languages are needed.