Character Sets: ASCII and Unicode

Character Sets Overview
  • A character set is the complete list of binary codes a computer can recognise & process.
  • Each character (letter, digit, symbol, control code) is mapped to a unique numeric value that is stored as binary.
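One way to see this mapping in practice is Python. Its built-in `ord` returns a character's numeric code, and `format(code, "08b")` writes that number in binary. This is a minimal sketch: `char_to_binary` is a hypothetical helper, and the 8-bit width is an assumption chosen to match one byte.

```python
def char_to_binary(ch: str, width: int = 8) -> str:
    """Return the binary code for one character, zero-padded to `width` bits."""
    code = ord(ch)                     # character -> numeric code
    return format(code, f"0{width}b")  # numeric code -> binary digits


for ch in ["A", "7", "*"]:
    print(ch, ord(ch), char_to_binary(ch))
# A 65 01000001
# 7 55 00110111
# * 42 00101010
```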
ASCII (American Standard Code for Information Interchange)
  • Uses 7 bits per character ⇒ 2⁷ = 128 possible codes (0–127).
  • Key groupings (sequential):
    • Capital letters A–Z: 65–90
    • Lowercase a–z: 97–122
    • Digits 0–9: 48–57
  • Example mappings (binary padded to 8 bits, one byte per character):
    A = 65 = 01000001₂
    G = 71 = 01000111₂
    * = 42 = 00101010₂
  • Codes within each group are consecutive, so knowing one lets you calculate others (e.g. E = 65 + 4 = 69).
  • Limitations: only 128 characters; no support for accented letters, non-Latin scripts or emoji.
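The sequencing rule and the denary-to-binary conversion can be sketched in Python. This is a minimal illustration that assumes an 8-bit output width. `denary_to_binary` and `code_from_offset` are hypothetical helper names, and the conversion uses the standard repeated-division-by-2 method.

```python
def denary_to_binary(n: int, bits: int = 8) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # each remainder is the next bit, least significant first
        n //= 2
    binary = "".join(reversed(digits)) or "0"  # reverse so the most significant bit comes first
    return binary.zfill(bits)                  # pad with leading zeros to the chosen width


def code_from_offset(start: str, offset: int) -> str:
    """Step `offset` places along a consecutive run of codes, starting at `start`."""
    return chr(ord(start) + offset)


letter = code_from_offset("A", 4)
print(letter, ord(letter), denary_to_binary(ord(letter)))  # E 69 01000101
print(denary_to_binary(71))                                # 01000111
print(ord("a") - ord("A"))                                 # 32: same letter, lower vs upper case
```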
Unicode
  • A universal character set covering virtually every written language & symbol.
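  • Original size: 16 bits ⇒ 2¹⁶ = 65,536 codes (today's Basic Multilingual Plane, BMP). Modern Unicode defines 1,114,112 code points (U+0000 to U+10FFFF), stored using encodings such as UTF-8, UTF-16 and UTF-32 (up to 32 bits per character).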
  • First 128 codes mirror ASCII for compatibility.
  • Supports scripts such as Greek, Chinese and Japanese, plus emoji & technical symbols.
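Python strings are sequences of Unicode code points, which makes them a convenient way to explore these points. This is a minimal sketch. The sample characters are illustrative choices written as escape sequences so the source stays ASCII-only. The standard-library `unicodedata.name` is used only to label each one.

```python
import unicodedata

# Illustrative samples: Latin, Greek, Chinese (Han), Japanese (hiragana), emoji.
samples = ["A", "\u03b1", "\u4e2d", "\u3042", "\U0001F600"]

for ch in samples:
    cp = ord(ch)            # the character's Unicode code point
    in_bmp = cp <= 0xFFFF   # BMP = the original 16-bit range, 0 to 65,535
    print(f"U+{cp:04X} ({cp}) in BMP: {in_bmp}, {unicodedata.name(ch)}")

# The first 128 code points encode to exactly the same single byte as ASCII.
same_as_ascii = all(chr(i).encode("utf-8") == chr(i).encode("ascii") for i in range(128))
print("First 128 codes match ASCII:", same_as_ascii)  # True
```

The emoji (U+1F600) falls outside the BMP, which is why 16 bits alone are no longer enough.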
Advantages of Unicode over ASCII
  • Far larger code space; accommodates multilingual text and pictographs.
  • Consistent cross-platform standard.
  • Essential for devices/apps that handle diverse languages or emoji (e.g., smartphones).
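The practical difference shows up as soon as text leaves the 0–127 range. In this minimal Python sketch, `encodes_as` is a hypothetical helper that wraps the standard `str.encode` method. ASCII encoding fails for accented, Greek or emoji text, while UTF-8 (a Unicode encoding) handles all of it.

```python
def encodes_as(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Samples written with escapes: "cafe" with an accented e, Greek "geia", and a grinning-face emoji.
for text in ["Hello", "caf\u00e9", "\u03b3\u03b5\u03b9\u03b1", "Hi \U0001F600"]:
    print(ascii(text), "ASCII:", encodes_as(text, "ascii"), "UTF-8:", encodes_as(text, "utf-8"))
```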
Exam Quick-Recall Points
  • Character set = set of binary codes understood by hardware & software.
  • ASCII: 7 bits, 128 characters; understand control (0–31 and 127) vs printable (32–126) ranges.
  • Unicode: originally 16 bits, up to 32 bits per character in modern encodings; over a million code points; required for global language support.
  • Character codes are grouped & run in sequence—use this to convert or infer values.
  • Practise conversions: denary ↔ binary for ASCII codes.
  • Be ready to justify why Unicode is chosen over ASCII when additional symbols/languages are needed.
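For conversion practice, a short Python sketch can check the reverse direction (binary → denary) and the control/printable split. It assumes the standard ASCII layout, in which codes 0–31 and 127 are control codes and 32–126 are printable. `binary_to_denary` and `ascii_category` are hypothetical helper names.

```python
def binary_to_denary(bits: str) -> int:
    """Add the place value (1, 2, 4, 8, ...) of every column holding a 1."""
    total = 0
    for position, bit in enumerate(reversed(bits)):  # rightmost bit has place value 1
        if bit == "1":
            total += 2 ** position
    return total


def ascii_category(code: int) -> str:
    """Classify a 7-bit ASCII code as 'control' or 'printable'."""
    if not 0 <= code <= 127:
        raise ValueError(f"{code} is outside the 7-bit ASCII range")
    if code <= 31 or code == 127:
        return "control"    # e.g. 10 = line feed, 127 = delete
    return "printable"      # 32 (space) up to 126 (~)


for bits in ["01000001", "00110000", "00001010", "01111111"]:
    code = binary_to_denary(bits)
    print(bits, code, ascii_category(code), repr(chr(code)))
```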