Character Sets: ASCII and Unicode

Character Sets Overview

  • A character set is the complete list of binary codes a computer can recognise & process.
  • Each character (letter, digit, symbol, control code) is mapped to a unique numeric value that is stored as binary.

ASCII (American Standard Code for Information Interchange)

  • Uses 7 bits per character ⇒ 128 possible codes (0–127).
  • Key groupings (sequential):
    • Capital letters A–Z: 65–90
    • Lowercase a–z: 97–122
    • Digits 0–9: 48–57
  • Example mappings:
    • A = 65 = 1000001₂
    • G = 71 = 1000111₂
    • * = 42 = 0101010₂
  • Codes are consecutive, so knowing one lets you calculate others (e.g. E = 65 + 4 = 69).
  • Limitations: only 128 characters; no support for accented letters, non-Latin scripts, emoji.
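The mappings above can be explored with Python's built-in `ord()` (character → code) and `chr()` (code → character), used here as an illustrative sketch:

```python
# ord() gives a character's code; chr() goes the other way.
print(ord("A"))            # 65
print(ord("G"))            # 71
print(chr(42))             # *

# Codes run in sequence, so one known code lets you derive others:
print(chr(ord("A") + 4))   # E  (65 + 4 = 69)

# 7-bit binary pattern for a character:
print(format(ord("A"), "07b"))   # 1000001
```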

Unicode

  • A universal character set covering virtually every written language & symbol.
  • Originally 16 bits (the Basic Multilingual Plane, BMP) ⇒ 65,536 codes. Modern Unicode defines 1,114,112 code points (over one million characters), stored using UTF encodings such as UTF-8, UTF-16 and UTF-32.
  • First 128 codes mirror ASCII for compatibility.
  • Supports scripts such as Greek, Mandarin, Japanese, plus emoji & technical symbols.
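A quick sketch of these points in Python: the same `ord()` function returns Unicode code points, the first 128 of which match ASCII, while other scripts and emoji sit well beyond the ASCII range:

```python
# First 128 Unicode code points mirror ASCII:
print(ord("A"))        # 65 — identical to its ASCII code

# Other scripts have their own code points:
print(ord("Ω"))        # 937   (Greek capital omega)
print(ord("中"))       # 20013 (a Chinese character)

# Emoji lie outside the original 16-bit BMP:
print(hex(ord("😀")))  # 0x1f600
```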

Advantages of Unicode over ASCII

  • Far larger code space; accommodates multilingual text and pictographs.
  • Consistent cross-platform standard.
  • Essential for devices/apps that handle diverse languages or emoji (e.g., smartphones).
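One practical consequence, sketched below: with the common UTF-8 encoding, plain ASCII text still costs one byte per character, while other scripts and emoji take more bytes — which is why keeping the first 128 codes ASCII-compatible mattered:

```python
# UTF-8 storage cost per character:
print(len("A".encode("utf-8")))   # 1 byte  (ASCII range)
print(len("é".encode("utf-8")))   # 2 bytes (accented Latin)
print(len("中".encode("utf-8")))  # 3 bytes (CJK)
print(len("😀".encode("utf-8")))  # 4 bytes (emoji)
```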

Exam Quick-Recall Points

  • Character set = set of binary codes understood by hardware & software.
  • ASCII: 7 bits, 128 characters; understand control vs printable ranges.
  • Unicode: 16+ bits, over a million possible characters; required for global language support.
  • Character codes are grouped & run in sequence—use this to convert or infer values.
  • Practise conversions: denary ↔ binary for ASCII codes.
  • Be ready to justify why Unicode is chosen over ASCII when additional symbols/languages are needed.
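A minimal sketch for practising the denary ↔ binary conversions mentioned above (the helper names `denary_to_binary` and `binary_to_denary` are illustrative, not part of any standard library):

```python
def denary_to_binary(n, bits=7):
    """Return n as a fixed-width binary string (7 bits suits ASCII)."""
    return format(n, f"0{bits}b")

def binary_to_denary(b):
    """Parse a binary string back into a denary integer."""
    return int(b, 2)

print(denary_to_binary(71))          # 1000111  (code for G)
print(binary_to_denary("0101010"))   # 42       (code for *)
```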