Character Sets – ASCII & Unicode

Character Sets Overview

  • Character set: the set of characters that hardware & software recognise, each mapped to a unique numeric code.
  • Each character ⟶ unique character code ⟶ stored as binary.

ASCII

  • Original ASCII: 7-bit; represents 128 characters (codes 0–127).
  • Extended ASCII: 8-bit; represents 256 characters (codes 0–255).
  • Content groups
    • Uppercase A–Z: 65–90
    • Lowercase a–z: 97–122
    • Digits 0–9: 48–57
    • Control & punctuation occupy remaining ranges.
  • Example codes: A = 65 = 1000001₂, a = 97 = 1100001₂.
  • Codes run in sequence → knowing one code lets you calculate others (e.g. A = 65, so E = 65 + 4 = 69).
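
The sequential layout can be checked with a short Python sketch. The helper name shift_char is only an illustration, not part of any standard; ord and chr are Python built-ins that convert between a character and its code.

```python
def shift_char(ch: str, offset: int) -> str:
    """Return the character whose code is `offset` places after `ch`."""
    return chr(ord(ch) + offset)


print(ord("A"))                  # 65
print(format(ord("A"), "07b"))   # 1000001 (7-bit binary)
print(format(ord("a"), "07b"))   # 1100001
print(shift_char("A", 4))        # E
```

The same helper works within any sequential group, so shift_char("0", 7) gives "7".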

Unicode

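  • Universal character set; a 16-bit code gives 65,536 possible codes, and Unicode as a whole extends further (code points up to U+10FFFF, 1,114,112 in total). UTF-16 stores the most common characters in one 16-bit unit and the rest in two.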
  • First 128 codes identical to ASCII for compatibility.
  • Supports scripts beyond Latin (Greek, Chinese, Japanese, etc.) plus symbols such as emoji.
  • Advantages over ASCII:
    • Vastly larger range of symbols.
    • One standard for multilingual text & modern symbols (e.g. emoji on phones).
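
As a rough illustration of the points above, the Python sketch below confirms that the first 128 Unicode code points match ASCII and prints the code points of a few non-Latin characters. The helper describe and the sample characters are chosen here purely for illustration.

```python
def describe(ch: str) -> str:
    """Format a character's Unicode code point in hex and denary."""
    return f"U+{ord(ch):04X} ({ord(ch)})"


# Decode every 7-bit ASCII byte and compare its Unicode code point to the byte value.
ascii_text = bytes(range(128)).decode("ascii")
ascii_matches = all(ord(ch) == code for code, ch in enumerate(ascii_text))
print(ascii_matches)            # True

print(describe("A"))            # U+0041 (65)
print(describe("\u03b1"))       # Greek alpha: U+03B1 (945)
print(describe("\u4e2d"))       # Chinese character: U+4E2D (20013)
print(describe("\U0001F600"))   # Emoji: U+1F600 (128512)
```

The emoji's code point is above 65,535, which is why a single 16-bit value cannot hold it.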

Encoding Tables & Code Calculation

  • Character codes are grouped and sequential; patterns aid quick conversion:
    • Add 32 to convert uppercase → lowercase (e.g. A(65)+32=97(a)).
    • Digits start at 48 (the code for 0), so digit n has code 48+n.
  • Conversion steps
    1. Denary code → binary (e.g. 71 → 01000111).
    2. Binary → denary to identify character (e.g. 01000111 → 71 → G).
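
The patterns and conversion steps above can be sketched in Python as follows. The function names are hypothetical helpers written for these notes, and the 8-bit width is an assumption matching the example 01000111.

```python
def to_lower(ch: str) -> str:
    """Convert an uppercase ASCII letter to lowercase by adding 32 to its code."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 32)
    return ch  # leave anything else unchanged


def digit_code(n: int) -> int:
    """Return the ASCII code of the digit n (0-9), using code = 48 + n."""
    if not 0 <= n <= 9:
        raise ValueError("n must be a single digit, 0 to 9")
    return 48 + n


def code_to_binary(code: int, bits: int = 8) -> str:
    """Step 1: denary code to a fixed-width binary string."""
    return format(code, f"0{bits}b")


def binary_to_char(bits: str) -> str:
    """Step 2: binary string to denary code, then to the character."""
    return chr(int(bits, 2))


print(to_lower("A"))               # a
print(digit_code(7))               # 55
print(code_to_binary(71))          # 01000111
print(binary_to_char("01000111"))  # G
```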

Key Comparisons & Exam Tips

  • Bits per character: ASCII 7/8 vs Unicode 16 (or 8–32 in UTF encodings).
  • Capacity: ASCII 128/256 symbols; Unicode 65,536 with 16 bits, and over 1.1 million code points in total.
  • Use Unicode whenever multiple languages or emoji are required; ASCII suffices for basic English text.
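
To make the size comparison concrete, the sketch below counts the bytes Python uses to store a character in three UTF encodings and checks whether text fits in plain ASCII. The helper names are illustrative only. The "-le" codec variants are used so that no byte-order mark is added to the count.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes `text` needs in UTF-8, UTF-16 and UTF-32."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}


def is_ascii_only(text: str) -> bool:
    """Return True if every character in `text` has a 7-bit ASCII code."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(encoded_sizes("A"))            # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("\u00e9"))       # e-acute: {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("\U0001F600"))   # emoji: {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
print(is_ascii_only("Hello"))        # True
print(is_ascii_only("caf\u00e9"))    # False
```

The output shows 1 to 4 bytes (8–32 bits) per character depending on the encoding, and that text with accented letters or emoji cannot be stored as ASCII.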