Character Sets – ASCII & Unicode

Character Sets Overview

Character set: the list of numeric codes recognised by hardware & software to represent characters.
Each character ⟶ unique character code ⟶ stored as binary.
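A minimal Python sketch of the chain above, assuming Python's built-ins as the "software": `ord()` gives a character's numeric code, and `format(code, "08b")` writes that code as binary padded to 8 digits. The function name `to_codes` is an illustrative choice, not a standard one.

```python
def to_codes(text: str) -> list[tuple[str, int, str]]:
    """Return (character, denary code, binary) for each character in text."""
    result = []
    for ch in text:
        code = ord(ch)                # character -> unique numeric code
        binary = format(code, "08b")  # code -> binary, padded to at least 8 digits
        result.append((ch, code, binary))
    return result

for ch, code, binary in to_codes("Hi"):
    print(ch, code, binary)
```

For "Hi" this lists H as 72 (01001000) and i as 105 (01101001).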

ASCII

Original ASCII: 7-bit; represents 128 characters (codes 0–127).
Extended ASCII: 8-bit; represents 256 characters (codes 0–255).
Content groups
- Uppercase A–Z: 65–90
- Lowercase a–z: 97–122
- Digits 0–9: 48–57
- Control codes (0–31 and 127) plus space, punctuation and symbols fill the remaining codes up to 127.
Example codes: A = 65 = 1000001₂, a = 97 = 1100001₂.
Codes run in sequence → knowing one code lets you calculate others (e.g. A+4=E).
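Because the codes run in sequence, adding to a code gives a later character. A short Python illustration, using the built-ins `ord()` (character to code) and `chr()` (code to character); the helper name `shift` is hypothetical.

```python
def shift(ch: str, offset: int) -> str:
    """Return the character whose code is `offset` places after ch's code."""
    return chr(ord(ch) + offset)

print(ord("A"), format(ord("A"), "07b"))  # 65 1000001
print(ord("a"), format(ord("a"), "07b"))  # 97 1100001
print(shift("A", 4))                      # E
# The uppercase letters occupy one unbroken run of codes, 65 to 90
print([ord(c) for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"] == list(range(65, 91)))  # True
```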

Unicode

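Universal character set; the common form UTF-16 stores characters in 16-bit units → 65,536 possible codes per unit. Characters beyond that range (e.g. most emoji) use a pair of units, and the full Unicode range holds 1,114,112 code points.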
First 128 codes identical to ASCII for compatibility.
Supports scripts beyond Latin (Greek, Chinese, Japanese, etc.) plus emoji and other symbols.
Advantages over ASCII:
- Vastly larger range of symbols.
- One standard for multilingual text & modern symbols (e.g. emoji on phones).
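A sketch of the points above in Python, whose strings are Unicode, so `ord()` returns a character's Unicode code point. The sample characters are arbitrary picks (written as escapes so the code points are unambiguous), and `describe` is an illustrative name.

```python
def describe(ch: str) -> tuple[int, str, bool]:
    """Return (code point, U+ notation, whether it lies in the ASCII range)."""
    code = ord(ch)                              # Python str holds Unicode code points
    return code, f"U+{code:04X}", code < 128    # codes 0-127 are shared with ASCII

samples = {
    "Latin capital A": "A",
    "e with acute": "\u00e9",
    "Greek capital omega": "\u03a9",
    "Chinese character zhong": "\u4e2d",
    "grinning face emoji": "\U0001F600",
}
for name, ch in samples.items():
    print(name, *describe(ch))
```

Only "A" falls in the ASCII range (65, U+0041); the emoji's code point, 128512 (U+1F600), is larger than any single 16-bit value.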

Encoding Tables & Code Calculation

Character codes are grouped and sequential; patterns aid quick conversion:
- Add 32 to convert uppercase → lowercase (e.g. A(65)+32=97(a)).
- Digit codes start at 48, so digit n has code 48 + n (e.g. 7 → 55).
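Both patterns can be written as small Python functions. This is a sketch: the names `to_lower` and `digit_code` are illustrative, and `to_lower` deliberately handles only A–Z to mirror the +32 rule (Python's built-in `str.lower()` already does case conversion for all of Unicode).

```python
def to_lower(ch: str) -> str:
    """Convert an uppercase letter A-Z to lowercase by adding 32 to its code."""
    code = ord(ch)
    if 65 <= code <= 90:       # only A-Z follows the +32 rule
        return chr(code + 32)
    return ch                  # leave every other character unchanged

def digit_code(n: int) -> int:
    """Return the ASCII code of the digit character for 0 <= n <= 9."""
    if not 0 <= n <= 9:
        raise ValueError("n must be a single digit")
    return 48 + n

print(to_lower("A"), to_lower("Q"), to_lower("3"))  # a q 3
print(digit_code(7), chr(digit_code(7)))            # 55 7
```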
Conversion steps
1. Denary code → binary (e.g. 71 → 01000111).
2. Binary → denary to identify the character (e.g. 01000111 → 71 → G).
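The two steps can also be done by hand. This Python sketch spells out repeated division by 2 for denary → binary, and doubling-then-adding each bit (equivalent to summing place values) for binary → denary, instead of calling the built-ins `bin()` and `int(bits, 2)`. The function names are illustrative.

```python
def denary_to_binary(n: int, width: int = 8) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits    # each remainder is the next bit, right to left
        n //= 2
    return bits.rjust(width, "0")   # pad with leading zeros to the chosen width

def binary_to_denary(bits: str) -> int:
    """Convert a binary string to denary, one bit at a time from the left."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)  # shift earlier bits up one place, add this bit
    return total

code = 71
bits = denary_to_binary(code)
print(bits)                         # 01000111
print(chr(binary_to_denary(bits)))  # G
```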

Key Comparisons & Exam Tips

Bits per character: ASCII 7/8 vs Unicode 16 in UTF-16 (8–32 depending on the UTF encoding).
Capacity: ASCII 128/256 symbols; Unicode 65,536+ (1,114,112 code points in total).
Use Unicode whenever multiple languages or emoji are required; ASCII suffices for basic English text.
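One way to check the bit counts above is to encode single characters and count the bytes produced. This sketch relies on Python's standard codec names `"ascii"`, `"utf-8"`, `"utf-16-le"` and `"utf-32-le"`; the little-endian variants are used so the count excludes the byte-order mark that plain `"utf-16"` and `"utf-32"` prepend. Python stores each ASCII character in a full byte, so it reports 8 bits even though the code itself needs only 7. The helper name `bits_per_char` is hypothetical.

```python
from typing import Optional

def bits_per_char(ch: str, encoding: str) -> Optional[int]:
    """Bits used to store one character, or None if the encoding cannot hold it."""
    try:
        return len(ch.encode(encoding)) * 8
    except UnicodeEncodeError:
        return None  # e.g. ASCII has no code for non-Latin characters

encodings = ("ascii", "utf-8", "utf-16-le", "utf-32-le")
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    row = {enc: bits_per_char(ch, enc) for enc in encodings}
    print(repr(ch), row)
```

"A" fits every encoding; the Chinese character takes 24 bits in UTF-8 but 16 in UTF-16, while the emoji needs 32 bits in UTF-8, UTF-16 (a pair of units) and UTF-32 alike.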