Character Sets – ASCII & Unicode
Character Sets Overview
- Character set: the set of characters that hardware & software can recognise, each mapped to a unique numeric code.
- Each character ⟶ unique character code ⟶ stored as binary.
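The chain character → code → binary can be tried out directly. Below is a minimal sketch using Python's built-in `ord` and `format`; the helper name `to_binary` and its `bits` parameter are made up for illustration.

```python
def to_binary(ch, bits=8):
    """Return the character code of ch as a zero-padded binary string."""
    code = ord(ch)                     # character -> numeric character code
    return format(code, f"0{bits}b")   # code -> binary digits, padded to `bits`

print(to_binary("A"))      # 01000001  (8-bit form of 65)
print(to_binary("a", 7))   # 1100001   (7-bit form of 97)
```

The same code is stored either way; only the number of leading zeros changes with the width chosen.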
ASCII
- Original ASCII: 7-bit; represents 128 characters (codes 0–127).
- Extended ASCII: 8-bit; represents 256 characters (codes 0–255).
- Content groups
  - Uppercase A–Z: 65–90
  - Lowercase a–z: 97–122
  - Digits 0–9: 48–57
  - Control & punctuation occupy remaining ranges.
- Example codes: A = 65 = 1000001₂, a = 97 = 1100001₂.
- Codes run in sequence → knowing one code lets you calculate others (e.g. A is 65, so E is 65+4=69).
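Because the codes are sequential, a known code plus an offset gives any other letter in the same group. A small sketch, assuming Python's built-in `ord` and `chr`; `shift_letter` is a hypothetical helper name.

```python
def shift_letter(ch, offset):
    """Move `offset` places along the code table from character ch."""
    return chr(ord(ch) + offset)

print(shift_letter("A", 4))   # E   (65 + 4 = 69)
print(ord("Z"), ord("z"), ord("9"))   # 90 122 57  (ends of the three ranges)
```

This only gives a sensible letter while the result stays inside the same range (65–90, 97–122 or 48–57).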
Unicode
- Universal character set; a 16-bit form (as in basic UTF-16) gives 65,536 possible codes, and full Unicode extends further, with code points up to U+10FFFF (1,114,112 in total).
- First 128 codes identical to ASCII for compatibility.
- Supports scripts and symbols beyond Latin: Greek, Chinese (e.g. for Mandarin), Japanese, emoji, etc.
- Advantages over ASCII:
  - Vastly larger range of symbols.
  - One standard for multilingual text & modern symbols (e.g. emoji on phones).
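The points above can be checked with a short sketch. It assumes Python 3, where `ord` returns the Unicode code point; `describe` is a hypothetical helper that reports whether a code fits in 7-bit ASCII and whether it fits in 16 bits.

```python
def describe(ch):
    """Return (code point, fits in 7-bit ASCII?, fits in 16 bits?)."""
    code = ord(ch)
    return code, code < 128, code < 65536

print(describe("A"))    # (65, True, True)
print(describe("Ω"))    # (937, False, True)
print(describe("😀"))   # (128512, False, False)

# ASCII compatibility: for the first 128 codes, UTF-8 gives the same byte as ASCII.
print("A".encode("ascii") == "A".encode("utf-8"))   # True
```

The emoji's code is above 65,535, which is why a single 16-bit value is not enough for every Unicode character.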
Encoding Tables & Code Calculation
- Character codes are grouped and sequential; patterns aid quick conversion:
  - Add 32 to convert uppercase → lowercase (e.g. A (65) + 32 = 97 (a)).
  - Digits start at 48 (the code for 0), so digit n has code 48+n.
- Conversion steps
  - Denary code → binary (e.g. 71 → 01000111).
  - Binary → denary to identify the character (e.g. 01000111 → 71 → G).
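The patterns and conversion steps in this section can be sketched as follows. All function names are made up for illustration, and the denary → binary step uses repeated division by 2 as one possible method, not the only one.

```python
def to_lower(ch):
    """Uppercase -> lowercase by adding 32; other characters are unchanged."""
    return chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch

def digit_code(n):
    """Code for the digit character n (0-9): 48 + n."""
    return 48 + n

def denary_to_binary(code, bits=8):
    """Repeatedly divide by 2, collecting remainders from right to left."""
    digits = ""
    while code > 0:
        digits = str(code % 2) + digits
        code //= 2
    return digits.rjust(bits, "0")   # pad with leading zeros

def binary_to_char(binary):
    """Binary string -> denary code -> character."""
    return chr(int(binary, 2))

print(to_lower("A"))                # a
print(digit_code(7), chr(55))       # 55 7
print(denary_to_binary(71))         # 01000111
print(binary_to_char("01000111"))   # G
```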
Key Comparisons & Exam Tips
- Bits per character: ASCII 7/8 vs Unicode 16 (or 8–32 in UTF encodings).
- Capacity: ASCII 128/256 symbols; Unicode 65,536+.
- Use Unicode whenever multiple languages or emoji are required; ASCII suffices for basic English text.
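To see the storage trade-off in practice, the sketch below counts the bytes a string needs in three Unicode encodings and checks whether plain ASCII could hold it. It assumes Python's standard codec names; `fits_ascii` and `bytes_needed` are hypothetical helpers.

```python
def fits_ascii(text):
    """True if every character has a 7-bit ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

def bytes_needed(text):
    """Bytes used by text in each encoding ('-le' variants add no byte-order mark)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}

print(fits_ascii("hello"), bytes_needed("hello"))
# True {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(fits_ascii("hi😀"), bytes_needed("hi😀"))
# False {'utf-8': 6, 'utf-16-le': 8, 'utf-32-le': 12}
```

Basic English text costs the fewest bytes in ASCII or UTF-8, while the emoji cannot be stored in ASCII at all.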