Notes on Information representation and multimedia (summary)
1.1 Data representation
- Core idea: how computers represent data in binary form, different numeral systems, character encoding, and how these representations affect storage and processing.
- Key terms
- Binary: base-2 number system using values 0 and 1.
- Bit: binary digit.
- One’s complement: invert every bit to represent negatives; example: 01011010 (90) → 10100101 (−90).
- Two’s complement: invert bits and add 1 to get negative; simplifies binary arithmetic for signed numbers.
- Sign and magnitude: sign bit (0 = +, 1 = −) with remaining bits for magnitude.
- Hexadecimal: base-16 system using digits 0–9 and letters A–F; weights 16^n.
- Memory dump: contents of computer memory printed to screen or paper.
- Binary-coded decimal (BCD): use 4 bits to represent each decimal digit (0–9).
- ASCII: coding system for characters on a keyboard and control codes (7-bit standard; 0–127).
- Character set: list of defined characters that hardware/software can represent.
- Unicode: encoding system intended to represent all languages; supports many characters; first 128 common with ASCII; 16/32-bit encodings common; up to four bytes per character.
- What you should already know (overview of concepts to warm up):
- Column weightings for binary and hexadecimal numbers; binary addition/subtraction; converting between binary/denary/hexadecimal; representing signed numbers using two’s complement.
- Why memory sizes use different units (bytes, kilobytes, mebibytes, etc.) and the distinction between SI prefixes (kilo = 1000) and IEC prefixes (kibi = 1024).
- How to interpret memory dumps and why hexadecimal is used for debugging.
- Connections to foundational principles
- The need for standardized encoding to convert human-readable data into machine-readable bits.
- Trade-offs between compactness (compression) and exactness (lossless vs. lossy representations).
- Practical implications
- Monetary values often require exact representation; BCD is used to avoid rounding errors in fixed-point arithmetic.
- ASCII vs. Unicode affects multilingual support, storage size, and compatibility.
1.1.1 Number systems
- Decimal (denary) is base-10; digits 0–9, weights: 10^0, 10^1, 10^2, … (least significant to most significant).
- Example: 31,421 is 3×10^4 + 1×10^3 + 4×10^2 + 2×10^1 + 1×10^0.
- Binary (base-2) uses digits 0 and 1; digits are called bits; weighted columns.
- Relationship to computer storage
- Digital switches ON=1, OFF=0; any data ultimately stored as binary digits.
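The column-weight idea above works the same way in any base. A minimal sketch in Python; the function name `place_value_sum` is an illustrative choice, not something from the source:

```python
def place_value_sum(digits, base):
    """Return the denary value of a digit list written most significant digit first."""
    total = 0
    # Position 0 is the rightmost column, whose weight is base**0 = 1.
    for position, digit in enumerate(reversed(digits)):
        total += digit * base ** position
    return total


print(place_value_sum([3, 1, 4, 2, 1], 10))          # 31421
print(place_value_sum([1, 1, 1, 0, 1, 1, 1, 0], 2))  # 238
```

The same function covers denary and binary because only the column weights change, not the method.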
1.1.2 Binary number system
- Weightings for 8-bit binary numbers (most-significant to least-significant):
2^7 \text{(128)}, \ 2^6 \text{(64)}, \ 2^5 \text{(32)}, \ 2^4 \text{(16)}, \ 2^3 \text{(8)}, \ 2^2 \text{(4)}, \ 2^1 \text{(2)}, \ 2^0 \text{(1)}
- In decimal: 128, 64, 32, 16, 8, 4, 2, 1
- Converting binary to denary (example): if bits 1 appear in a column, add the column value.
- Example: binary 1110 1110 (8 bits) → 128 + 64 + 32 + 8 + 4 + 2 = 238_{denary}.
- Converting denary to binary has two common methods:
- Method 1: place 1s in appropriate positions to sum to the denary value.
- Method 2: successive division by 2; write down each remainder, then read the remainders from last to first (bottom to top).
- Binary arithmetic for signed numbers uses two’s complement (preferred in this text):
- One’s complement: invert all bits (0↔1).
- Two’s complement: invert all bits and add 1 to the least significant bit.
- Benefits: simplifies addition/subtraction of signed numbers, avoids separate subtraction logic.
- 8-bit two’s complement example and range
- Example: +90 is 0101 1010; inverting every bit gives 1010 0101, and adding 1 gives 1010 0110, which is −90 in 8-bit two’s complement.
- Range for 8-bit two’s complement: -2^7 \leq N \leq 2^7-1 which is [-128, 127].
- Activity/Exercises (typical questions students practice)
- Convert several 8-bit binary numbers to denary using two’s complement.
- Convert several denary numbers to 8-bit binary two’s complement.
- Perform binary addition and subtraction with overflow awareness.
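The conversions and the overflow check in this section can be sketched in Python. This is one possible arrangement under stated assumptions: bit patterns are held as strings of '0' and '1', and all function names are hypothetical helpers rather than anything defined in the source.

```python
def denary_to_binary(value, bits=8):
    """Convert a non-negative denary integer to binary by successive division by 2."""
    if not 0 <= value < 2 ** bits:
        raise ValueError("value does not fit in the requested number of bits")
    remainders = []
    while value > 0:
        value, remainder = divmod(value, 2)
        remainders.append(str(remainder))
    # Remainders are produced least significant first, so read them in reverse.
    return "".join(reversed(remainders)).rjust(bits, "0")


def twos_complement_negate(pattern):
    """Invert every bit, then add 1, keeping the same width."""
    width = len(pattern)
    inverted = "".join("1" if bit == "0" else "0" for bit in pattern)
    result = (int(inverted, 2) + 1) % (2 ** width)  # drop any carry out of the top bit
    return format(result, f"0{width}b")


def twos_complement_to_denary(pattern):
    """Interpret a bit string as a signed two's complement number."""
    value = int(pattern, 2)
    if pattern[0] == "1":
        value -= 2 ** len(pattern)  # the leftmost column carries a negative weight
    return value


def add_8bit(a, b):
    """Add two 8-bit patterns and return (result pattern, overflow flag)."""
    result = format((int(a, 2) + int(b, 2)) % 256, "08b")  # the ninth bit is discarded
    # Overflow: both operands have the same sign but the result has the other sign.
    overflow = a[0] == b[0] and result[0] != a[0]
    return result, overflow


plus_90 = denary_to_binary(90)
minus_90 = twos_complement_negate(plus_90)
print(plus_90, minus_90)                    # 01011010 10100110
print(twos_complement_to_denary(minus_90))  # -90
print(add_8bit(denary_to_binary(100), plus_90))  # ('10111110', True)
```

The last line adds +100 and +90: the true sum, 190, is above the 8-bit limit of +127, so the result pattern reads as a negative number and the overflow flag is set.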
1.1.3 Hexadecimal number system
- Hexadecimal is base-16 with digits 0-9, A-F; weights are 16^n: 16^3, 16^2, 16^1, 16^0.
- Example digits: A=10, B=11, C=12, D=13, E=14, F=15.
- Relationship to binary: one hex digit corresponds to four binary digits because 16 = 2^4.
- Converting between binary and hex:
- Binary to hex: group bits into 4-bit chunks from right to left; if the leftmost group has fewer than four bits, pad it with leading zeros; translate each 4-bit group to a hex digit using the 16-row lookup table.
- Hex to binary: replace each hex digit with its 4-bit binary equivalent.
- Practical use: memory dumps are often shown in hexadecimal for readability and ease of tracing memory contents.
- Example conversion (per text): 16-bit binary 1011 1110 0011 0001 → hex B E 3 1 (and similar examples).
- Table reference: hex–binary–denary mapping for quick lookup (Table 1.3 in the source).
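The grouping rule can be sketched in a few lines of Python. The helper names and the `HEX_DIGITS` lookup string are illustrative assumptions standing in for the mapping table mentioned above:

```python
HEX_DIGITS = "0123456789ABCDEF"  # index = denary value of the hex digit


def binary_to_hex(bits):
    """Convert a bit string (spaces allowed) to hex, four bits per hex digit."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length becomes a multiple of four.
    padded_length = -(-len(bits) // 4) * 4
    bits = bits.rjust(padded_length, "0")
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_string):
    """Replace each hex digit with its 4-bit binary equivalent."""
    return " ".join(format(HEX_DIGITS.index(digit), "04b")
                    for digit in hex_string.upper())


print(binary_to_hex("1110 1110"))  # EE
print(binary_to_hex("101101"))     # 2D
print(hex_to_binary("A7"))         # 1010 0111
```

The second call shows the padding step: 101101 becomes 0010 1101 before translation.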
1.1.4 Binary-coded decimal (BCD)
- BCD represents each decimal digit with 4 bits, using codes 0000 to 1001 for digits 0–9.
- Example: the denary number 3165 becomes BCD as 0011 0001 0110 0101 (per digit).
- Two methods of storing BCD digits:
- One BCD digit per byte: each 4-bit code is stored in its own byte, so 3165 needs four bytes (half of each byte is unused).
- Two BCD digits per byte: two 4-bit codes are packed into each byte, so 3165 needs two bytes.
- Uses and significance:
- Useful for monetary values and fixed-point representations to avoid decimal rounding issues when displaying to users.
- Allows precise decimal digits display (e.g., fixed-point currency like $1.31).
- Extension: discuss issues arising when adding BCD digits and how binary arithmetic must accommodate carries into a decimal digit that would otherwise exceed 9.
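A minimal sketch of the two-digits-per-byte storage and of the addition issue raised in the extension. It assumes the usual correction of adding 6 (0110) whenever a digit sum exceeds 9; the function names are hypothetical and the source does not prescribe this exact method.

```python
def to_packed_bcd(number):
    """Encode a non-negative denary integer as BCD with two digits per byte."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return bytes((int(high) << 4) | int(low)
                 for high, low in zip(digits[0::2], digits[1::2]))


def bcd_digit_add(a, b, carry_in=0):
    """Add two BCD digits in binary and return (digit, carry out)."""
    total = a + b + carry_in
    if total > 9:
        total += 6  # skip the six unused codes 1010 to 1111
    return total & 0b1111, total >> 4


def bcd_add(x, y):
    """Add two equal-length BCD digit lists, most significant digit first."""
    result, carry = [], 0
    for a, b in zip(reversed(x), reversed(y)):
        digit, carry = bcd_digit_add(a, b, carry)
        result.append(digit)
    if carry:
        result.append(carry)
    return result[::-1]


packed = to_packed_bcd(3165)
print(" ".join(format(byte, "08b") for byte in packed))  # 00110001 01100101
print(bcd_digit_add(7, 8))                               # (5, 1)
print(bcd_add([3, 1, 6, 5], [2, 9, 7, 8]))               # [6, 1, 4, 3]
```

In the middle example, 7 + 8 gives binary 1111, which is not a valid BCD digit; adding 6 produces 1 0101, that is, digit 5 with a carry of 1 into the next decimal column.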
1.1.5 ASCII codes and Unicode
- ASCII (7-bit) codes:
- Range: 0-127 (0x00–0x7F in hex).
- Includes letters, digits, punctuation, and 32 control codes (0–31).
- Extended ASCII uses 8 bits (0–255) to support additional symbols and characters.
- Examples and relationships:
- Uppercase letters (A–Z) and lowercase (a–z) are assigned distinct codes; each pair differs only in the sixth bit (column weight 32, 0x20), e.g., 0x41 for 'A' and 0x61 for 'a'.
- ASCII tables group characters in sequence to ease use.
- Unicode:
- Aims to represent all languages and scripts; first 128 characters overlap with ASCII.
- Encoding sizes commonly used: 16-bit or 32-bit per character; up to 4 bytes per character in modern encodings (UTF-8/UTF-16/UTF-32 families).
- Unicode goals include universal standard, more efficient encoding than ASCII, unambiguous encoding for each character, and private-use areas for user-specific characters.
- Practical notes
- ASCII characters are stored one per byte (a 7-bit code with a spare bit, or an 8-bit extended ASCII code); Unicode characters can need more: 1–4 bytes in UTF-8, 2 or 4 bytes in UTF-16, and 4 bytes in UTF-32.
- Unicode enables global software compatibility across languages and platforms.
- Additional data: sample Unicode character block shows extensive character sets beyond ASCII (Russian, Greek, Romanian, Croatian, etc.).
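The case relationship and the variable storage cost can be checked with Python's built-in string encoding. This is a sketch only: the sample characters and the helper names `swap_case_ascii` and `encoded_sizes` are illustrative choices, not taken from the source.

```python
CASE_BIT = 0x20  # the sixth bit, column weight 32


def swap_case_ascii(letter):
    """Flip the case of a single ASCII letter by toggling one bit."""
    return chr(ord(letter) ^ CASE_BIT)


def encoded_sizes(character):
    """Return how many bytes one character needs in three Unicode encodings."""
    # The "-le" variants are used so that no byte order mark is counted.
    return {name: len(character.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}


print(hex(ord("A")), hex(ord("a")))                # 0x41 0x61
print(swap_case_ascii("A"), swap_case_ascii("a"))  # a A

# Sample characters: Latin A, e-acute, Cyrillic Zhe, euro sign, an emoji.
for character in ["A", "\u00e9", "\u0416", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(character):04X}", encoded_sizes(character))
```

Only the first character fits in a single byte under UTF-8, which is why text in non-Latin scripts usually takes more storage than plain ASCII text of the same length.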
1.1.6 Memory sizes and IEC standard
- Memory size terminology (Table 1.1 in the source):
- 1 kilobyte (1 KB) = 1000 bytes (decimal SI unit).
- 1 megabyte (1 MB) = 1,000,000 bytes.
- 1 gigabyte (1 GB) = 1,000,000,000 bytes.
- 1 terabyte (1 TB) = 1,000,000,000,000 bytes.
- 1 petabyte (1 PB) = 1,000,000,000,000,000 bytes.
- IEC (binary) prefixes offer more accurate representations for memory:
- 1 kibibyte (1 KiB) = 2^{10} = 1024 bytes.
- 1 mebibyte (1 MiB) = 2^{20} = 1,048,576 bytes.
- 1 gibibyte (1 GiB) = 2^{30} = 1,073,741,824 bytes.
- 1 tebibyte (1 TiB) = 2^{40} = 1,099,511,627,776 bytes.
- 1 pebibyte (1 PiB) = 2^{50} = 1,125,899,906,842,624 bytes.
- Rationale: IEC prefixes are more accurate for binary computer memory usage; RAM and internal memories are better described by the IEC system.
- Practical example: 64 GiB of RAM can store 64 \times 2^{30} = 68{,}719{,}476{,}736 bytes.
- Relevance to file sizing: helps avoid confusion when calculating file sizes or RAM requirements.
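The two prefix systems can be compared directly. A minimal sketch, assuming the unit spellings used in these notes (KB for the SI kilobyte); the dictionary and function names are illustrative:

```python
SI_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}
IEC_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}
UNITS = {**SI_UNITS, **IEC_UNITS}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to a number of bytes."""
    return amount * UNITS[unit]


print(f"{to_bytes(64, 'GiB'):,}")  # 68,719,476,736

# The gap between the two systems grows with the size of the unit.
gap = to_bytes(1, "GiB") - to_bytes(1, "GB")
print(f"{gap:,}")                  # 73,741,824
```

At the giga level the binary unit is already about 7% larger than the decimal one, which is the confusion the IEC prefixes are meant to remove.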
1.1.3 Hexadecimal number system (recap)
- Hexadecimal as a bridge between binary and denary:
- Weights: 16^3, 16^2, 16^1, 16^0 = 4096, 256, 16, 1.
- Each hex digit corresponds to exactly four binary bits: 1 hex digit = 4 bits.
- Software tooling uses hex for memory dumps and low-level data inspection due to compact readability of binary data.
- Conversions:
- To convert binary to hex: group into 4-bit chunks from right; pad leftmost chunk with zeros if needed.
- To convert hex to binary: replace each hex digit with its 4-bit binary equivalent using a lookup table (Table 1.3 in the source).
- Binary–to–denary/denary–to–binary conversion practice is commonly tested via activities (convert, overflow handling, etc.).
- Memory dumps: hexadecimal representation of memory contents is easier to read and trace; essential for debugging and memory analysis.
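To illustrate why hexadecimal suits memory dumps, here is a small sketch that formats raw bytes the way many dump tools do: an offset, the bytes in hex, and any printable ASCII characters. The layout and the name `hex_dump` are assumptions for illustration, not a format defined in the source.

```python
def hex_dump(data, width=16):
    """Return dump lines: hex offset, hex bytes, then printable ASCII characters."""
    pad = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        # Codes outside the printable ASCII range are shown as a dot.
        text_part = "".join(chr(byte) if 32 <= byte < 127 else "."
                            for byte in chunk)
        lines.append(f"{offset:04X}  {hex_part.ljust(pad)}  {text_part}")
    return lines


for line in hex_dump(b"Binary data\x00\x01\x02 in memory", width=8):
    print(line)
```

Each byte always occupies exactly two hex digits, so columns line up and a byte's position can be read straight from the offset.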
1.2 Multimedia
- Key terms
- Bit-map image: image composed of pixels; each pixel has colour information stored as bits.
- Pixel: smallest picture element; colour depth defines how many bits per pixel.
- Colour depth: number of bits used to represent the colour of a pixel; e.g., 8-bit colour depth allows 2^8 = 256 colours.
- Bit depth vs. colour depth: bit depth is the number of bits used for a single sample (e.g., a sample of sound or a single pixel); colour depth for images can be higher (e.g., 24-bit true colour).
- Image resolution: total number of pixels in an image (e.g., 4096 × 3192 = 13,074,432 pixels).
- Screen resolution: number of horizontal by vertical pixels on a display screen.
- Pixel density: number of pixels per square centimetre; relates to perceived sharpness.
- Vector graphics: images defined by 2D points, lines, curves, and properties; scalable without loss of quality.
- Sampling resolution (bit depth) and sampling rate (samples per second): determine the fidelity of digitised sound.
- Frame rate: number of video frames per second.
- Bit-map images (section 1.2.1)
- Stored as a 2D matrix of pixels; each pixel can be represented by 1, 8, 16, 24, or 32 bits, etc.
- True colour: typically 24 bits per pixel (8 bits per colour channel: R, G, B).
- Higher bit depth = more possible colours, larger file size.
- Display considerations: if screen resolution < image resolution, scaling or cropping may be required.
- Vector graphics (section 1.2.2)
- Differences from bitmaps:
- Vector graphics describe shapes via geometric primitives and attributes, not pixels.
- Scaling does not lose quality, and simple graphics give small files; vector graphics are poorly suited to photo-realistic images.
- When to use:
- Resizeable graphics (logos, CAD drawings, exploded diagrams) are better as vectors.
- Photographs are better as bitmaps (raster images).
- Typical formats: vector: .svg, .cgm, .odg; bitmap: .jpeg, .bmp, .png.
- Sound files (section 1.2.3)
- Sound is analogue and must be digitised via an analogue-to-digital converter (ADC).
- After conversion, sampling rate (samples per second) and sampling resolution (bit depth) determine fidelity and file size.
- Higher sampling rate and resolution yield better sound but larger file sizes.
- CD-quality audio uses 16-bit samples at a sampling rate of 44.1 kHz. Filtering out frequencies beyond the range of human hearing saves space (perceptual shaping).
- Amplitude and frequency determine the waveform; higher bit depth yields a larger dynamic range.
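The link between sampling settings and file size can be sketched as a simple product. The assumptions here are uncompressed audio, no file header, and a hypothetical `channels` parameter (2 for stereo) that the notes above do not mention:

```python
def raw_audio_bytes(sample_rate_hz, bit_depth, seconds, channels=1):
    """Estimate the size of uncompressed audio in bytes (no header, no compression)."""
    bits = sample_rate_hz * bit_depth * channels * seconds
    return bits // 8


# A 16-bit sample can take one of 2**16 distinct amplitude values.
print(2 ** 16)  # 65536

one_minute_cd = raw_audio_bytes(44_100, 16, 60, channels=2)
print(f"{one_minute_cd:,}")  # 10,584,000

# Halving the sampling rate halves the size, at the cost of fidelity.
print(f"{raw_audio_bytes(22_050, 16, 60, channels=2):,}")  # 5,292,000
```

One minute of stereo sound at CD settings therefore needs roughly 10.6 MB before any compression is applied.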
- Video (section 1.2.4)
- Digital video typically stores frames as a sequence of images (frame rate); motion JPEG is a common encoding approach.
- Video compression is essential for streaming and storage.
1.3 File compression
- Objective: reduce file size while maintaining acceptable fidelity; two main categories: lossless and lossy.
- Key terms
- Lossless compression: original file can be perfectly reconstructed after decompression (e.g., Run-Length Encoding – RLE).
- Lossy compression: some data is discarded; decompressed file is not identical to the original (e.g., MP3, JPEG).
- JPEG: lossy image compression based on perceptual limitations of human vision.
- MP3/MP4: lossy compression for audio and multimedia; MP4 can store audio, video, images, and animation.
- Perceptual shaping: discards data outside the range of human perception to reduce file size while maintaining perceived quality.
- Bit rate: number of bits per second in a stream; higher bit rates yield better quality but larger files.
- Run-length encoding (RLE): a lossless technique that encodes runs of identical data as a count followed by the data value.
- Lossless vs. lossy: key differences and when each is appropriate (e.g., documents vs. media).
- File compression applications (MP3/MP4, JPEG, SVG)
- MP3 uses perceptual encoding to reduce audio data by removing inaudible components; typical bit rates range from ~80 to ~320 kbps; 200 kbps is common for high-quality audio.
- MP4 stores multimedia data (audio, video, images, and more) in a single container; supports streaming with reduced file sizes.
- JPEG compresses bitmap images with lossy compression; commonly reduces file size by factor 5–15 depending on quality settings.
- SVG is a text-based vector format; compression can be applied to the XML text (e.g., gzip).
- Run-length encoding (RLE) (Section 1.3.1 & 1.3.3)
- Concept: replace runs of identical data with a pair (count, value).
- Example use: 8x8 grid of pixels (or a string of identical characters) -> reduced storage when runs are long.
- Effectiveness depends on the data having long runs of identical values; less effective for highly varied data.
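A minimal run-length encoding sketch for strings, using (count, value) pairs as described above. It relies on `itertools.groupby` from the standard library; the function names are illustrative, and real file formats store the pairs in their own binary layouts.

```python
from itertools import groupby


def rle_encode(text):
    """Encode a string as a list of (count, value) pairs, one per run."""
    return [(len(list(run)), value) for value, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original string from (count, value) pairs."""
    return "".join(value * count for count, value in pairs)


long_runs = "aaaaabbbbccddddd"
encoded = rle_encode(long_runs)
print(encoded)  # [(5, 'a'), (4, 'b'), (2, 'c'), (5, 'd')]
print(rle_decode(encoded) == long_runs)  # True

# With no repeated neighbours every character becomes its own pair,
# so the encoded form is larger than the input.
print(rle_encode("abcd"))  # [(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd')]
```

The round trip returning the exact input is what makes the technique lossless; the second example shows why it only pays off when runs are long.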
- General methods of compression (Section 1.3.2)
- Practical, non-algorithmic approaches for reducing file size:
- Reduce sampling rate or sampling resolution (for audio) and frame rate (for video).
- Crop or resize images.
- Decrease colour depth/bit depth.
- Reduce image resolution where acceptable.
- Practical calculations and examples (from the chapters)
- Bit-map file size estimate: for a full-screen image with resolution W \times H and bit depth b:
\text{bits} = W \times H \times b,
\quad \text{bytes} = \frac{W \times H \times b}{8}.
Example: 1920 × 1080 with 24-bit colour yields
1920 \times 1080 \times 24 = 49{,}766{,}400 \text{ bits}, which is
\frac{49{,}766{,}400}{8} = 6{,}220{,}800 \text{ bytes} \approx 6.22\,\text{MB (SI units)}.
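The estimate can be turned into a short calculation. This sketch covers pixel data only: it assumes no header and no compression, and the function name is a hypothetical helper.

```python
def bitmap_size_bytes(width, height, bit_depth):
    """Estimate the uncompressed pixel data size of a bit-map image in bytes."""
    bits = width * height * bit_depth
    return (bits + 7) // 8  # round up to a whole number of bytes


size = bitmap_size_bytes(1920, 1080, 24)
print(f"{size:,} bytes")           # 6,220,800 bytes
print(f"{size / 10**6:.2f} MB")    # 6.22 MB
print(f"{size / 2**20:.2f} MiB")   # 5.93 MiB

# Dropping from 24-bit to 8-bit colour cuts the size to one third.
print(f"{bitmap_size_bytes(1920, 1080, 8):,} bytes")  # 2,073,600 bytes
```

The same image reads as 6.22 MB in SI units but 5.93 MiB in IEC units, so it matters which prefix system a size is quoted in.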
- Header information for image files
- Important fields in a file header include: file type/format (e.g., .bmp or .jpeg), file size, image resolution, bit depth, and any compression method used.
Section-specific activities and notes (summary of typical exam-style prompts)
- Convert various 8-bit binary numbers to denary using two’s-complement representation and determine the maximum range for 8-bit numbers.
- Convert 8-bit binary numbers to BCD and vice versa; interpret BCD digits as decimal digits and understand its use in monetary values.
- Explain why overflow can occur when adding two positive 8-bit numbers and how the 9th bit is treated in two’s-complement arithmetic.
- Explain the difference between 8-bit sign and magnitude, one’s complement, and two’s complement representations, with practical guidance on why two’s complement is preferred.
- Calculate the file size needed for a bit-map image given resolution and bit depth; discuss how headers affect total file size and why compression is used.
- Compare bit-map and vector graphics in terms of scalability, realism, and typical use cases (logos vs. photos).
- Explain sampling rate, sampling resolution (bit depth), frame rate, and their impact on sound/video quality and file size; give practical examples like CD-quality audio (16-bit, 44.1 kHz) vs. other formats.
- Describe lossless vs. lossy compression, give examples (RLE, JPEG, MP3), and explain perceptual shaping.
- Understand memory units and the difference between decimal (SI) prefixes and IEC binary prefixes (KiB, MiB, etc.).
- Explain the ASCII and Unicode systems, including why Unicode is needed for multilingual text and how character size can vary (1 byte vs. 2–4 bytes per character).
- Binary weights (8-bit): 2^7=128, 2^6=64, 2^5=32, 2^4=16, 2^3=8, 2^2=4, 2^1=2, 2^0=1
- Two’s complement range for 8 bits: [-2^7, 2^7-1] = [-128, 127]
- Hexadecimal to binary: one hex digit = four bits; example: hex "A7" = 1010 0111
- Binary to hex: group into 4-bit chunks from the least significant end; pad on the left if needed.
- Memory sizes (decimal vs. IEC):
- SI: 1 KB = 1000 bytes, 1 MB = 10^6 bytes, 1 GB = 10^9 bytes, 1 TB = 10^12 bytes.
- IEC: 1 KiB = 2^{10} = 1024 bytes, 1 MiB = 2^{20}, 1 GiB = 2^{30}, 1 TiB = 2^{40}.
- Image data size example (bit-map): for 1920 × 1080 at 24-bit colour:
1920 \times 1080 \times 24 = 49{,}766{,}400\text{ bits}
\frac{49{,}766{,}400}{8} = 6{,}220{,}800\text{ bytes} = 6.22\,\text{MB (SI)}.