CS-150 – Data Representation: Vocabulary Flashcards

Analog vs Digital

Analog = continuous, Digital = discrete
Digitise: convert analog signal to binary
Discretise: map continuous space to finite set

Binary Basics

Bit: 0/1; Byte = 8 bits; Word = machine's native unit of data, a whole number of bytes (commonly 32 or 64 bits)
Capacity: n\text{ bits}\;\Rightarrow\;2^n values
Required bits: \lceil \log_2 t \rceil for t symbols
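
A minimal Python sketch of the two formulas above. The function names `capacity` and `bits_required` are illustrative only, not from the course material.

```python
def capacity(n: int) -> int:
    """Number of distinct values representable with n bits."""
    return 2 ** n


def bits_required(t: int) -> int:
    """Smallest n with 2**n >= t, i.e. ceil(log2(t)) for t >= 1."""
    if t < 1:
        raise ValueError("need at least one symbol")
    # (t - 1).bit_length() avoids floating-point rounding in math.log2
    return (t - 1).bit_length()


print(capacity(8))        # 256
print(bits_required(26))  # 5 bits are enough for 26 letters
```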

Integer Representation

Unsigned: direct binary
Sign-magnitude: MSB = sign, remaining bits = magnitude; has two zeros (+0 and -0) ⇒ rarely used for integers
Two’s complement (standard)
- Range -2^{n-1}\dots2^{n-1}-1
- Negate: invert all bits, then add 1
- Addition/subtraction as normal; overflow when result outside range
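
The two's complement rules above can be sketched in Python as follows. The helper names are hypothetical, and bit patterns are handled as strings purely for readability.

```python
def to_twos_complement(value: int, n: int) -> str:
    """Encode value as an n-bit two's complement bit string."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} is outside the {n}-bit range {lo}..{hi}")
    # Masking with 2**n - 1 yields the n-bit pattern for negative values too
    return format(value & ((1 << n) - 1), f"0{n}b")


def from_twos_complement(bits: str) -> int:
    """Decode a two's complement bit string back to an integer."""
    raw = int(bits, 2)
    # A leading 1 means the value is negative: subtract 2**n
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


def negate(bits: str) -> str:
    """Negate by inverting every bit and adding 1 (result wraps to n bits)."""
    mask = (1 << len(bits)) - 1
    inverted = int(bits, 2) ^ mask
    return format((inverted + 1) & mask, f"0{len(bits)}b")


print(to_twos_complement(5, 8))          # 00000101
print(negate("00000101"))                # 11111011
print(from_twos_complement("11111011"))  # -5
```

Note that `negate("10000000")` returns `10000000` again: -128 has no positive counterpart in 8 bits, which is one instance of the overflow case above.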

Fixed-Size Int Ranges (signed / unsigned)

8-bit: -128..127 / 0..255
16-bit: -32768..32767 / 0..65535
32-bit: -2\,147\,483\,648..2\,147\,483\,647 / 0..4\,294\,967\,295
64-bit: -9.22\times10^{18}..9.22\times10^{18} / 0..1.84\times10^{19}

Real Numbers & Floating Point

Binary fractions use positions 2^{-1},2^{-2},\dots
Floating-point form: \text{sign}\times\text{mantissa}\times\text{base}^{\text{exponent}}
IEEE 754
- Single (float): 1 sign, 8 biased exponent (bias 127), 23 fraction bits
- Double: 1 sign, 11 exponent (bias 1023), 52 fraction bits
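
To see the single-precision layout in practice, the following sketch uses Python's standard `struct` module to split a value into its three fields. The function names are illustrative, and `decode_normal` deliberately ignores zeros, subnormals, infinities and NaN.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x stored as IEEE 754 single."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                 # 1 bit
    exponent = (raw >> 23) & 0xFF    # 8 bits, bias 127
    fraction = raw & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


def decode_normal(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild the value of a normal number (biased exponent 1..254 only)."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


# -6.25 = -1.1001_2 x 2^2, so the biased exponent is 2 + 127 = 129
print(float32_fields(-6.25))                  # (1, 129, 4718592)
print(decode_normal(*float32_fields(-6.25)))  # -6.25
```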
Fixed-point: fixed number of digits right of the radix point; used e.g. for currency amounts
Conversion algorithms: repeated divide (integer part) / multiply (fractional part)
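
A short sketch of both conversion algorithms, assuming a non-negative integer part and a fractional part in [0, 1). The names and the `max_bits` cut-off are illustrative choices.

```python
def int_to_binary(n: int) -> str:
    """Repeated division by 2: the remainders are the bits, last one first."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def frac_to_binary(frac: float, max_bits: int = 16) -> str:
    """Repeated multiplication by 2: each integer part is the next bit."""
    digits = []
    while frac > 0 and len(digits) < max_bits:
        frac *= 2
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return "".join(digits)


print(int_to_binary(13))       # 1101
print(frac_to_binary(0.625))   # 101, i.e. 0.101_2
print(frac_to_binary(0.1, 8))  # 00011001, 0.1 never terminates in binary
```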

Scientific Notation

Keeps one non-zero digit left of point; e.g. 1.2001\times10^4 \equiv 1.2001E4
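
For illustration, Python's built-in number formatting produces and parses the same E notation. The wrapper name `to_e_notation` is hypothetical. Python always prints a signed exponent of at least two digits.

```python
def to_e_notation(x: float, digits: int) -> str:
    """Format x with one digit left of the point and `digits` to the right."""
    return f"{x:.{digits}E}"


print(to_e_notation(12001.0, 4))  # 1.2001E+04
print(float("1.2001E4"))          # 12001.0
```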

Text Encoding

Map characters → bit patterns
Standards: ASCII (7-bit), EBCDIC, ISO-8859-1, Unicode (UTF-8/16/32, >143\,000 chars)
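
A small sketch of the character → bit pattern mapping using Python's built-in codecs. The helper name `bit_pattern` is illustrative.

```python
def bit_pattern(ch: str, encoding: str = "utf-8") -> str:
    """Show the bytes a character encodes to, as groups of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in ch.encode(encoding))


print(bit_pattern("A", "ascii"))         # 01000001
print(bit_pattern("\u00e9", "latin-1"))  # 11101001 (é, one byte in ISO-8859-1)
print(bit_pattern("\u00e9", "utf-8"))    # 11000011 10101001 (two bytes in UTF-8)
print(bit_pattern("\u20ac", "utf-8"))    # 11100010 10000010 10101100 (€)
```

The same character can need a different number of bytes in each encoding, which is why a file's encoding has to be known before it can be decoded.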

Colour Representation

RGB triplet, usually 8 bits/channel (24-bit colour)
Example brown: (150,75,0) \to #964B00
Other models: HSV, CMY, CMYK
Colour depth = bits per pixel (e.g. 24 for 8-bit RGB); sometimes quoted per channel
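
The brown example can be reproduced with a short sketch, assuming 8 bits per channel. Both function names are illustrative.

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Pack three 8-bit channels into a #RRGGBB hex code."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits (0..255)")
    return f"#{r:02X}{g:02X}{b:02X}"


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Unpack a #RRGGBB hex code into an (r, g, b) triplet."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


print(rgb_to_hex(150, 75, 0))  # #964B00
print(hex_to_rgb("#964B00"))   # (150, 75, 0)
```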

Images

Pixel = coloured dot; Resolution = pixel count
Raster (BMP, GIF, JPEG, PNG, TIFF): store every pixel; scaling ⇒ pixelation
Vector (SVG): store shapes; scale cleanly; small for line art
JPEG: lossy; transform to frequency domain & discard high-frequency components

Audio

Sound digitised by PCM: sample & quantise
Nyquist: sample rate ≥ 2f_{max} (twice the highest frequency present); CD: 44\,100 Hz, 16-bit stereo
Formats: Uncompressed (WAV, AIFF), Lossless (FLAC, ALAC), Lossy (MP3, AAC, Ogg Vorbis)
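
A minimal sketch of PCM for a pure sine tone, assuming CD parameters as defaults. The function names are illustrative, and real encoders add steps (filtering, dithering) that are left out here.

```python
import math


def pcm_encode(freq_hz: float, duration_s: float,
               sample_rate: int = 44_100, bits: int = 16) -> list[int]:
    """Sample a sine tone and quantise each sample to a signed integer."""
    peak = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]


def satisfies_nyquist(sample_rate: int, f_max: float) -> bool:
    """True if the sample rate is at least twice the highest frequency."""
    return sample_rate >= 2 * f_max


def pcm_bit_rate(sample_rate: int = 44_100, bits: int = 16,
                 channels: int = 2) -> int:
    """Uncompressed bits per second."""
    return sample_rate * bits * channels


samples = pcm_encode(440, 0.01)
print(len(samples))                      # 441 samples for 10 ms
print(satisfies_nyquist(44_100, 20_000)) # True
print(pcm_bit_rate())                    # 1411200 bits/s for CD audio
```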

Video

Codec = encoder/decoder, usually lossy, uses temporal + spatial compression
Formats: codec standards (MPEG-2, MPEG-4) and containers (AVI, WebM, Matroska)

Data Compression Fundamentals

Compression ratio = \frac{\text{size(compressed)}}{\text{size(original)}} (smaller = better)
Lossless vs Lossy (text → lossless; media often lossy)
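
As a quick illustration of the ratio defined above, the sketch below compresses highly repetitive sample data with Python's standard `zlib` module (a lossless compressor). The data is made up for the example and the exact ratio depends on the input.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """size(compressed) / size(original); smaller is better."""
    return len(compressed) / len(original)


data = b"abc" * 1000          # 3000 bytes of repetitive sample data
packed = zlib.compress(data)

print(compression_ratio(data, packed) < 1)  # True: the output is smaller
print(zlib.decompress(packed) == data)      # True: lossless round trip
```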

Lossless Techniques

Run-length: replace a run with flag + symbol + count, e.g. *c5 \Rightarrow ccccc
Keyword: replace frequent words with single tokens
Huffman coding: variable-length prefix codes built from symbol frequencies; optimal among symbol-by-symbol prefix codes for those frequencies
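
Run-length and Huffman coding can be sketched as below. These are simplified assumptions, not a reference implementation: the run-length coder uses the `*c5` notation, caps runs at 9 so the count stays a single digit, and assumes the input never contains the flag character. The Huffman builder returns one valid code table, and tie-breaking may differ between implementations.

```python
import heapq
from collections import Counter


def rle_encode(text: str, flag: str = "*") -> str:
    """Replace runs of 4..9 equal characters with flag + char + count."""
    out, i = [], 0
    while i < len(text):
        ch, run = text[i], 1
        while i + run < len(text) and text[i + run] == ch and run < 9:
            run += 1
        # A flagged run costs 3 characters, so shorter runs are copied as-is
        out.append(f"{flag}{ch}{run}" if run >= 4 else ch * run)
        i += run
    return "".join(out)


def rle_decode(encoded: str, flag: str = "*") -> str:
    """Expand flag + char + count triples; copy everything else."""
    out, i = [], 0
    while i < len(encoded):
        if encoded[i] == flag:
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix code by repeatedly merging the two lightest subtrees."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


print(rle_encode("ccccc"))  # *c5
print(rle_decode("*c5"))    # ccccc
codes = huffman_codes("aaaabbc")
print(len(codes["a"]))      # 1: the most frequent symbol gets the shortest code
```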

Storage Example

24-bit RGB image 1920×1080: 2\,073\,600\text{ pixels}\times24\text{ bits} = 49\,766\,400\text{ bits} \approx 50\,\text{Mbits} \approx 6.2\,\text{MB} uncompressed → compression essential
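
The arithmetic can be checked with a few lines of Python. The function name is illustrative, and MB is taken as 10^6 bytes here.

```python
def raw_image_bits(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed size in bits: one colour value per pixel."""
    return width * height * bits_per_pixel


bits = raw_image_bits(1920, 1080)
print(bits)                  # 49766400 bits
print(bits // 8)             # 6220800 bytes
print(bits / 8 / 1_000_000)  # 6.2208 MB
```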