CS-150 – Data Representation: Vocabulary Flashcards

Analog vs Digital

  • Analog = continuous, Digital = discrete

  • Digitise: convert analog signal to binary

  • Discretise: map continuous space to finite set

Binary Basics

  • Bit: 0/1; Byte = 8 bits; Word = machine’s native unit of data, a multiple of bytes (e.g. 32 or 64 bits)

  • Capacity: n\text{ bits}\;\Rightarrow\;2^n values

  • Required bits: \lceil \log_2 t \rceil for t symbols
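
A minimal sketch of the two formulas above, in Python. The function names `capacity` and `bits_required` are made up for illustration; the integer `bit_length` trick is used in place of a floating-point logarithm to avoid rounding surprises.

```python
def capacity(n_bits: int) -> int:
    """Number of distinct values representable with n_bits bits (2^n)."""
    return 2 ** n_bits


def bits_required(t: int) -> int:
    """Smallest number of bits that can distinguish t symbols (t >= 1)."""
    if t < 1:
        raise ValueError("need at least one symbol")
    # For t >= 1, (t - 1).bit_length() equals ceil(log2(t)).
    return (t - 1).bit_length()


print(capacity(8))         # values in one byte
print(bits_required(26))   # bits needed for a 26-letter alphabet
```

For example, 26 letters need 5 bits because 2^4 = 16 is too few and 2^5 = 32 is enough.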

Integer Representation

  • Unsigned: direct binary

  • Sign-magnitude: MSB = sign, remaining bits = magnitude; has both +0 and -0 ⇒ rarely used for integers

  • Two’s complement (standard)

    • Range -2^{n-1}\dots2^{n-1}-1

    • Negate: invert all bits, then add 1

    • Addition/subtraction as normal; overflow when result outside range
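
The two's complement rules above can be sketched as follows. This is an illustrative Python model working on bit strings, not how hardware is implemented; all function names (`twos_range`, `to_twos`, `from_twos`, `negate`, `add`) are hypothetical.

```python
def twos_range(n: int) -> tuple[int, int]:
    """Smallest and largest value of an n-bit two's complement integer."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1


def to_twos(value: int, n: int) -> str:
    """n-bit two's complement bit pattern of value, as a string."""
    lo, hi = twos_range(n)
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {n} bits")
    return format(value & ((1 << n) - 1), f"0{n}b")


def from_twos(bits: str) -> int:
    """Signed value of a two's complement bit string."""
    raw = int(bits, 2)
    # A leading 1 means the pattern stands for raw - 2^n.
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


def negate(bits: str) -> str:
    """Negate by inverting every bit and adding 1 (kept to the same width)."""
    n = len(bits)
    mask = (1 << n) - 1
    inverted = ~int(bits, 2) & mask
    return format((inverted + 1) & mask, f"0{n}b")


def add(a: int, b: int, n: int) -> tuple[int, bool]:
    """Add two signed values in n bits; return (wrapped result, overflow flag)."""
    raw = (a + b) & ((1 << n) - 1)
    result = raw - (1 << n) if raw >> (n - 1) else raw
    return result, result != a + b


print(to_twos(5, 8), negate(to_twos(5, 8)))
print(add(100, 27, 8), add(100, 28, 8))
```

In 8 bits, 100 + 27 fits, while 100 + 28 = 128 lies outside -128..127 and wraps around to -128, which is what the overflow flag reports.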

Fixed-Size Int Ranges (signed / unsigned)

  • 8-bit: -128..127 / 0..255

  • 16-bit: -32768..32767 / 0..65535

  • 32-bit: -2\,147\,483\,648..2\,147\,483\,647 / 0..4\,294\,967\,295

  • 64-bit: ≈ -9.22\times10^{18}..9.22\times10^{18} / 0..≈ 1.84\times10^{19}

Real Numbers & Floating Point

  • Binary fractions use positions 2^{-1},2^{-2},\dots

  • Floating-point form: \text{sign}\times\text{mantissa}\times\text{base}^{\text{exponent}}

  • IEEE 754

    • Single (float): 1 sign, 8 biased exponent (bias 127), 23 fraction bits

    • Double: 1 sign, 11 exponent (bias 1023), 52 fraction bits

  • Fixed-point: fixed number of digits right of the radix point; used in accounting (e.g. currency amounts)

  • Conversion algorithms: repeated divide (integer part) / multiply (fractional part)
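
A short sketch of the conversion algorithms and the single-precision layout listed above. The helper names (`int_to_binary`, `frac_to_binary`, `float32_fields`) and the `max_bits` cut-off are assumptions made for this example; `Fraction` is used so the repeated multiplication is exact.

```python
import struct
from fractions import Fraction


def int_to_binary(n: int) -> str:
    """Integer part: divide by 2 repeatedly, read remainders in reverse."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def frac_to_binary(frac: str, max_bits: int = 16) -> str:
    """Fractional part: multiply by 2 repeatedly, read integer parts in order."""
    value = Fraction(frac)            # e.g. "0.625", kept exact
    bits = []
    while value and len(bits) < max_bits:
        value *= 2
        bit = 1 if value >= 1 else 0
        bits.append(str(bit))
        value -= bit
    return "".join(bits)


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split an IEEE 754 single into (sign, biased exponent, fraction)."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF


print(int_to_binary(13), frac_to_binary("0.625"))   # 13.625 = 1101.101 in binary
print(frac_to_binary("0.1", 8))                      # 0.1 never terminates in binary
print(float32_fields(-6.5))
```

For -6.5 = -1.101 × 2^2 (binary), the sign bit is 1, the stored exponent is 2 + 127 = 129, and the fraction field holds the bits 101 followed by zeros.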

Scientific Notation

  • Keeps one non-zero digit left of point; e.g. 1.2001\times10^4 \equiv 1.2001E4

Text Encoding

  • Map characters → bit patterns

  • Standards: ASCII (7-bit), EBCDIC, ISO-8859-1, Unicode (UTF-8/16/32, >143\,000 chars)
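
To make the character-to-bit-pattern mapping concrete, here is a small Python sketch using the built-in `str.encode`. The helper names `bit_pattern` and `encoded_sizes` are hypothetical; the little-endian codec variants are chosen only so that no byte-order mark is added to the byte counts.

```python
def bit_pattern(ch: str, encoding: str = "utf-8") -> str:
    """Bits of one character under the given encoding, one group per byte."""
    return " ".join(f"{byte:08b}" for byte in ch.encode(encoding))


def encoded_sizes(text: str) -> dict[str, int]:
    """Byte length of text under three Unicode encoding forms."""
    return {enc: len(text.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}


print(ord("A"), bit_pattern("A"))   # code point 65, one byte, same as ASCII
print(bit_pattern("é"))             # two bytes in UTF-8
print(encoded_sizes("A€"))
```

UTF-8 is variable-length: ASCII characters keep their one-byte codes, while other characters take two to four bytes.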

Colour Representation

  • RGB triplet, usually 8 bits/channel (24-bit colour)

  • Example brown: (150,75,0) \to #964B00

  • Other models: HSV, CMY, CMYK

  • Colour depth = bits per pixel (24-bit RGB = 8 bits per channel); sometimes quoted per channel instead
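
The brown example can be checked with a few lines of Python. This sketch assumes 8 bits per channel; `rgb_to_hex` and `hex_to_rgb` are illustrative names, not library functions.

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Pack three 8-bit channel values into a #RRGGBB string."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("channel values must be in 0..255")
    return f"#{r:02X}{g:02X}{b:02X}"


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Unpack a #RRGGBB string into an (r, g, b) triplet."""
    digits = code.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(rgb_to_hex(150, 75, 0))   # 150 = 0x96, 75 = 0x4B, 0 = 0x00
print(hex_to_rgb("#964B00"))
```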

Images

  • Pixel = coloured dot; Resolution = pixel count

  • Raster (BMP, GIF, JPEG, PNG, TIFF): store every pixel; scaling ⇒ pixelation

  • Vector (SVG): store shapes; scale cleanly; small for line art

  • JPEG: lossy; transform to frequency domain & discard high-frequency components

Audio

  • Sound digitised by PCM: sample & quantise

  • Nyquist: sample rate ≥ 2f_{max}; CD: 44\,100 Hz, 16-bit stereo

  • Formats: Uncompressed (WAV, AIFF), Lossless (FLAC, ALAC), Lossy (MP3, AAC, Ogg Vorbis)
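
A rough sketch of PCM (sample, then quantise) and of the raw CD bit rate. The test tone, the function names `pcm_sample` and `pcm_bit_rate`, and the choice of rounding to the nearest level are assumptions for illustration only.

```python
import math


def pcm_sample(freq_hz: float, sample_rate: int, n_samples: int,
               bits: int = 16) -> list[int]:
    """Sample a full-scale sine wave and quantise each sample to a signed integer."""
    peak = 2 ** (bits - 1) - 1      # 32767 for 16-bit audio
    return [round(peak * math.sin(2 * math.pi * freq_hz * k / sample_rate))
            for k in range(n_samples)]


def pcm_bit_rate(sample_rate: int, bits: int, channels: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate * bits * channels


samples = pcm_sample(1000, 44_100, 5)    # first five samples of a 1 kHz tone
print(samples)
print(pcm_bit_rate(44_100, 16, 2))       # CD audio: 1 411 200 bit/s
```

At 44 100 Hz the Nyquist limit is 22 050 Hz, just above the usual 20 kHz upper bound of human hearing.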

Video

  • Codec = encoder/decoder, usually lossy, uses temporal + spatial compression

  • Formats: codec standards (MPEG-2, MPEG-4); containers (AVI, WebM, Matroska)

Data Compression Fundamentals

  • Compression ratio = \frac{\text{size(compressed)}}{\text{size(original)}} (smaller = better)

  • Lossless vs Lossy (text → lossless; media often lossy)
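
As a quick illustration of the ratio defined above, this sketch compresses a repetitive byte string with Python's standard `zlib` module (a lossless compressor). The sample text and the helper name `compression_ratio` are made up for the example.

```python
import zlib


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """size(compressed) / size(original); smaller means better compression."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return compressed_size / original_size


original = b"to be or not to be, " * 200
compressed = zlib.compress(original)
ratio = compression_ratio(len(original), len(compressed))

print(len(original), len(compressed), round(ratio, 3))
# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(compressed) == original
```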

Lossless Techniques

  • Run-length: replace a run with flag + character + count, e.g. *c5 \Rightarrow ccccc

  • Keyword: replace frequent words with single tokens

  • Huffman coding: variable-length prefix codes built from symbol frequencies; optimal among prefix codes
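
The run-length and Huffman cards can be sketched in Python as below. Assumptions for this example: the flag is `*` and never occurs in the input, run counts are a single digit (4 to 9), and Huffman ties are broken by insertion order, so the exact codes may differ from a textbook tree even though the code lengths are still optimal. All function names are hypothetical.

```python
import heapq
from collections import Counter

FLAG = "*"


def rle_encode(text: str, min_run: int = 4, max_run: int = 9) -> str:
    """Replace runs of min_run..max_run equal characters with FLAG + char + count."""
    out, i = [], 0
    while i < len(text):
        ch, run = text[i], 1
        while i + run < len(text) and text[i + run] == ch and run < max_run:
            run += 1
        # Shorter runs are copied as-is: the 3-character code would not save space.
        out.append(f"{FLAG}{ch}{run}" if run >= min_run else ch * run)
        i += run
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Expand every FLAG + char + count triple back into a run."""
    out, i = [], 0
    while i < len(encoded):
        if encoded[i] == FLAG:
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix code by repeatedly merging the two least frequent subtrees."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


print(rle_decode("*c5"))
print(rle_encode("a" * 5 + "b" * 3))
print(huffman_codes("aaaabbc"))
```

In "aaaabbc" the most frequent symbol `a` receives a 1-bit code and `b`, `c` receive 2-bit codes, so the message takes 10 bits instead of 14 with a fixed 2-bit code.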

Storage Example

  • 24-bit RGB image 1920×1080: 2\,073\,600\text{ pixels}\times24\text{ bits} \approx 49.8\,\text{Mbit} \approx 6.2\,\text{MB} uncompressed → compression essential
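
The arithmetic in the storage example can be reproduced with a few lines of Python. The function name `raw_image_bits` is made up for illustration, and MB is taken here as 10^6 bytes.

```python
def raw_image_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a raster image in bits."""
    return width * height * bits_per_pixel


bits = raw_image_bits(1920, 1080, 24)
size_bytes = bits // 8

print(1920 * 1080)              # 2 073 600 pixels
print(bits, bits / 1e6)         # 49 766 400 bits, about 49.8 Mbit
print(size_bytes / 1e6)         # about 6.2 MB
```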