CS-150 – Data Representation: Vocabulary Flashcards

Analog vs Digital

  • Analog = continuous, Digital = discrete

  • Digitise: convert analog signal to binary

  • Discretise: map continuous space to finite set

Binary Basics

  • Bit: 0/1; Byte = 8 bits; Word = machine’s native multiple

  • Capacity: n bits    2nn\text{ bits}\;\Rightarrow\;2^n values

  • Required bits: log2t\lceil \log_2 t \rceil for tt symbols

Integer Representation

  • Unsigned: direct binary

  • Sign-magnitude: MSB = sign; two zeros ⇒ rarely used

  • Two’s complement (standard)

    • Range 2n12n11-2^{n-1}\dots2^{n-1}-1

    • Negate: invert bits + 11

    • Addition/subtraction as normal; overflow when result outside range

Fixed-Size Int Ranges (signed / unsigned)

  • 8-bit: 128..127-128..127 / 0..2550..255

  • 16-bit: 32768..32767-32768..32767 / 0..655350..65535

  • 32-bit: 2147483648..2147483647-2\,147\,483\,648..2\,147\,483\,647 / 0..42949672950..4\,294\,967\,295

  • 64-bit: 9.22×1018..9.22×1018-9.22\times10^{18}..9.22\times10^{18} / 0..1.84×10190..1.84\times10^{19}

Real Numbers & Floating Point

  • Binary fractions use positions 21,22,2^{-1},2^{-2},\dots

  • Floating-point form: sign×mantissa×baseexponent\text{sign}\times\text{mantissa}\times\text{base}^{\text{exponent}}

  • IEEE 754

    • Single (float): 1 sign, 8 biased exponent (bias 127127), 23 fraction bits

    • Double: 1 sign, 11 exponent (bias 10231023), 52 fraction bits

  • Fixed-point: constant digits right of radix; used in accounting

  • Conversion algorithms: repeated divide (integer part) / multiply (fractional part)

Scientific Notation

  • Keeps one non-zero digit left of point; e.g. 1.2001×1041.2001E41.2001\times10^4 \equiv 1.2001E4

Text Encoding

  • Map characters → bit patterns

  • Standards: ASCII (7-bit), EBCDIC, ISO-8859-1, Unicode (UTF-8/16/32, >143\,000 chars)

Colour Representation

  • RGB triplet, usually 8 bits/channel (24-bit colour)

  • Example brown: (150,75,0) \to #964B00

  • Other models: HSV, CMY, CMYK

  • Colour depth = bits per channel

Images

  • Pixel = coloured dot; Resolution = pixel count

  • Raster (BMP, GIF, JPEG, PNG, TIFF): store every pixel; scaling ⇒ pixelation

  • Vector (SVG): store shapes; scale cleanly; small for line art

  • JPEG: lossy; transform to frequency domain & discard high-frequency components

Audio

  • Sound digitised by PCM: sample & quantise

  • Nyquist: sample ≥ 2fmax2f_{max}; CD: 4410044\,100 Hz, 16-bit stereo

  • Formats: Uncompressed (WAV, AIFF), Lossless (FLAC, ALAC), Lossy (MP3, AAC, Ogg)

Video

  • Codec = encoder/decoder, usually lossy, uses temporal + spatial compression

  • Formats: MPEG-2, MPEG-4, AVI, WebM, Matroska

Data Compression Fundamentals

  • Compression ratio =size(compressed)size(original)= \frac{\text{size(compressed)}}{\text{size(original)}} (smaller = better)

  • Lossless vs Lossy (text → lossless; media often lossy)

Lossless Techniques
  • Run-length: c5ccccc*c5 \Rightarrow ccccc

  • Keyword: replace frequent words with single tokens

  • Huffman coding: variable-length prefix codes built from symbol frequencies; optimal among prefix codes

Storage Example

  • 24-bit RGB image 1920×1080: 2073600×2450Mbits6MB2\,073\,600\times24 \approx 50\,\text{Mbits} \approx 6\,\text{MB} uncompressed → compression essential