Data compression aa

Data Compression

  • Basic Principles:

    • Compress at the source, download, and uncompress at the target.

    • Change the code representing information into a different format before returning to original.

  • Code Lengths:

    • Fixed-length code: each symbol has the same representation length (e.g., 7 bits).

      • Example: 1000 characters require 7000 bits (1000 characters x 7 bits).

    • Minimum fixed length for ASCII is 7 bits due to 95 symbols needing representation.

  • Binary Representation:

    • For n bits, possible unique representations = 2^n.

    • 7 bits = 128 representations, sufficient for 95 ASCII symbols.

More Data Compression

  • Variable Length Codes:

    • Assign representations of different lengths based on frequency analysis.

      • Common symbols given shorter codes (e.g., E = 6 bits, Z = 8 bits).

    • 3 bits can represent 8 characters, sufficing for 7 symbols.

Example of Variable Length Code

  • Example with 337 symbols using 3-bit fixed-length code totals 1011 bits.

  • Create more space-efficient codes by analyzing symbol frequency.

Humorous Compression Techniques

  • Andy’s Dopey Variable Length Code:

    • Assigns shortest codes to common symbols, ensuring unique representation.

  • Avoids ambiguity in variable length codes; there must be a unique representation for each sequence.

Prefix Free Codes

  • Prefix Free Property:

    • No two symbols' representations share the same starting sequence.

    • Example: E (1), A (2), B (00) avoids conflicts.

    • Each representation must uniquely inform the next character in sequence.

Huffman Coding

  • Overview:

    • Efficient and optimal method for data compression.

    • Steps: Count frequency of symbols, build a tree combining lowest frequencies, decorate branches with 0s and 1s.

  • Compression Ratios:

    • Example: From 1011 to 837 bits, compression ratio of 1.21:1.

Lossless vs. Lossy Compression

  • Lossless Compression:

    • No information loss during compression (e.g., Huffman Coding).

    • Can recover original data.

  • Lossy Compression:

    • Information is lost (e.g., JPEG, MP3), often providing better compression ratios.

  • Artifacts:

    • Compression methods may introduce visual or audio artifacts.

Error Detection and Correction

  • Checksum Codes:

    • Checksums calculated during transmission to detect errors.

    • Example: ASCII message checked by summing bits and appending checksum.

  • Parity Bits:

    • The extra bit in an 8-bit ASCII character serves as a parity bit for error checking.

  • Luhn Algorithm:

    • Used in credit cards to validate numbers and catch common entry errors.

Error Correcting Codes

  • Hamming Code:

    • Example of error-correcting codes.

    • Detects and corrects single-bit errors, using additional parity bits.

Conclusion: Coding Themes

  • Common themes across coding methods:

    • Mathematical properties ensure efficiency and error resilience.

    • Different codes serve distinct purposes: cryptography, compression, and error detection.