Data compression aa
Data Compression
Basic Principles:
Compress at the source, download, and uncompress at the target.
Change the code representing information into a different format before returning to original.
Code Lengths:
Fixed-length code: each symbol has the same representation length (e.g., 7 bits).
Example: 1000 characters require 7000 bits (1000 characters x 7 bits).
Minimum fixed length for ASCII is 7 bits due to 95 symbols needing representation.
Binary Representation:
For n bits, possible unique representations = 2^n.
7 bits = 128 representations, sufficient for 95 ASCII symbols.
More Data Compression
Variable Length Codes:
Assign representations of different lengths based on frequency analysis.
Common symbols given shorter codes (e.g., E = 6 bits, Z = 8 bits).
3 bits can represent 8 characters, sufficing for 7 symbols.
Example of Variable Length Code
Example with 337 symbols using 3-bit fixed-length code totals 1011 bits.
Create more space-efficient codes by analyzing symbol frequency.
Humorous Compression Techniques
Andy’s Dopey Variable Length Code:
Assigns shortest codes to common symbols, ensuring unique representation.
Avoids ambiguity in variable length codes; there must be a unique representation for each sequence.
Prefix Free Codes
Prefix Free Property:
No two symbols' representations share the same starting sequence.
Example: E (1), A (2), B (00) avoids conflicts.
Each representation must uniquely inform the next character in sequence.
Huffman Coding
Overview:
Efficient and optimal method for data compression.
Steps: Count frequency of symbols, build a tree combining lowest frequencies, decorate branches with 0s and 1s.
Compression Ratios:
Example: From 1011 to 837 bits, compression ratio of 1.21:1.
Lossless vs. Lossy Compression
Lossless Compression:
No information loss during compression (e.g., Huffman Coding).
Can recover original data.
Lossy Compression:
Information is lost (e.g., JPEG, MP3), often providing better compression ratios.
Artifacts:
Compression methods may introduce visual or audio artifacts.
Error Detection and Correction
Checksum Codes:
Checksums calculated during transmission to detect errors.
Example: ASCII message checked by summing bits and appending checksum.
Parity Bits:
The extra bit in an 8-bit ASCII character serves as a parity bit for error checking.
Luhn Algorithm:
Used in credit cards to validate numbers and catch common entry errors.
Error Correcting Codes
Hamming Code:
Example of error-correcting codes.
Detects and corrects single-bit errors, using additional parity bits.
Conclusion: Coding Themes
Common themes across coding methods:
Mathematical properties ensure efficiency and error resilience.
Different codes serve distinct purposes: cryptography, compression, and error detection.