Lossless Compression Algorithms
Introduction
- Compression reduces bits needed to represent information.
- Encoder: performs compression, creating codes/codewords.
- Decoder: performs decompression.
- Lossless: No information loss during compression/decompression.
- Lossy: Information loss occurs.
- Compression Ratio: $B_0 / B_1$, where $B_0$ is the number of bits before compression and $B_1$ the number after.
- Desirable compression ratio: Much larger than 1.0.
- Entropy ($\eta$) of information source $S = \{s_1, s_2, \dots, s_n\}$:
- $\eta = H(S) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} = -\sum_{i=1}^{n} p_i \log_2 p_i$
- $p_i$ = probability of symbol $s_i$ occurring in $S$.
- $\log_2 \frac{1}{p_i} = -\log_2 p_i$ = self-information, the number of bits needed to encode $s_i$.
- Entropy is a measure of disorder; higher entropy means more disorder.
- For a gray-level image with a uniform distribution, $p_i = \frac{1}{256}$ and the entropy is 8 bits/symbol (no compression possible).
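The entropy formula above can be checked numerically; a minimal sketch, using the uniform 256-level gray image as the example:

```python
from math import log2

def entropy(probs):
    """H(S) = -sum(p_i * log2(p_i)), in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Uniform distribution over 256 gray levels: p_i = 1/256 for every symbol.
print(entropy([1 / 256] * 256))  # -> 8.0, so no compression is possible
```

A skewed distribution (e.g. one symbol with probability 0.9) gives an entropy well below 8, which is what leaves room for compression.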
Run-Length Coding (RLC)
- Simple compression for data containing long runs of repeated symbols.
- Codes symbol and its repetition length.
- Example:
WWWWWBBBB becomes 5W4B.
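A minimal sketch of the encoder for the example above (count-then-symbol output format assumed):

```python
from itertools import groupby

def rle_encode(s):
    """Encode each run of a repeated symbol as <count><symbol>."""
    return ''.join(f'{len(list(group))}{symbol}' for symbol, group in groupby(s))

print(rle_encode('WWWWWBBBB'))  # -> 5W4B
```

Note RLC only pays off when runs are long; on data without runs (e.g. 'ABAB') it expands the input.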
Variable-Length Coding (VLC)
Shannon-Fano Algorithm
- Sort symbols by frequency.
- Recursively divide symbols into two parts with approximately equal counts.
- Assign bit 0 to left branches, 1 to right branches in a binary tree.
- Shannon-Fano is top-down.
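The top-down recursion can be sketched as follows; this is a minimal illustration (the split heuristic — left part's count closest to half the total — is one common choice, and the example counts are the symbols of "HELLO"):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count). Returns {symbol: codeword}."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or '0'
            return
        total = sum(c for _, c in group)
        best_diff, cut = float('inf'), 1
        # find the split point whose left-part count is closest to total/2
        for i in range(1, len(group)):
            left = sum(c for _, c in group[:i])
            diff = abs(total - 2 * left)
            if diff < best_diff:
                best_diff, cut = diff, i
        split(group[:cut], prefix + '0')   # bit 0 for the left branch
        split(group[cut:], prefix + '1')   # bit 1 for the right branch

    split(sorted(symbols, key=lambda x: -x[1]), '')
    return codes

print(shannon_fano([('L', 2), ('H', 1), ('E', 1), ('O', 1)]))
# L gets the shortest codeword since it is the most frequent symbol
```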
Huffman Coding
- Initialization: Sort symbols by frequency.
- Repeat until one symbol remains:
- Pick two symbols with lowest frequency.
- Create a Huffman subtree with these two as children of a new parent node.
- Assign the sum of the children's frequency counts to the parent and insert it into the list so that sorted order is maintained.
- Delete the children from the list.
- Assign codeword based on path from root.
- Huffman coding is bottom-up.
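The bottom-up steps above can be sketched with a min-heap standing in for the sorted list (a minimal illustration; the tiebreak counter only keeps heap comparisons well-defined and is not part of the algorithm):

```python
import heapq

def huffman(freqs):
    """freqs: {symbol: count}. Returns {symbol: codeword}."""
    # Heap entry: (frequency, tiebreak, tree); a tree is a symbol or (left, right).
    heap = [(f, i, s) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # two lowest-frequency nodes...
        f2, _, t2 = heapq.heappop(heap)
        # ...become children of a parent holding the sum of their counts
        heapq.heappush(heap, (f1 + f2, tiebreak, (t1, t2)))
        tiebreak += 1
    codes = {}

    def walk(tree, prefix):              # read codewords off root-to-leaf paths
        if isinstance(tree, tuple):
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:
            codes[tree] = prefix or '0'

    walk(heap[0][2], '')
    return codes

print(huffman({'L': 2, 'H': 1, 'E': 1, 'O': 1}))
```

Whichever of the equally cheap trees the tiebreaking produces, the total encoded length is optimal for the given counts.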
Dictionary-Based Coding
Lempel-Ziv-Welch (LZW)
- Adaptive, dictionary-based technique.
- Places repeated entries in a dictionary.
- Emits the dictionary code for a string instead of the string itself whenever that string is already in the dictionary.
- Used in UNIX compress, GIF, WinZip.
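The LZW encoder can be sketched as below (a minimal illustration: the dictionary starts with all single bytes, the longest dictionary match is emitted as a code, and the match plus the next character becomes a new entry):

```python
def lzw_encode(text):
    """Return the list of LZW output codes for `text`."""
    dictionary = {chr(i): i for i in range(256)}  # initial single-character entries
    next_code = 256
    w, out = '', []
    for ch in text:
        if w + ch in dictionary:
            w += ch                          # extend the current match
        else:
            out.append(dictionary[w])        # emit code for the longest match
            dictionary[w + ch] = next_code   # add new entry: match + next char
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

print(lzw_encode('ABABABA'))  # -> [65, 66, 256, 258]
```

The codes 256 ('AB') and 258 ('ABA') show the adaptivity: repeated substrings are learned on the fly, with no dictionary transmitted alongside the data.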