COMP261 - Huffman Encoding Flashcards

0.0(0)

Studied by 0 people

Call Kai

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Knowt Play

Card Sorting

1/14

There's no tags or description

Looks like no tags are added yet.

Last updated 7:29 AM on 5/26/26

Name	Mastery	Learn	Test	Matching	Spaced	Call with Kai	Chat

No analytics yet

Send a link to your students to track their progress

15 Terms

New cards

State the eleven steps of the Huffman encoding process.

Count the frequency of each symbol in the input text.
Example: for she_sells_sea_shells, count how many times s, h, e, _, l, and a appear.
Create one leaf node for each symbol, labelled with its frequency.
Put all leaf nodes into a priority queue, ordered by frequency. The lowest-frequency node has the highest priority. If your test gives tie-breaking rules, apply them exactly.
Remove the two lowest-frequency nodes from the priority queue.
Create a new parent node whose frequency is the sum of those two nodes. The lower-frequency child goes on the left; if there is a tie, use the specified tie-break rule.
Insert the new parent node back into the priority queue.
Repeat steps 4–6 until only one node remains. That final node is the root of the Huffman tree.
Assign binary codes by traversing the tree. Usually, left means 0 and right means 1.
Create the codebook by recording the path from the root to each symbol. Symbols near the top get shorter codes; symbols deeper in the tree get longer codes.
Encode the original text by replacing each symbol with its Huffman code.
Calculate compression rate if required using: compressed bits / fixed-length bits

New cards

State the nine steps for the Huffman decoding process

Start with the encoded bit stream and the same Huffman tree or codebook used during encoding.
Begin at the root of the Huffman tree.
Read one bit at a time from the encoded stream.
Move left for 0 and move right for 1.
Continue moving through the tree until you reach a leaf node.
Output the symbol stored at that leaf.
Return to the root of the tree.
Repeat steps 3–7 until all bits in the encoded stream have been read.
Because Huffman codes are prefix-free, the decoder knows exactly when one symbol ends and the next begins. No spaces or separators are needed between codes.

New cards

What are the advantages of Huffman encoding?

Lossless compression: the original data can be recovered exactly. This matches the slides’ definition of lossless compression.
More frequent symbols get shorter codes: this is the main reason Huffman coding can beat fixed-length coding. The slides describe this as “use fewer bits for more common symbols.”
Prefix-free decoding: no code is the prefix of another code, so the decoder can read the bit stream without separators. This is why the binary tree representation works.
Optimal for the given frequencies among prefix-free symbol codes: the slides state that Huffman coding “generates the best set of codes, given frequencies/probabilities on all the symbols.”
Straightforward algorithm: build leaf nodes, repeatedly combine the two lowest-frequency nodes, then assign 0/1 down the tree. This is exactly the algorithm shown in the slides.

New cards

What are the disadvantages of Huffman encoding?

The decoder needs the same tree or code table: the slides explicitly say that when storing/transmitting a compressed file, you need to include the tree for decompression, which can reduce efficiency.
Bad frequency estimates give poor compression: the mock test asks this directly. A tree built from she_sells_sea_shells performs badly on hall_has_all because characters like a and h become common in the new text but still have long codes from the old tree.
Small files may not compress well: this follows from the tree/codebook overhead. If the input is small, the saved bits may not outweigh the cost of storing the tree.
Little benefit when frequencies are roughly equal: if every symbol appears about equally often, there are no very common symbols to reward with short codes, so Huffman coding approaches fixed-length coding.
It does not exploit longer repeated patterns: basic Huffman coding works one symbol at a time. It compresses based on character/symbol frequency, not repeated sub-strings or words.
It is not an error-correcting method: the slides note that error-detection/error-correcting bits are a separate issue that can be added to compressed data. Huffman coding itself is for compression, not error correction.

New cards

Huffman encoding: How many bits are needed for fixed-length coding with k symbols?

ceil(log2(k)) bits per symbol. Round up

New cards

Huffman encoding: What problem can variable-length coding cause?

Ambiguity: the decoder may not know where one symbol’s code ends and the next begins.

New cards

Why is Huffman coding prefix-free?

Symbols are stored only at leaf nodes in the binary tree, so no symbol code can be the prefix of another.

New cards

Huffman encoding: Why do 10 symbols need 4 bits?

Because 3 bits gives only 8 codes: 2³. But 4 bits gives 16 codes: 2⁴.

New cards

What fixed-length bit count should you compare Huffman coding against? How is compression rate calculated?

Number of symbols in the message × fixed-length bits per symbol. Compression rate = num of compressed bits / num of fixed-length bits.

New cards

Huffman encoding: How do we decide the number of bits in the codebook?

2ⁿ >= number of symbols

New cards

Give pseudocode for decoding using a Huffman tree.

decode(bits, root):
    current = root
    output = ""

    for each bit in bits:
        if bit == 0:
            current = current.left
        else:
            current = current.right

        if current is a leaf:
            output += current.symbol
            current = root

    return output

New cards

Do you need extra schemes or codes to split letters in Huffman coding? Why?

No. Huffman codes are prefix-free. That means no complete symbol code is the start of another symbol code. The decoder can start at the tree root, follow bits until it reaches a leaf, output that symbol, then return to the root. This removes the need for pauses, spaces, or special separator codes.

New cards

What probability distribution gives the most benefit for Huffman compression?

Huffman coding benefits most from a highly non-uniform distribution. That means a small number of symbols appear very often, while other symbols appear rarely. Then common symbols get very short codes and rare symbols get longer codes.

The worst case for Huffman compression is a roughly uniform distribution, where all symbols appear about equally often. In that case, Huffman coding cannot do much better than fixed-length coding.

New cards

What probability pattern produces a linear Huffman tree?

A very linear Huffman tree happens when the probabilities are highly skewed in a step wise pattern. More precisely, after sorting symbols from smallest to largest probability, each combined low-probability group must remain small enough that it keeps being combined with the next-smallest symbol.

A common pattern that causes this is similar to a Fibonacci-like distribution or a distribution where each next probability is at least roughly the sum of the previous smaller ones. This causes the tree to grow like a chain, because the combined node repeatedly remains one of the two lowest-priority nodes.

New cards

What is redundancy?

So redundancy is not “duplicate characters” only. It is more general: it means the data has patterns or uneven probabilities that let us represent it more compactly. The slides state that lossless compression is only possible if there is redundancy in the original, and Huffman coding exploits redundancy in symbol frequencies.