Prefix-free and Variable Length Codes
Objectives
We would like to:
- Examine the basics of Huffman codes
Notes
- Given an alphabet Σ (a non-empty set of symbols from which a message will be constructed) and a message, construct a binary representation for the symbols that minimizes the number of bits transmitted.
- Isn't this ASCII?
- Well no!
- In ASCII, every character requires 7 bits.
- But some letters will be transmitted far more than others.
- So doesn't it make sense to assign a short bit sequence for frequent letters and a longer sequence for less frequent letters?
- How do we distinguish characters in ASCII?
- What does 01000100011000010110111 represent?
- Roughgarden proposes the following codes:
| Symbol | Fixed | Variable | Prefix-Free |
|--------|-------|----------|-------------|
| A      | 00    | 0        | 0           |
| B      | 01    | 01       | 10          |
| C      | 10    | 10       | 110         |
| D      | 11    | 1        | 111         |
- Encode AAABDBD in fixed
- Encode AAABDBD in variable
- Encode AAABDBD in prefix-free
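- As a concrete check, here is a minimal Python sketch (the dictionaries below just transcribe the table above) that performs all three encodings:

```python
# The three code tables from above, transcribed as dictionaries.
fixed       = {"A": "00", "B": "01",  "C": "10",  "D": "11"}
variable    = {"A": "0",  "B": "01",  "C": "10",  "D": "1"}
prefix_free = {"A": "0",  "B": "10",  "C": "110", "D": "111"}

def encode(message, code):
    """Concatenate the codeword of each symbol in the message."""
    return "".join(code[symbol] for symbol in message)

msg = "AAABDBD"
print(encode(msg, fixed))        # 00000001110111  (14 bits)
print(encode(msg, variable))     # 000011011       (9 bits)
print(encode(msg, prefix_free))  # 0001011110111   (13 bits)
```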
- What does the variable encoding assume?
- One problem: without a separator, the receiver cannot tell where one codeword ends and the next begins
- Can you decode each of the encodings?
- In a prefix-free code, no codeword is a prefix of another codeword
- This is not the case in the variable code: 0 (A) is a prefix of 01 (B), and 1 (D) is a prefix of 10 (C)
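- To see the ambiguity concretely, two different messages collide under the variable code (a sketch reusing the table above):

```python
# Without separators, the variable code cannot be decoded unambiguously:
# two different messages produce exactly the same bit string.
variable = {"A": "0", "B": "01", "C": "10", "D": "1"}

def encode(message):
    return "".join(variable[symbol] for symbol in message)

print(encode("AAABDBD"))    # 000011011
print(encode("AAAADDADD"))  # 000011011 -- same bits, different message!
```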
- Roughgarden suggests drawing the code as a binary tree (0 on left edges, 1 on right edges)
- If all of the letters are leaves, the code is prefix-free
- Otherwise it is not.
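- Prefix-freeness is exactly what makes one-pass decoding possible: scanning left to right, the first codeword match is always correct (equivalently, each bit walks one edge down the tree until a leaf is reached). A minimal sketch:

```python
# One-pass decoding of the prefix-free code: grow a buffer bit by bit;
# because no codeword is a prefix of another, the first match is final.
prefix_free = {"A": "0", "B": "10", "C": "110", "D": "111"}
by_codeword = {bits: sym for sym, bits in prefix_free.items()}

def decode(bits):
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in by_codeword:        # reached a leaf of the code tree
            symbols.append(by_codeword[buf])
            buf = ""                  # restart at the root
    return "".join(symbols)

print(decode("0001011110111"))  # AAABDBD
```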
- The Huffman code algorithm uses a frequency analysis to construct a prefix-free code for an alphabet
- I.e., select the most frequent letter and assign it the shortest bit code
- In the example, assume
| Symbol | Frequency |
|--------|-----------|
| A      | 60%       |
| B      | 25%       |
| C      | 10%       |
| D      | 5%        |
- We can then compute the "cost" of a code: the average number of bits per letter when sending a message
- $C = \sum_{\alpha \in \Sigma} l_\alpha f_\alpha$
- where $l_\alpha$ is the number of bits for the letter $\alpha$
- and $f_\alpha$ is the frequency of the letter $\alpha$
- For the two-bit fixed code above, $C = 2$
- For the prefix-free code above, $C = 1 \cdot 0.6 + 2 \cdot 0.25 + 3 \cdot 0.1 + 3 \cdot 0.05 = 1.55$
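- To tie the pieces together, here is a sketch of the standard greedy Huffman construction (repeatedly merge the two least frequent subtrees) applied to the frequencies above. The exact bit assignments can differ from the table, but the codeword lengths match, and the average cost comes out to 1.55:

```python
import heapq
from itertools import count

# Frequencies from the table above.
freqs = {"A": 0.60, "B": 0.25, "C": 0.10, "D": 0.05}

def huffman(freqs):
    """Greedy Huffman construction: repeatedly merge the two least
    frequent subtrees; each codeword is the root-to-leaf path."""
    ids = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(ids), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(ids), (left, right)))

    codes = {}
    def walk(node, path):
        if isinstance(node, str):       # leaf: a symbol
            codes[node] = path or "0"
        else:                           # internal node: (left, right)
            walk(node[0], path + "0")
            walk(node[1], path + "1")
    walk(heap[0][2], "")
    return codes

codes = huffman(freqs)
print(codes)  # e.g. {'D': '000', 'C': '001', 'B': '01', 'A': '1'}

# Average bits per letter, C = sum of l_alpha * f_alpha.
cost = sum(len(codes[a]) * freqs[a] for a in freqs)
print(cost)   # ~1.55 (the fixed two-bit code costs 2.0)
```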