Prefix-free and Variable Length Codes
Objectives
We would like to:
- Examine the basics of Huffman codes
Notes
- Given an alphabet Σ (a non-empty set of symbols from which a message will be constructed) and a message, construct a binary representation for the symbols that minimizes the number of bits transmitted.
- Isn't this ASCII?
- Well no!
- In ASCII, every character requires 7 bits.
- But some letters will be transmitted far more than others.
- So doesn't it make sense to assign a short bit sequence for frequent letters and a longer sequence for less frequent letters?
- How do we distinguish characters in ASCII?
- What does 01000100011000010110111 represent?
- Roughgarden proposes the following codes:
| Symbol | Fixed | Variable | Prefix-Free |
|--------|-------|----------|-------------|
| A      | 00    | 0        | 0           |
| B      | 01    | 01       | 10          |
| C      | 10    | 10       | 110         |
| D      | 11    | 1        | 111         |
- Encode AAABDBD in fixed
- Encode AAABDBD in variable
- Encode AAABDBD in prefix-free
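- As a concrete check, here is a minimal Python sketch (the dictionaries below just transcribe the table above) that performs all three encodings:

```python
# The three code tables from above, transcribed as dictionaries.
fixed       = {"A": "00", "B": "01",  "C": "10",  "D": "11"}
variable    = {"A": "0",  "B": "01",  "C": "10",  "D": "1"}
prefix_free = {"A": "0",  "B": "10",  "C": "110", "D": "111"}

def encode(message, code):
    """Concatenate the codeword of each symbol in the message."""
    return "".join(code[symbol] for symbol in message)

msg = "AAABDBD"
print(encode(msg, fixed))        # 00000001110111  (14 bits)
print(encode(msg, variable))     # 000011011       (9 bits)
print(encode(msg, prefix_free))  # 0001011110111   (13 bits)
```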
- What does the variable encoding assume?
- One problem: without a separator, the receiver cannot tell where one codeword ends and the next begins
- Can you decode each of the encodings?
- In a prefix-free code, no codeword is a prefix of another codeword
- This is not the case in the variable code: 0 (A) is a prefix of 01 (B), and 1 (D) is a prefix of 10 (C)
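- To see the ambiguity concretely, two different messages collide under the variable code (a sketch reusing the table above):

```python
# Without separators, the variable code cannot be decoded unambiguously:
# two different messages produce exactly the same bit string.
variable = {"A": "0", "B": "01", "C": "10", "D": "1"}

def encode(message):
    return "".join(variable[symbol] for symbol in message)

print(encode("AAABDBD"))    # 000011011
print(encode("AAAADDADD"))  # 000011011 -- same bits, different message!
```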
- Roughgarden suggests drawing the code as a binary tree (0 on left edges, 1 on right edges)
- If all of the letters are leaves, the code is prefix-free
- Otherwise it is not.
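- Prefix-freeness is exactly what makes one-pass decoding possible: scanning left to right, the first codeword match is always correct (equivalently, each bit walks one edge down the tree until a leaf is reached). A minimal sketch:

```python
# One-pass decoding of the prefix-free code: grow a buffer bit by bit;
# because no codeword is a prefix of another, the first match is final.
prefix_free = {"A": "0", "B": "10", "C": "110", "D": "111"}
by_codeword = {bits: sym for sym, bits in prefix_free.items()}

def decode(bits):
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in by_codeword:        # reached a leaf of the code tree
            symbols.append(by_codeword[buf])
            buf = ""                  # restart at the root
    return "".join(symbols)

print(decode("0001011110111"))  # AAABDBD
```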
- The Huffman code algorithm uses a frequency analysis to construct a prefix-free code for an alphabet
- I.e., select the most frequent letter and assign it the shortest bit code
- In the example, assume
| Symbol | Frequency |
|--------|-----------|
| A      | 60%       |
| B      | 25%       |
| C      | 10%       |
| D      | 5%        |
- We can then compute the "cost" of a code: the average number of bits per letter when sending a message
- $C = \sum_{\alpha \in \Sigma} l_\alpha f_\alpha$
- where $l_\alpha$ is the number of bits for the letter $\alpha$
- and $f_\alpha$ is the frequency of the letter $\alpha$
- For the two-bit fixed code above, $C = 2$
- For the prefix-free code above, $C = 1 \cdot 0.6 + 2 \cdot 0.25 + 3 \cdot 0.1 + 3 \cdot 0.05 = 1.55$
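- To tie the pieces together, here is a sketch of the standard greedy Huffman construction (repeatedly merge the two least frequent subtrees) applied to the frequencies above. The exact bit assignments can differ from the table, but the codeword lengths match, and the average cost comes out to 1.55:

```python
import heapq
from itertools import count

# Frequencies from the table above.
freqs = {"A": 0.60, "B": 0.25, "C": 0.10, "D": 0.05}

def huffman(freqs):
    """Greedy Huffman construction: repeatedly merge the two least
    frequent subtrees; each codeword is the root-to-leaf path."""
    ids = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(ids), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(ids), (left, right)))

    codes = {}
    def walk(node, path):
        if isinstance(node, str):       # leaf: a symbol
            codes[node] = path or "0"
        else:                           # internal node: (left, right)
            walk(node[0], path + "0")
            walk(node[1], path + "1")
    walk(heap[0][2], "")
    return codes

codes = huffman(freqs)
print(codes)  # e.g. {'D': '000', 'C': '001', 'B': '01', 'A': '1'}

# Average bits per letter, C = sum of l_alpha * f_alpha.
cost = sum(len(codes[a]) * freqs[a] for a in freqs)
print(cost)   # ~1.55 (the fixed two-bit code costs 2.0)
```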