
Chapter 4
Variable–Length and
Huffman Codes
Unique Decodability
We must always be able to determine where one code
word ends and the next one begins. Counterexample:
Suppose: s1 = 0; s2 = 1; s3 = 11; s4 = 00
0011 = s4s3 or s1s1s3
Unique decodability means that any two distinct
sequences of source symbols (of possibly differing
lengths) are encoded as distinct code strings.
Let S = {s₁, …, s_q} be the set of source symbols, and let

S^n = S × ⋯ × S = { s_{i₁} ⋯ s_{iₙ} | s_{iⱼ} ∈ S }

be the n-th direct product of S. These are the sequences from S of length n.
4.1, 2
Instantaneous Codes
s1 = 0
s2 = 10
s3 = 110
s4 = 111
[decoding tree: root branches 0 → s1, 1 → (0 → s2, 1 → (0 → s3, 1 → s4))]
No code word is the prefix of another. By reading a
continuous sequence of code words, one can
instantaneously determine the end of each code word.
Consider the reverse: s1 = 0; s2 = 01; s3 = 011; s4 = 111
0111……111 is uniquely decodable, but the first symbol
cannot be decoded without reading all the way to the end.
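The greedy decoding that an instantaneous code permits can be sketched as follows (a minimal illustration of our own; the function name and symbol labels are not from the text):

```python
def decode_prefix(bits, codebook):
    """Decode a bit string by matching code words greedily, left to right."""
    inverse = {word: sym for sym, word in codebook.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:        # a code word ends here -- emit immediately
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return out

instantaneous = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
print(decode_prefix("0101100", instantaneous))  # ['s1', 's2', 's3', 's1']
```

For the reversed code s1 = 0, s2 = 01, s3 = 011, s4 = 111 this greedy loop would commit to s1 on the very first 0, which is exactly the lookahead problem described above.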
4.3
Constructing Instantaneous Codes
comma code: s1 = 0, s2 = 10, s3 = 110, s4 = 1110, s5 = 1111
modification: s1 = 00, s2 = 01, s3 = 10, s4 = 110, s5 = 111
Decoding tree

[decoding tree: 00 → s1, 01 → s2, 10 → s3, 110 → s4, 111 → s5]

Notice that every code word is located on the leaves.
4.4
Kraft Inequality
Theorem: There exists an instantaneous code for S where each
symbol s ∈ S is encoded in radix r with length |s| if and only if

∑_{s∈S} r^(−|s|) ≤ 1.
Proof (⇒): By induction on the height of the decoding tree,
n = max{|s| : s ∈ S} (the maximal length path). For simplicity,
pick r = 2 (the binary case).

Basis n = 1: the root carries one or two leaves, so the sum is
1/2 or 1/2 + 1/2 = 1, and either way ≤ 1. (Could use n = 0 here:
a single leaf ε gives sum 2⁰ = 1.)

Induction n > 1: the root's subtrees T₀ and T₁ are decoding trees
of height < n, so by the IH their leaves satisfy the Kraft
inequality:

∑_{s∈T₀} 2^(−|s|) ≤ 1 and ∑_{s∈T₁} 2^(−|s|) ≤ 1.

Prefixing one symbol at the top of the tree increases all the
lengths by one, so

∑_{s∈T} 2^(−|s|) = (1/2) ∑_{s∈T₀} 2^(−|s|) + (1/2) ∑_{s∈T₁} 2^(−|s|) ≤ 1/2 + 1/2 = 1.
4.5
Same argument for radix r:

Basis n = 1: at most r leaves hang off the root, so the sum is at
most r · (1/r) = 1.

Induction n > 1: the root has at most r subtrees T₀, …, T_{r−1},
each of height < n. By the IH, ∑_{s∈Tᵢ} r^(−|s|) ≤ 1 for each i.
Prefixing one digit increases each length by one:

∑_{s∈Tᵢ} r^(−(|s|+1)) = (1/r) ∑_{s∈Tᵢ} r^(−|s|) ≤ 1/r,

so adding at most r of these together gives ≤ 1. ∎
Strict inequality in the binary case implies that not all internal nodes have
degree 2; but if a node has degree 1, then clearly that edge can be
removed by contraction, shortening the code.
4.5
Kraft Inequality (⇐)
Construct a code via decoding trees. Number the
symbols s1, …, sq so that l1 ≤ … ≤ lq and assume K ≤ 1.
Greedy method: proceed left-to-right, systematically
assigning leaves to code words, so that you never pass
through or land on a previous one. The only way this
method could fail is if it runs out of nodes (tree is
over-full), but that would mean K > 1.
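The greedy leaf assignment can be sketched as follows (our own illustrative implementation; it numbers the leaves left to right at each depth, so no assigned leaf falls in the subtree of a previous one):

```python
from fractions import Fraction

def kraft_code(lengths, r=2):
    """Assign radix-r code words with the given lengths, none a prefix of another."""
    lengths = sorted(lengths)
    assert sum(Fraction(1, r) ** l for l in lengths) <= 1, "over-full tree: K > 1"
    words, value, prev = [], 0, 0   # value = index of the next free leaf at depth prev
    for l in lengths:
        value *= r ** (l - prev)    # re-express the leaf frontier at depth l
        digits, v = [], value
        for _ in range(l):          # write value in base r with exactly l digits
            digits.append(str(v % r))
            v //= r
        words.append("".join(reversed(digits)))
        value += 1                  # skip past the leaf just used
        prev = l
    return words

print(kraft_code([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```

The assertion is exactly the failure case in the proof: if K > 1 the construction runs out of nodes.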
Exs (r = 2):

lengths 1, 3, 3, 3: K = ½+⅛+⅛+⅛ < 1 — code 0, 100, 101, 110 (leaf 111 not used)
lengths 1, 2, 3, 3: K = ½+¼+⅛+⅛ = 1 — code 0, 10, 110, 111 (tree exactly full)
lengths 1, 2, 2, 3: K = ½+¼+¼+⅛ > 1 — impossible, the tree is over-full
4.5
Shortened Block Codes
With exactly 2^m symbols, we can form a set of code words each of
length m: b₁ ⋯ b_m, bᵢ ∈ {0,1}. This is a complete binary decoding
tree of depth m. With fewer than 2^m symbols, we can chop off branches
to get modified (shortened) block codes.
[Ex 1, Ex 2: two shortened block-code decoding trees for s1, …, s5,
each obtained by pruning a complete depth-3 binary tree so that all
five code words sit on leaves]
4.6
McMillan Inequality
Idea: Uniquely decodable codes satisfy the same bounds as
instantaneous codes.
Theorem: Suppose we have a uniquely decodable code in
radix r with lengths l₁ ≤ … ≤ l_q. Then its Kraft sum is ≤ 1.
Proof: Let K = ∑_{i=1}^{q} r^(−lᵢ), and consider

K^n = ∑_{k=n}^{n·l_q} N_k r^(−k).

Use a multinomial expansion to see that N_k = the number of ways n
l's can add up to k, which is the same as the number of different
ways n symbols can form a coded message of length k. Because of
unique decodability, this must be ≤ r^k, the number of radix-r
strings of length k. Hence

K^n ≤ ∑_{k=n}^{n·l_q} r^k · r^(−k) = n·l_q − n + 1 ≤ n·l_q.

But if K > 1, then K^n grows exponentially in n while n·l_q grows
only linearly — a contradiction. Hence K ≤ 1.
Conclusion: WLOG we can use only instantaneous codes.
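As a quick sanity check (a tiny example of our own), the bound holds for the uniquely decodable but non-instantaneous code seen earlier:

```python
from fractions import Fraction

# Kraft sum of the uniquely decodable (but non-instantaneous) code
# s1 = 0, s2 = 01, s3 = 011, s4 = 111 from the earlier slide.
lengths = [1, 2, 3, 3]
K = sum(Fraction(1, 2) ** l for l in lengths)
print(K)  # 1 -- within McMillan's bound, so an instantaneous code
          # with the same lengths exists (e.g. 0, 10, 110, 111)
```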
4.7
Average code length
L_avg = ∑_{i=1}^{q} pᵢ lᵢ, where

pᵢ = probability of symbol sᵢ
lᵢ = |sᵢ|, the length of sᵢ
Our goal is to minimize the average coded length.
If pₙ > pₘ then lₙ ≤ lₘ. For if pₘ < pₙ with lₘ < lₙ, then
interchanging the encodings for sₘ and sₙ changes the average length by

L_avg^old − L_avg^new = pₘlₘ + pₙlₙ − (pₘlₙ + pₙlₘ) = (pₘ − pₙ)(lₘ − lₙ) > 0,

so the swap strictly shortens the code.
So we can assume that if p1 ≥ … ≥ pq then l1 ≤ … ≤ lq,
because if pi = pi+1 with li > li+1, we can just switch si
and si+1.
4.8
Start with S = {s₁, …, s_q} the source alphabet, and consider B = {0, 1} as our code
alphabet (binary). First, observe that we may take l_{q−1} = l_q: since the code is
instantaneous, no sᵢ with i < q can be a prefix of s_q, so dropping the last symbol
from s_q (if l_q > l_{q−1}) won't hurt.

Huffman algorithm: combine s_{q−1} and s_q into a "combo-symbol" (s_{q−1}+s_q)
with probability (p_{q−1}+p_q) and get a code for the reduced alphabet.
For q = 1, assign s₁ = ε. For q > 1, let s_{q−1} = (s_{q−1}+s_q)·0 and s_q = (s_{q−1}+s_q)·1.
Example:

probabilities: 0.4 0.2 0.2 0.1 0.1 → 0.4 0.2 0.2 0.2 → 0.4 0.4 0.2 → 0.6 0.4 → 1.0
code words:    1  01  000 0010 0011 ← 1  01  000 001  ← 1  00  01  ← 0  1   ← ε

N. B. the case for q = 1 does not produce a valid code (ε is empty).
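The merging step can be sketched with a priority queue (our own heapq-based implementation; symbol names and tie-breaking are our choices, so individual code words may differ from the table above while the average length stays minimal):

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman: repeatedly merge the two least probable combo-symbols."""
    tick = count()                  # tie-breaker so heapq never compares dicts
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)     # least probable
        p1, _, c1 = heapq.heappop(heap)     # second least probable
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]               # {symbol: code word}

probs = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}
code = huffman(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
print(round(avg, 3))  # 2.2, the same average as the example above
```

Note that for q = 1 the loop never runs and the single symbol gets ε, the invalid case flagged above.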
4.8
Huffman is always of shortest average length
Assume p₁ ≥ … ≥ p_q, and let L_avg be the average length of the
Huffman code. We are trying to show L_avg ≤ L for any alternative
instantaneous code of average length L.

Example: p₁ = 0.7; p₂ = p₃ = p₄ = 0.1.
Compare L_avg = 1.5 to log₂ q = 2.

We know l₁ ≤ … ≤ l_q.

Base case: For q = 2 the tree is 0 → s₁, 1 → s₂, and no shorter
code exists.

Induction step: For q > 2 take any instantaneous code for
s₁, …, s_q with minimal average length L.
4.8
Claim: l_{q−1} = l_q = l_{(q−1)+q} + 1, because in such a code
s_{q−1} and s_q can be taken to be sibling leaves at total height l_q,
whose parent is the combined symbol (s_{q−1}+s_q) of the reduced code.

[figure: tree of height l_q; s₁, … above, with s_{q−1}, s_q as sibling
leaves under the combined symbol s_{q−1}+s_q]

So its reduced code will always satisfy:

L′ = ∑_{i=1}^{q−2} pᵢlᵢ + (p_{q−1}+p_q)(l_q − 1) = L − (p_{q−1} + p_q).

By IH, L′_avg ≤ L′. But more importantly the
reduced Huffman code shares the same
properties, so it also satisfies the same equation:
L′_avg + (p_{q−1} + p_q) = L_avg. Hence L_avg ≤ L.
4.8
Code Extensions
Take p₁ = ⅔ and p₂ = ⅓. The Huffman code gives
s₁ = 0, s₂ = 1, L_avg = 1.
Square the symbol alphabet to get:
S²: s₁,₁ = s₁s₁; s₁,₂ = s₁s₂; s₂,₁ = s₂s₁; s₂,₂ = s₂s₂;
p₁,₁ = 4⁄9, p₁,₂ = 2⁄9, p₂,₁ = 2⁄9, p₂,₂ = 1⁄9.
Apply Huffman to S²:
s₁,₁ = 1; s₁,₂ = 01; s₂,₁ = 000; s₂,₂ = 001

L_avg = 1·(4/9) + 2·(2/9) + 3·(2/9) + 3·(1/9) = 17/9 < 2.

But we are sending two symbols at a time, so the cost is
17/18 < 1 bit per source symbol!
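The arithmetic can be checked exactly with rational numbers (a small sketch of our own, using the extension code words quoted above):

```python
from fractions import Fraction

# Probabilities and extension code words from the example above.
p = {"s1": Fraction(2, 3), "s2": Fraction(1, 3)}
ext = {("s1", "s1"): "1", ("s1", "s2"): "01",
       ("s2", "s1"): "000", ("s2", "s2"): "001"}

# Average coded length per *pair* of source symbols.
L2 = sum(p[a] * p[b] * len(w) for (a, b), w in ext.items())
print(L2)      # 17/9 -- average length per pair
print(L2 / 2)  # 17/18 -- under 1 bit per source symbol
```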
4.10
Huffman Codes in radix r
At each stage down, we merge the last (least probable) r states
into 1, reducing the number of states by r − 1. Since we end with one
state, we must begin with a number of states ≡ 1 (mod r − 1). We pad
out states with probability 0 to get this. Example: r = 4 (each merge
removes r − 1 = 3 states):
probabilities: 0.22 0.2 0.18 0.15 0.1 0.08 0.05 0.02 0.0 0.0
code words:    1    2   3    00   01  02   030  031

→ 0.22 0.2 0.18 0.15 0.1 0.08 0.07
  1    2   3    00   01  02   03

→ 0.4 0.22 0.2 0.18
  0   1    2   3

→ 1.0
4.11
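The padded, radix-r merging can be sketched as follows (heapq-based, our own naming; the digit handed to each merged state is a free choice, so code words may differ from the table above while the lengths come out the same):

```python
import heapq
from itertools import count

def huffman_r(probs, r):
    """Radix-r Huffman: pad with probability-0 states, then merge r at a time."""
    pad = (1 - len(probs)) % (r - 1)          # make the state count ≡ 1 (mod r-1)
    tick = count()                            # tie-breaker for equal probabilities
    heap = [(p, next(tick), {s: ""}) for s, p in probs.items()]
    heap += [(0.0, next(tick), {}) for _ in range(pad)]   # padding states
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total = {}, 0.0
        for d in range(r):                    # merge the r least probable states
            p, _, c = heapq.heappop(heap)
            total += p
            merged.update({s: str(d) + w for s, w in c.items()})
        heapq.heappush(heap, (total, next(tick), merged))
    return heap[0][2]                         # {symbol: digit string over 0..r-1}

probs = {"a": 0.22, "b": 0.2, "c": 0.18, "d": 0.15,
         "e": 0.1, "f": 0.08, "g": 0.05, "h": 0.02}
print(sorted(len(w) for w in huffman_r(probs, 4).values()))
# [1, 1, 1, 2, 2, 2, 3, 3] -- the same lengths as the table above
```

The padding guarantees that every merge finds a full set of r states, so exactly one state remains at the end.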