Huffman codes are of interest to us as an application of greedy algorithms and priority queues.
There are 3 paragraphs on this subject in Skiena, pg 639.
Huffman coding is a widely used method for lossless data compression.
The Lempel-Ziv method is even more widely used (in gzip, for example). There are interesting comparisons and contrasts to be made between Huffman and Lempel-Ziv.
The basic idea of Huffman coding is to use variable length bit strings to represent characters, with shorter bit strings accorded to the most frequently used characters.
For example, if a text contains (for simplicity) just the 6 letters a,b,c,d,e,f, with frequencies
a:45, b:13, c:12, d:16, e:9, f:5
then we want a very short code for 'a' and will accept a longer code for 'f'.
The frequencies add up to 100, so read them as percentages: if the text has 200 characters, 90 are a's, 24 are c's, etc.
Note that uniform 3-bit codes could be used (6 characters fit in 3 bits). For a 200-character text that would use 600 bits. Can we do better?
How about 'a' gets one bit, 0. Every other character gets a code starting with a 1...
Huffman codes have the following properties:
(a) Prefix property: no character's code is a prefix of any other character's code, so an encoded text can be decoded unambiguously left to right.
(b) Encoding can be table lookup: a table indexed by the characters, containing the variable-length bit strings.
(c) Building the most effective tree is Huffman's algorithm.
The uniform 3-bit code drawn as a tree (a=000, b=001, c=010, d=011, e=100, f=101):

                 __(100)__
               0/         \1
               /           \
            (86)           (14)
           0/  \1          0/
           /    \          /
        (58)    (28)     (14)
       0/  \1  0/  \1   0/  \1
       /    \  /    \   /    \
   [a:45][b:13][c:12][d:16][e:9][f:5]
Huffman's optimal tree for the same frequencies (a=0, c=100, b=101, e=1100, f=1101, d=111):

            (100)
           0/    \1
           /      \
       [a:45]    (55)
                0/    \1
                /      \
             (25)      (30)
            0/  \1    0/  \1
            /    \    /    \
        [c:12][b:13] (14) [d:16]
                    0/  \1
                    /    \
                 [e:9] [f:5]
An optimum tree is necessarily a full tree (no internal node has only one child).
    struct node {
        node *left, *right;
        char c;   // c is used only in the leaves
        int f;    // in a leaf, f is the frequency with which character c occurs;
                  // in an interior node, f is the sum of the frequencies of all
                  // chars in the subtree of the node

        node(char cc, int ff) { left = right = NULL; c = cc; f = ff; } // constructs a leaf

        void merge(node *a, node *b) {
            // c->merge(a,b) makes c the parent of a and b with the correct frequency.
            left = a; right = b;   // c is a don't-care in an interior node
            f = a->f + b->f;
        }
    };

    node *Huffman(vector<pair<char, int> > &V) {
        PriorityQueue<node*> Q(less_by_frequency);
        for (int i = 0; i < V.size(); ++i)
            Q.insert(new node(V[i].first, V[i].second));
        while (Q.size() > 1) {            // stop when a single tree remains
            node *a = Q.extractMin();
            node *b = Q.extractMin();
            node *c = new node(' ', 0);
            c->merge(a, b);
            Q.insert(c);
        }
        return Q.extractMin();            // the root of the Huffman tree
    }

The tree built this way is optimal among all prefix-property encodings of the text. Here optimal means that the number of bits used to represent the text is minimal. This minimum is called the cost of the tree.
Closest pair of points problem: Given a set of n points in the plane, pi = (xi,yi), find the minimal distance between any two points.
Brute force: try every pair --> O(n^2) time.
Divide and conquer algorithm: --> O(n log(n)) time. [See Wikipedia, but note that its algorithm details and analysis are flawed.]
Daily problem for next Tue:
Design an efficient algorithm to find the closest pair of n points in three dimensional space,
pi = (xi, yi, zi).