Sorting: Heaps, Divide and Conquer
Andy Novocin
Daily Problem
Given \( n \) three-dimensional points, e.g. \( (3,2,1), (9,-1,2) \), find the closest two.
What is the complexity of your approach?
Advanced Problem (repeated)
Let \( x \leq 50000 \) be an unknown.
If \( x \equiv 81 \pmod{89} \), \( x \equiv 13 \pmod{23} \), and \( x \equiv 12 \pmod{41} \), determine \( x \).
Can you generalize?
Ask me if you want a google search term hint.
New Advanced Problem
Suppose we are given a Caesar-shift encoded message of length \( n \), each character of which is a letter from the 26-letter alphabet.
\( \mbox{CAT} \to \mbox{DBU} \) is a Caesar shift by 1.
What is the complexity of a decryption algorithm if you:
1) Know the size of the shift?
2) Don't know the size of the shift?
3) If the coded word used a known-length sequence of shifts? (e.g. shift by 1 then 2 then 3 and repeat: \( \mbox{CAT} \to \mbox{DCW} \))
4) If the coded word used an unknown-length sequence of shifts? (e.g. a cycle of shifts but you don't know the length of the cycle.)
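A brute-force decode sketch for cases 1) and 2), assuming an uppercase A–Z alphabet (the function name `shift_decode` is my own):

```python
def shift_decode(msg, k):
    """Undo a Caesar shift of k: one O(n) pass over the message."""
    return "".join(chr((ord(c) - ord('A') - k) % 26 + ord('A')) for c in msg)

# Case 1) known shift: a single O(n) pass.
print(shift_decode("DBU", 1))   # -> "CAT"

# Case 2) unknown shift: try all 26 shifts, O(26n) = O(n) candidates to inspect.
candidates = [shift_decode("DBU", k) for k in range(26)]
```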
Heaps
(stolen from previous slide set)
Not as good as Balanced Binary Search Trees, but simpler, and ideal for priority queues.
A binary tree with the heap property: Each parent dominates its children.
Also if we pack into an array then we save the pointer space (and lookup times)
Storing method
The \( 2^{l} \) nodes of the \( l^{\textrm{th}} \) row (with the root in row \( 0 \)) are packed into positions: \( 2^{l}, \cdots, 2^{l+1} - 1 \)
So the node at position \( x \) has left child at position \( 2x \) and right child at position \( 2x + 1 \)
Reversed this means the parent of node \( x \) is at position \( \lfloor x/2 \rfloor \).
e.g. Packed in this way the height 3 balanced BST (not a heap) is packed as \( \{4, 2, 6, 1, 3, 5, 7 \} \)
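The index arithmetic can be sketched in a few lines of Python (1-indexed, so slot 0 holds a dummy entry):

```python
# Navigating the packed array: node x has children 2x, 2x+1 and parent x // 2.
def left(x):   return 2 * x
def right(x):  return 2 * x + 1
def parent(x): return x // 2          # floor(x / 2)

# The height-3 balanced BST from the slide, packed with a dummy in slot 0:
tree = [None, 4, 2, 6, 1, 3, 5, 7]
print(tree[left(1)], tree[right(1)])  # children of the root 4
```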
Insert
To maintain a happy heap we must be packed and dominated
Packed happens by inserting to the last spot
For domination, swap with your parent, until things are right.
This is called bubble_up
What is the complexity?
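A minimal min-heap sketch of insert via `bubble_up` (my own implementation of the slide's idea, using the 1-indexed array with a dummy in slot 0):

```python
def bubble_up(heap, i):
    """Swap node i with its parent until the heap property holds: O(log n)."""
    while i > 1 and heap[i // 2] > heap[i]:
        heap[i // 2], heap[i] = heap[i], heap[i // 2]
        i //= 2

def insert(heap, item):
    heap.append(item)               # stay packed: insert into the last spot
    bubble_up(heap, len(heap) - 1)  # restore domination

heap = [None]                       # dummy slot 0 keeps the 2x / 2x+1 arithmetic clean
for v in [5, 3, 17, 10]:
    insert(heap, v)
print(heap[1])                      # the min sits at the root
```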
Heap insert (also sort) example
Delete-Min/Max
Packed and Dominated
For packing we move the last element into the root's spot
Now we have to swap down with the dominant child, until things are right.
This is called bubble_down
What is the complexity?
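A matching `bubble_down` / delete-min sketch (my own implementation, min-heap version, 1-indexed array with a dummy in slot 0):

```python
def bubble_down(heap, i):
    """Swap node i with its dominant (smaller) child until things are right."""
    n = len(heap) - 1
    while 2 * i <= n:
        c = 2 * i                                # left child
        if c + 1 <= n and heap[c + 1] < heap[c]:
            c += 1                               # right child dominates
        if heap[i] <= heap[c]:
            break
        heap[i], heap[c] = heap[c], heap[i]
        i = c

def delete_min(heap):
    """O(log n): move the last element to the root, then bubble it down."""
    root = heap[1]
    heap[1] = heap[-1]
    heap.pop()
    if len(heap) > 1:
        bubble_down(heap, 1)
    return root

heap = [None, 3, 5, 17, 10]
print(delete_min(heap))   # extracts 3, leaving a valid heap
```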
HEAPSORT
Load the heap (for now \( \mathcal{O}(n \log{n} ) \) ).
Call extract-min/max \( n \) times (each \( \mathcal{O}(\log{n}) \))
Cost is \( \mathcal{O}(n \log{n} ) \)
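As a quick sanity check, the whole pipeline can be sketched with Python's built-in binary heap, `heapq` (whose `heapify` already does the faster O(n) load):

```python
import heapq

def heapsort(items):
    """Load a min-heap, then extract-min n times: O(n log n) overall."""
    heap = list(items)
    heapq.heapify(heap)    # builds the heap in O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([5, 3, 17, 10, 84, 19, 6, 22, 9]))
# -> [3, 5, 6, 9, 10, 17, 19, 22, 84]
```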
Even faster loading
Needs a board drawing but:
Start at the back of an array and call bubble-down
Creates many mini-heaps but their heights are smaller
\( \sum_{h=0}^{\lfloor \log{n} \rfloor} \lceil n / 2^{h+1} \rceil h \leq n \sum_{h \geq 0} h/2^h \leq 2n \), so loading is \( \mathcal{O}(n) \)
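A sketch of this bottom-up build (bubble-down from the back of a 1-indexed array; `build_heap` is my own name for it):

```python
def build_heap(a):
    """Min-heapify in place in O(n): bubble down from the back of the array.
    a is 1-indexed with a dummy in slot 0."""
    n = len(a) - 1
    for i in range(n // 2, 0, -1):   # the leaves are already mini-heaps
        j = i
        while 2 * j <= n:            # bubble a[j] down
            c = 2 * j
            if c + 1 <= n and a[c + 1] < a[c]:
                c += 1
            if a[j] <= a[c]:
                break
            a[j], a[c] = a[c], a[j]
            j = c
    return a

heap = build_heap([None, 5, 3, 17, 10, 84, 19, 6, 22, 9])
print(heap[1])   # the min ends up at the root
```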
Heap Problem 1:
Build a heap from \( \{ 5, 3, 17, 10, 84, 19, 6, 22, 9 \} \).
Heap Problem 2:
Where can the minimum element be in a max-heap?
Merge Sort!
AKA my excuse to do cool recursive analysis
The Idea
Split your items into two.
Use the magic of recursion to pretend like they are sorted (the base case is a singleton list).
Until both lists are empty do:
Compare the smallest element of both lists and let that be the next element in your solution.
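The steps above can be sketched as (a minimal Python version, my own implementation):

```python
def merge_sort(items):
    """Split in two, recurse, then merge the two sorted halves."""
    if len(items) <= 1:                      # base case: a singleton (or empty) list
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])           # the magic of recursion:
    right = merge_sort(items[mid:])          # pretend the halves are sorted
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):  # compare the smallest of each list
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]     # one side may have leftovers

print(merge_sort([9, 1, 5, 3, 8, 2]))
```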
Analysis
A merge of two lists of sizes \( n_1 \) and \( n_2 \) involves at most \( n_1 + n_2 \) comparisons
The final merge has \( n_1 = n_2 = \frac{n}{2}\), so \( n \) comparisons.
Each of those halves will do a merge of size \( \frac{n}{4} \) and \( \frac{n}{4}\), which gives \( 2\cdot (n/4 + n/4) \) comparisons.
At depth \( h \) we have \( 2^h \) chunks each of which cost \( \frac{n}{2^h} \) comparisons.
So at each layer in the recursive tree we have \( n \) comparisons.
Total cost: \( \mathcal{O}(n \log{n} ) \).
Recurrence Relations
Build from the bottom up by using your memory
e.g. \(a_n = a_{n-1} + a_{n-2}, a_1 = 1, a_2 = 1 \) (\(a_i\) are Fibonacci numbers)
\( a_n = n\cdot a_{n-1}, a_1 = 1\), what is \( a_n \)?
Written recursively: \( f(n) = n \cdot f(n-1) \).
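Bottom-up sketches of both recurrences (remember past values instead of recursing; my own implementations):

```python
def fib(n):
    """a_n = a_(n-1) + a_(n-2), a_1 = a_2 = 1: keep only the last two values."""
    a, b = 1, 1
    for _ in range(n - 2):
        a, b = b, a + b
    return b if n >= 2 else a

def factorial(n):
    """a_n = n * a_(n-1), a_1 = 1, i.e. a_n = n!"""
    a = 1
    for k in range(2, n + 1):
        a *= k
    return a

print(fib(6), factorial(5))
```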
In class challenge
Show that the solution of \(T(n) = T(n-1) + n \) is \( \mathcal{O}(n^2)\).
Analysis with a time function
Let \( T(n) \) denote the worst-case time to solve a problem of size \( n\).
Then in mergesort we have \( T(n) = 2 \cdot T(\frac{n}{2}) + \mathcal{O}(n) \).
So to find a formula for \( T(n) \) imagine the base case \(T(1) = 1 \)
Now \( T(2) = 2 \cdot 1 + 2 = 4 \), \( T(2^2) = 2 \cdot 4 + 2^2 = 12 \),
\( T(2^3) = 2 \cdot 12 + 2^3 = 32 \),
and in general \( T(2^h) = (h+1) \cdot 2^h \approx h \cdot 2^h \), i.e. \( T(n) = \mathcal{O}(n \log{n}) \)
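A quick numeric check of the unrolling, taking the constant in the \( \mathcal{O}(n) \) term to be 1 and base case \( T(1) = 1 \):

```python
def T(n):
    """The mergesort recurrence T(n) = 2 T(n/2) + n, with T(1) = 1."""
    return 1 if n == 1 else 2 * T(n // 2) + n

for h in range(1, 6):
    n = 2 ** h
    print(n, T(n), (h + 1) * n)   # T(2^h) = (h+1) * 2^h, i.e. Theta(n log n)
```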
The Master Theorem!
Magically solves divide-and-conquer recurrences for you.
Suppose \(T(n) = a T(n/b) + f(n) \) then:
Too many leaves: if \(f(n) = \mathcal{O}(n^{\log_b{a} - \epsilon})\) then \(T(n) = \Theta(n^{\log_b{a}})\).
Combo matches leaves: if \( f(n) = \Theta(n^{\log_b{a}})\) then \(T(n) = \Theta(n^{\log_b{a}} \log{n})\).
Expensive Combo: if \(f(n) = \Omega(n^{\log_b{a} + \epsilon})\) then \(T(n) = \Theta(f(n))\).
(please note that in the expensive case we must also have the regularity condition: \( a f(n/b) \leq c f(n) \) for some \( c < 1 \) and all sufficiently large \( n \))
Example
Binary Search only uses half of the tree at each step.
So \(f(n) = \mathcal{O}(1)\) and \(T(n) = 1 \cdot T(n/2) + f(n)\).
Thus the combination cost matches the leaf term: \( f(n) = \Theta(1) = \Theta(n^{\log_2{1}}) \)
Conclusion is \(T(n) = \Theta(n^{\log_b{a}} \log{n}) = \Theta(\log{n})\)
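An iterative binary-search sketch matching this recurrence (my own implementation, returning the index of the target in a sorted list, or \(-1\)):

```python
def binary_search(a, target):
    """T(n) = T(n/2) + O(1): examine the middle, keep only one half."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1         # target can only be in the right half
        else:
            hi = mid - 1         # target can only be in the left half
    return -1                    # not found

print(binary_search([1, 3, 5, 7, 9, 11], 7))
```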
Challenge 1
Use the Master Theorem to give tight bounds to the following:
1) \(T(n) = 2T(n/4) + 1\)
2) \(T(n) = 2T(n/4) + \sqrt{n}\)
3) \(T(n) = 2T(n/4) + n\)
4) \(T(n) = 2T(n/4) + n^2\)
Challenge 2
Devise an algorithm for computing the \( k \) smallest elements from a set of \( n \) integers in \( \mathcal{O}(n + k \log{n} ) \) time.