Sorting: Heaps, Divide and Conquer

Andy Novocin

Daily Problem


Given \( n \) three-dimensional points, e.g. \( (3,2,1), (9,-1,2)\), find the closest two.


What is the complexity of your approach?
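As a baseline to measure your approach against, here is a brute-force sketch in Python (the function name is my own) that simply checks every pair:

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)

def closest_pair_brute(points):
    """Check all n-choose-2 pairs and keep the closest: O(n^2) comparisons."""
    return min(combinations(points, 2), key=lambda pair: dist(*pair))

print(closest_pair_brute([(3, 2, 1), (9, -1, 2), (4, 2, 0)]))  # ((3, 2, 1), (4, 2, 0))
```

Can you beat the quadratic scan?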

Advanced Problem (repeated)


Let \( x \leq 50000 \) be an unknown integer.


If \( x \equiv 81 \pmod{89}\), \(x \equiv 13 \pmod{23}\), and \(x \equiv 12 \pmod{41}\), determine \(x\).


Can you generalize?


Ask me if you want a google search term hint.

New Advanced Problem


Suppose we are given a Caesar-shift encoded message of length \(n\), each character of which is a letter from the 26-letter alphabet.


\( \mbox{CAT} \to \mbox{DBU} \) is a Caesar shift by 1 (see the encoder sketch after this list).


What is the complexity of a decryption algorithm if you:


1) Know the size of the shift?


2) Don't know the size of the shift?


3) If the coded word used a known-length sequence of shifts? (e.g. shift by 1 then 2 then 3 and repeat: \( \mbox{CAT} \to \mbox{DCW} \))


4) If the coded word used an unknown-length sequence of shifts? (e.g. a cycle of shifts but you don't know the length of the cycle.)
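To make the shifts concrete, here is a sketch of the encoder (Python, function name my own); a cycle of length 1 is a plain Caesar shift:

```python
def cycle_shift_encode(msg, shifts):
    """Shift the i-th letter of msg by shifts[i % len(shifts)], wrapping A-Z."""
    return "".join(
        chr((ord(c) - ord("A") + shifts[i % len(shifts)]) % 26 + ord("A"))
        for i, c in enumerate(msg)
    )

print(cycle_shift_encode("CAT", [1]))        # DBU
print(cycle_shift_encode("CAT", [1, 2, 3]))  # DCW
```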

Heaps

(stolen from previous slide set)

Not as good as Balanced Binary Search Trees, but simpler, and ideal for priority queues.


A binary tree with the heap property: Each parent dominates its children.


Also, if we pack the tree into an array, we save the pointer space (and the pointer-chasing lookup time)

Storing method


The \( 2^{l-1} \) nodes of the \( l^{\textrm{th}} \) row are packed into positions: \( 2^{l - 1}, \cdots, 2^{l} - 1 \)


So the node at position \( x \) has left child at position \( 2x \) and right child at position \( 2x + 1 \)


Reversed, this means the parent of node \( x \) is at position \( \lfloor x/2 \rfloor \).


e.g. packed in this way, the height-3 balanced BST (not a heap) on \( \{1, \ldots, 7\} \) becomes the array \( \{4, 2, 6, 1, 3, 5, 7 \} \)
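In code the index arithmetic is all there is (a minimal Python sketch; slot 0 is left unused so the array is 1-indexed as above):

```python
tree = [None, 4, 2, 6, 1, 3, 5, 7]  # the packed BST example; slot 0 unused

def left(x):   return 2 * x         # left child of position x
def right(x):  return 2 * x + 1     # right child of position x
def parent(x): return x // 2        # floor(x/2); the root (x = 1) has no parent

assert (tree[left(1)], tree[right(1)]) == (2, 6)  # children of the root 4
assert tree[parent(5)] == 2                       # the parent of 3 is 2
```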

Insert


To maintain a happy heap we must be packed and dominated


Packing happens by inserting into the last spot (the first free one)


For domination, swap with your pop until things are right.


This is called bubble_up


What is the complexity?
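A minimal sketch of insert with bubble_up for a min-heap (1-indexed list with slot 0 unused, as in the packing slide):

```python
def bubble_up(heap, x):
    """Swap position x with its parent until the parent dominates."""
    while x > 1 and heap[x // 2] > heap[x]:
        heap[x], heap[x // 2] = heap[x // 2], heap[x]
        x //= 2

def insert(heap, value):
    heap.append(value)              # packed: the new value takes the last spot
    bubble_up(heap, len(heap) - 1)  # dominated: fix the path toward the root

heap = [None]                       # slot 0 unused (1-indexed)
for v in [5, 3, 17, 10, 84]:
    insert(heap, v)
print(heap)                         # [None, 3, 5, 17, 10, 84]
```

Note that bubble_up walks at most one leaf-to-root path.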

Heap insert (also sort) example

Delete-Min/Max


Packed and Dominated


For packing we move the last child into the first spot


Now we have to swap down with the dominant child until things are right.


This is called bubble_down


What is the complexity?
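The matching sketch for delete-min (same 1-indexed min-heap convention as the insert sketch):

```python
def bubble_down(heap, x):
    """Swap position x with its dominant (smaller) child until settled."""
    n = len(heap) - 1
    while True:
        smallest = x
        for c in (2 * x, 2 * x + 1):                  # left child, right child
            if c <= n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == x:
            return
        heap[x], heap[smallest] = heap[smallest], heap[x]
        x = smallest

def delete_min(heap):
    root = heap[1]
    heap[1] = heap[-1]     # packed: the last child moves into the first spot
    heap.pop()
    bubble_down(heap, 1)   # dominated: settle the new root
    return root

heap = [None, 3, 5, 17, 10, 84]    # a valid 1-indexed min-heap
print(delete_min(heap), heap)      # 3 [None, 5, 10, 17, 84]
```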

HEAPSORT


Load the heap (for now \( \mathcal{O}(n \log{n}) \)).


Call extract-min/max \( n \) times (each \( \mathcal{O}(\log{n}) \))


Cost is \( \mathcal{O}(n \log{n} ) \)
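A self-contained sketch using Python's heapq module as the heap:

```python
import heapq

def heapsort(items):
    heap = []
    for v in items:                  # load: n pushes, O(n log n) for now
        heapq.heappush(heap, v)
    return [heapq.heappop(heap) for _ in items]  # n extract-mins, O(n log n)

print(heapsort([5, 3, 17, 10, 84, 19, 6, 22, 9]))
```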

Even faster loading


Needs a board drawing but:


Start at the back of the array and call bubble_down on each position


Creates many mini-heaps but their heights are smaller


\( \sum_{h=0}^{\lfloor \log{n} \rfloor} \lceil n / 2^{h+1} \rceil h \leq n \sum h/2^h \leq 2n \)
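A sketch of the faster load (bubble_down as in the delete-min sketch); this is essentially what heapq.heapify does:

```python
def bubble_down(heap, x):
    """Settle position x of a 1-indexed min-heap (slot 0 unused)."""
    n = len(heap) - 1
    while True:
        smallest = x
        for c in (2 * x, 2 * x + 1):
            if c <= n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == x:
            return
        heap[x], heap[smallest] = heap[smallest], heap[x]
        x = smallest

def build_heap(items):
    """O(n) load: many cheap mini-heaps at the back, few tall ones at the front."""
    heap = [None] + list(items)
    for x in range(len(heap) - 1, 0, -1):
        bubble_down(heap, x)
    return heap

print(build_heap([5, 3, 17, 10, 84, 19, 6, 22, 9]))  # heap[1] == 3, the min
```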

Heap Problem 1:


Build a heap from \( \{ 5, 3, 17, 10, 84, 19, 6, 22, 9 \} \).


Heap Problem 2:


Where can the minimum element be in a max-heap?

Merge Sort!


AKA my excuse to do cool recursive analysis

The Idea


Split your items into two halves.


Use the magic of recursion to pretend they are already sorted (the base case is a singleton list).


Until both lists are empty do:


Compare the smallest elements of the two lists and let the smaller be the next element of your solution.
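A minimal recursive sketch in Python:

```python
def merge_sort(items):
    if len(items) <= 1:                   # base case: a singleton is sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])        # the magic of recursion
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:           # the smaller front element goes next
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # one side may still have leftovers

print(merge_sort([9, 1, 8, 2, 7, 3]))     # [1, 2, 3, 7, 8, 9]
```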

Merge Sort

As a Germanic dance


Analysis


A merge involves at most \((n_1 + n_2)\) comparisons


The last merge should have \( n_1 = n_2 = \frac{n}{2}\), so \( n \) comparisons.


Each of those halves will do a merge of size \( \frac{n}{4} \) and \( \frac{n}{4}\), which gives \( 2\cdot (n/4 + n/4) \) comparisons.


At depth \( h \) we have \( 2^h \) chunks, each of which costs \( \frac{n}{2^h} \) comparisons.


So at each layer in the recursive tree we have \( n \) comparisons.


Total cost: \( \mathcal{O}(n \log{n} ) \).

Recurrence Relations


Build from the bottom up by using your memory


e.g. \(a_n = a_{n-1} + a_{n-2}, a_1 = 1, a_2 = 1 \) (\(a_i\) are Fibonacci numbers)


\( a_n = n\cdot a_{n-1}, a_1 = 1\), what is \( a_n \)?


Written recursively: \( f(n) = n \cdot f(n-1) \).
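A sketch of the bottom-up idea for the Fibonacci example (remember what you have already computed instead of recursing blindly):

```python
def fib(n):
    """a_n = a_{n-1} + a_{n-2}, a_1 = a_2 = 1: build up from the base cases."""
    memory = {1: 1, 2: 1}
    for i in range(3, n + 1):
        memory[i] = memory[i - 1] + memory[i - 2]
    return memory[n]

print([fib(i) for i in range(1, 9)])   # [1, 1, 2, 3, 5, 8, 13, 21]
```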

In class challenge


Show that the solution of \(T(n) = T(n-1) + n \) is \( \mathcal{O}(n^2)\).

Analysis with a time function


Let \( T(n) \) denote the worst-case time to solve a problem of size \( n\).


Then in mergesort we have \( T(n) = 2 \cdot T(\frac{n}{2}) + \mathcal{O}(n) \).


So to find a formula for \( T(n) \) imagine the base case \(T(1) = 1 \)


Now \(T(2) = 2 + 1\), \(T(2^2) \approx 2(2 + 1) + 2^2 \),

\(T(2^3) \approx 2(2(2+1) + 2^2) + 2^3 \approx 3\cdot 2^3\)


and \( T(2^h) \approx h\cdot 2^h \)
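Unrolling the pattern in general (each of the \( h \) layers contributes \( 2^h \), plus \( 2^h \) for the base cases):

\( T(2^h) = 2^h \cdot T(1) + h \cdot 2^h = (h+1) \cdot 2^h \approx h \cdot 2^h \)

With \( n = 2^h \), i.e. \( h = \log{n} \), that is the familiar \( \mathcal{O}(n \log{n}) \).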

The Master Theorem!


Magically solves divide-and-conquer recurrences for you.


Suppose \(T(n) = a T(n/b) + f(n) \) with constants \( a \geq 1 \) and \( b > 1 \), then:


Too many leaves: if \(f(n) = \mathcal{O}(n^{\log_b{a} - \epsilon})\) for some \( \epsilon > 0 \), then \(T(n) = \Theta(n^{\log_b{a}})\).


Combo matches leaves: if \( f(n) = \Theta(n^{\log_b{a}})\) then \(T(n) = \Theta(n^{\log_b{a}} \log{n})\).


Expensive Combo: if \(f(n) = \Omega(n^{\log_b{a} + \epsilon})\) for some \( \epsilon > 0 \), then \(T(n) = \Theta(f(n))\).


(please note that in the expensive case we must also have the regularity condition \(a f(n/b) \leq c f(n) \) for some constant \( c < 1 \) and all sufficiently large \( n \))
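When \( f(n) = \Theta(n^d) \), the three cases boil down to comparing \( d \) with \( \log_b{a} \). A throwaway Python sketch (ignoring the regularity fine print):

```python
from math import log

def master(a, b, d):
    """Classify T(n) = a*T(n/b) + Theta(n^d) by comparing d to log_b(a)."""
    crit = log(a, b)                         # the "leaf count" exponent
    if d < crit:
        return f"Theta(n^{crit:g})"          # too many leaves
    if d == crit:
        return f"Theta(n^{crit:g} * log n)"  # combo matches leaves
    return f"Theta(n^{d:g})"                 # expensive combo

print(master(2, 2, 1))   # mergesort:     Theta(n^1 * log n)
print(master(1, 2, 0))   # binary search: Theta(n^0 * log n), i.e. Theta(log n)
```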

Example


Binary Search only uses half of the tree at each step.


So \(f(n) = \Theta(1)\) and \(T(n) = 1 \cdot T(n/2) + f(n)\).


Thus the combination cost matches the number of leaves to examine: \( f(n) = \Theta(1) = \Theta(n^{\log_2{1}}) \)


The conclusion is \(T(n) = \Theta(n^{\log_b{a}} \log{n}) = \Theta(\log{n})\)
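A sketch that matches the recurrence exactly: one comparison, then recurse on half the range:

```python
def binary_search(arr, target, lo=0, hi=None):
    """T(n) = T(n/2) + O(1) on a sorted list; returns an index or -1."""
    if hi is None:
        hi = len(arr)
    if lo >= hi:
        return -1                 # empty range: not found
    mid = (lo + hi) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:         # only the right half can contain target
        return binary_search(arr, target, mid + 1, hi)
    return binary_search(arr, target, lo, mid)

print(binary_search([1, 3, 5, 7, 9], 7))   # 3
```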

Challenge 1


Use the Master Theorem to give tight bounds to the following:


1) \(T(n) = 2T(n/4) + 1\)


2) \(T(n) = 2T(n/4) + \sqrt{n}\)


3) \(T(n) = 2T(n/4) + n\)


4) \(T(n) = 2T(n/4) + n^2\)

Challenge 2


Devise an algorithm for computing the \( k \) smallest elements from a set of \( n \) integers in \( \mathcal{O}(n + k \log{n} ) \) time.