Sorting: Heaps, Divide and Conquer

Andy Novocin

Daily Problem


Given \( n \) three-dimensional points, e.g. \( (3,2,1), (9,-1,2)\), find the closest two.


What is the complexity of your approach?
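As a baseline to measure your approach against, here is a brute-force sketch in Python (the function name is my own) that simply checks every pair:

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)

def closest_pair_brute(points):
    """Check all n-choose-2 pairs and keep the closest: O(n^2) comparisons."""
    return min(combinations(points, 2), key=lambda pair: dist(*pair))

print(closest_pair_brute([(3, 2, 1), (9, -1, 2), (4, 2, 0)]))  # ((3, 2, 1), (4, 2, 0))
```

Can you beat the quadratic scan?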

Advanced Problem (repeated)


Let \( x \leq 50000 \) be an unknown integer.


If \( x \equiv 81 \pmod{89}\), \(x \equiv 13 \pmod{23}\), and \(x \equiv 12 \pmod{41}\), determine \(x\).


Can you generalize?


Ask me if you want a google search term hint.

New Advanced Problem


Suppose we are given a Caesar-shift encoded message of length \(n\), each character of which is a letter from the 26-letter alphabet.


\( \mbox{CAT} \to \mbox{DBU} \) is a Caesar shift by 1 (see the encoder sketch after this list).


What is the complexity of a decryption algorithm if you:


1) Know the size of the shift?


2) Don't know the size of the shift?


3) If the coded word used a known-length sequence of shifts? (e.g. shift by 1 then 2 then 3 and repeat: \( \mbox{CAT} \to \mbox{DCW} \))


4) If the coded word used an unknown-length sequence of shifts? (e.g. a cycle of shifts but you don't know the length of the cycle.)
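To make the shifts concrete, here is a sketch of the encoder (Python, function name my own); a cycle of length 1 is a plain Caesar shift:

```python
def cycle_shift_encode(msg, shifts):
    """Shift the i-th letter of msg by shifts[i % len(shifts)], wrapping A-Z."""
    return "".join(
        chr((ord(c) - ord("A") + shifts[i % len(shifts)]) % 26 + ord("A"))
        for i, c in enumerate(msg)
    )

print(cycle_shift_encode("CAT", [1]))        # DBU
print(cycle_shift_encode("CAT", [1, 2, 3]))  # DCW
```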

Heaps

(stolen from previous slide set)

Not as good as Balanced Binary Search Trees, but simpler, and ideal for priority queues.


A binary tree with the heap property: Each parent dominates its children.


Also, if we pack the tree into an array, we save the pointer space (and the pointer-chasing lookup time)

Storing method


The \( 2^{l-1} \) nodes of the \( l^{\textrm{th}} \) row are packed into positions: \( 2^{l - 1}, \cdots, 2^{l} - 1 \)


So the node at position \( x \) has left child at position \( 2x \) and right child at position \( 2x + 1 \)


Reversed, this means the parent of node \( x \) is at position \( \lfloor x/2 \rfloor \).


e.g. packed in this way, the height-3 balanced BST (not a heap) on \( \{1, \ldots, 7\} \) becomes the array \( \{4, 2, 6, 1, 3, 5, 7 \} \)
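In code the index arithmetic is all there is (a minimal Python sketch; slot 0 is left unused so the array is 1-indexed as above):

```python
tree = [None, 4, 2, 6, 1, 3, 5, 7]  # the packed BST example; slot 0 unused

def left(x):   return 2 * x         # left child of position x
def right(x):  return 2 * x + 1     # right child of position x
def parent(x): return x // 2        # floor(x/2); the root (x = 1) has no parent

assert (tree[left(1)], tree[right(1)]) == (2, 6)  # children of the root 4
assert tree[parent(5)] == 2                       # the parent of 3 is 2
```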

Insert


To maintain a happy heap we must be packed and dominated


Packing happens by inserting into the last spot (the first free one)


For domination, swap with your pop until things are right.


This is called bubble_up


What is the complexity?
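A minimal sketch of insert with bubble_up for a min-heap (1-indexed list with slot 0 unused, as in the packing slide):

```python
def bubble_up(heap, x):
    """Swap position x with its parent until the parent dominates."""
    while x > 1 and heap[x // 2] > heap[x]:
        heap[x], heap[x // 2] = heap[x // 2], heap[x]
        x //= 2

def insert(heap, value):
    heap.append(value)              # packed: the new value takes the last spot
    bubble_up(heap, len(heap) - 1)  # dominated: fix the path toward the root

heap = [None]                       # slot 0 unused (1-indexed)
for v in [5, 3, 17, 10, 84]:
    insert(heap, v)
print(heap)                         # [None, 3, 5, 17, 10, 84]
```

Note that bubble_up walks at most one leaf-to-root path.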

Heap insert (also sort) example

Delete-Min/Max


Packed and Dominated


For packing we move the last child into the first spot


Now we have to swap down with the dominant child until things are right.


This is called bubble_down


What is the complexity?
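The matching sketch for delete-min (same 1-indexed min-heap convention as the insert sketch):

```python
def bubble_down(heap, x):
    """Swap position x with its dominant (smaller) child until settled."""
    n = len(heap) - 1
    while True:
        smallest = x
        for c in (2 * x, 2 * x + 1):                  # left child, right child
            if c <= n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == x:
            return
        heap[x], heap[smallest] = heap[smallest], heap[x]
        x = smallest

def delete_min(heap):
    root = heap[1]
    heap[1] = heap[-1]     # packed: the last child moves into the first spot
    heap.pop()
    bubble_down(heap, 1)   # dominated: settle the new root
    return root

heap = [None, 3, 5, 17, 10, 84]    # a valid 1-indexed min-heap
print(delete_min(heap), heap)      # 3 [None, 5, 10, 17, 84]
```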

HEAPSORT


Load the heap (for now \( \mathcal{O}(n \log{n}) \)).


Call extract-min/max \( n \) times (each \( \mathcal{O}(\log{n}) \))


Cost is \( \mathcal{O}(n \log{n} ) \)
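A self-contained sketch using Python's heapq module as the heap:

```python
import heapq

def heapsort(items):
    heap = []
    for v in items:                  # load: n pushes, O(n log n) for now
        heapq.heappush(heap, v)
    return [heapq.heappop(heap) for _ in items]  # n extract-mins, O(n log n)

print(heapsort([5, 3, 17, 10, 84, 19, 6, 22, 9]))
```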

Even faster loading


Needs a board drawing but:


Start at the back of the array and call bubble_down on each position


Creates many mini-heaps but their heights are smaller


\( \sum_{h=0}^{\lfloor \log{n} \rfloor} \lceil n / 2^{h+1} \rceil h \leq n \sum h/2^h \leq 2n \)
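A sketch of the faster load (bubble_down as in the delete-min sketch); this is essentially what heapq.heapify does:

```python
def bubble_down(heap, x):
    """Settle position x of a 1-indexed min-heap (slot 0 unused)."""
    n = len(heap) - 1
    while True:
        smallest = x
        for c in (2 * x, 2 * x + 1):
            if c <= n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == x:
            return
        heap[x], heap[smallest] = heap[smallest], heap[x]
        x = smallest

def build_heap(items):
    """O(n) load: many cheap mini-heaps at the back, few tall ones at the front."""
    heap = [None] + list(items)
    for x in range(len(heap) - 1, 0, -1):
        bubble_down(heap, x)
    return heap

print(build_heap([5, 3, 17, 10, 84, 19, 6, 22, 9]))  # heap[1] == 3, the min
```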

Heap Problem 1:


Build a heap from \( \{ 5, 3, 17, 10, 84, 19, 6, 22, 9 \} \).


Heap Problem 2:


Where can the minimum element be in a max-heap?

Merge Sort!


AKA my excuse to do cool recursive analysis

The Idea


Split your items into two halves.


Use the magic of recursion to pretend they are already sorted (the base case is a singleton list).


Until both lists are empty do:


Compare the smallest elements of the two lists and let the smaller be the next element of your solution.
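A minimal recursive sketch in Python:

```python
def merge_sort(items):
    if len(items) <= 1:                   # base case: a singleton is sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])        # the magic of recursion
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:           # the smaller front element goes next
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # one side may still have leftovers

print(merge_sort([9, 1, 8, 2, 7, 3]))     # [1, 2, 3, 7, 8, 9]
```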

Merge Sort

As a Germanic dance


Analysis


A merge involves at most \((n_1 + n_2)\) comparisons


The last merge should have \( n_1 = n_2 = \frac{n}{2}\), so \( n \) comparisons.


Each of those halves will do a merge of size \( \frac{n}{4} \) and \( \frac{n}{4}\), which gives \( 2\cdot (n/4 + n/4) \) comparisons.


At depth \( h \) we have \( 2^h \) chunks, each of which costs \( \frac{n}{2^h} \) comparisons.


So at each layer in the recursive tree we have \( n \) comparisons.


Total cost: \( \mathcal{O}(n \log{n} ) \).

Recurrence Relations


Build from the bottom up by using your memory


e.g. \(a_n = a_{n-1} + a_{n-2}, a_1 = 1, a_2 = 1 \) (\(a_i\) are Fibonacci numbers)


\( a_n = n\cdot a_{n-1}, a_1 = 1\), what is \( a_n \)?


Written recursively: \( f(n) = n \cdot f(n-1) \).
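A sketch of the bottom-up idea for the Fibonacci example (remember what you have already computed instead of recursing blindly):

```python
def fib(n):
    """a_n = a_{n-1} + a_{n-2}, a_1 = a_2 = 1: build up from the base cases."""
    memory = {1: 1, 2: 1}
    for i in range(3, n + 1):
        memory[i] = memory[i - 1] + memory[i - 2]
    return memory[n]

print([fib(i) for i in range(1, 9)])   # [1, 1, 2, 3, 5, 8, 13, 21]
```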

In class challenge


Show that the solution of \(T(n) = T(n-1) + n \) is \( \mathcal{O}(n^2)\).

Analysis with a time function


Let \( T(n) \) denote the worst-case time to solve a problem of size \( n\).


Then in mergesort we have \( T(n) = 2 \cdot T(\frac{n}{2}) + \mathcal{O}(n) \).


So to find a formula for \( T(n) \) imagine the base case \(T(1) = 1 \)


Now \(T(2) = 2 + 1\), \(T(2^2) \approx 2(2 + 1) + 2^2 \),

\(T(2^3) \approx 2(2(2+1) + 2^2) + 2^3 \approx 3\cdot 2^3\)


and \( T(2^h) \approx h\cdot 2^h \)
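Unrolling the pattern in general (each of the \( h \) layers contributes \( 2^h \), plus \( 2^h \) for the base cases):

\( T(2^h) = 2^h \cdot T(1) + h \cdot 2^h = (h+1) \cdot 2^h \approx h \cdot 2^h \)

With \( n = 2^h \), i.e. \( h = \log{n} \), that is the familiar \( \mathcal{O}(n \log{n}) \).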

The Master Theorem!


Magically solves divide-and-conquer recurrences for you.


Suppose \(T(n) = a T(n/b) + f(n) \) with constants \( a \geq 1 \) and \( b > 1 \), then:


Too many leaves: if \(f(n) = \mathcal{O}(n^{\log_b{a} - \epsilon})\) for some \( \epsilon > 0 \), then \(T(n) = \Theta(n^{\log_b{a}})\).


Combo matches leaves: if \( f(n) = \Theta(n^{\log_b{a}})\) then \(T(n) = \Theta(n^{\log_b{a}} \log{n})\).


Expensive Combo: if \(f(n) = \Omega(n^{\log_b{a} + \epsilon})\) for some \( \epsilon > 0 \), then \(T(n) = \Theta(f(n))\).


(please note that in the expensive case we must also have the regularity condition \(a f(n/b) \leq c f(n) \) for some constant \( c < 1 \) and all sufficiently large \( n \))
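When \( f(n) = \Theta(n^d) \), the three cases boil down to comparing \( d \) with \( \log_b{a} \). A throwaway Python sketch (ignoring the regularity fine print):

```python
from math import log

def master(a, b, d):
    """Classify T(n) = a*T(n/b) + Theta(n^d) by comparing d to log_b(a)."""
    crit = log(a, b)                         # the "leaf count" exponent
    if d < crit:
        return f"Theta(n^{crit:g})"          # too many leaves
    if d == crit:
        return f"Theta(n^{crit:g} * log n)"  # combo matches leaves
    return f"Theta(n^{d:g})"                 # expensive combo

print(master(2, 2, 1))   # mergesort:     Theta(n^1 * log n)
print(master(1, 2, 0))   # binary search: Theta(n^0 * log n), i.e. Theta(log n)
```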

Example


Binary Search only uses half of the tree at each step.


So \(f(n) = \Theta(1)\) and \(T(n) = 1 \cdot T(n/2) + f(n)\).


Thus the combination cost matches the number of leaves to examine: \( f(n) = \Theta(1) = \Theta(n^{\log_2{1}}) \)


The conclusion is \(T(n) = \Theta(n^{\log_b{a}} \log{n}) = \Theta(\log{n})\)
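A sketch that matches the recurrence exactly: one comparison, then recurse on half the range:

```python
def binary_search(arr, target, lo=0, hi=None):
    """T(n) = T(n/2) + O(1) on a sorted list; returns an index or -1."""
    if hi is None:
        hi = len(arr)
    if lo >= hi:
        return -1                 # empty range: not found
    mid = (lo + hi) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:         # only the right half can contain target
        return binary_search(arr, target, mid + 1, hi)
    return binary_search(arr, target, lo, mid)

print(binary_search([1, 3, 5, 7, 9], 7))   # 3
```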

Challenge 1


Use the Master Theorem to give tight bounds to the following:


1) \(T(n) = 2T(n/4) + 1\)


2) \(T(n) = 2T(n/4) + \sqrt{n}\)


3) \(T(n) = 2T(n/4) + n\)


4) \(T(n) = 2T(n/4) + n^2\)

Challenge 2


Devise an algorithm for computing the \( k \) smallest elements from a set of \( n \) integers in \( \mathcal{O}(n + k \log{n} ) \) time.