## Sorting: Heaps, Divide and Conquer

#### Andy Novocin

## Daily Problem

### Given \( n \) three-dimensional points, e.g. \( (3,2,1), (9,-1,2)\), find the closest two.

### What is the complexity of your approach?

## Advanced Problem (repeated)

### Let \( x \leq 50000 \) be an unknown.

### If \( x \equiv 81 \pmod{89}\), \(x \equiv 13 \pmod{23}\), and \(x \equiv 12 \pmod{41}\), determine \(x\).

### Can you generalize?

#### Ask me if you want a google search term hint.

## New Advanced Problem

### Suppose we are given a Caesar-shift encoded message of length \(n\), each character of which is a letter from the 26-letter alphabet.

#### \( \mbox{CAT} \to \mbox{DBU} \) is a Caesar shift by 1.

### What is the complexity of a decryption algorithm if you:

#### 1) Know the size of the shift?

#### 2) Don't know the size of the shift?

#### 3) If the coded word used a known-length sequence of shifts? (e.g. shift by 1 then 2 then 3 and repeat: \( \mbox{CAT} \to \mbox{DCW} \))

#### 4) If the coded word used an unknown-length sequence of shifts? (e.g. a cycle of shifts but you don't know the length of the cycle.)
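#### To pin down the encoding, here is a minimal sketch of the shift itself (the function name and the shift-cycle argument are my own choices, not part of the problem):

```python
def caesar_shift(message, shifts):
    """Encode an uppercase message, cycling through a sequence of shifts."""
    out = []
    for i, ch in enumerate(message):
        k = shifts[i % len(shifts)]  # repeat the shift cycle
        out.append(chr((ord(ch) - ord('A') + k) % 26 + ord('A')))
    return ''.join(out)

# caesar_shift("CAT", [1]) gives "DBU"; caesar_shift("CAT", [1, 2, 3]) gives "DCW"
```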

## Heaps

##### (stolen from previous slide set)

### Not as flexible as Balanced Binary Search Trees (no fast search for arbitrary keys), but simpler, and ideal for priority queues.

### A binary tree with the heap property: Each parent dominates its children.

### Also if we pack into an array then we save the pointer space (and lookup times)

## Storing method

### The \( 2^{l-1} \) nodes of the \( l^{\textrm{th}} \) row are packed into positions: \( 2^{l - 1}, \cdots, 2^{l} - 1 \)

### So the node at position \( x \) has left child at position \( 2x \) and right child at position \( 2x + 1 \)

### Reversed, this means the parent of node \( x \) is at position \( \lfloor x/2 \rfloor \).

#### e.g. Packed in this way the height 3 balanced BST (not a heap) is packed as \( \{4, 2, 6, 1, 3, 5, 7 \} \)
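#### The packing rules as index arithmetic (a sketch; the leading `None` is my choice to line Python's 0-based list up with 1-based positions):

```python
tree = [None, 4, 2, 6, 1, 3, 5, 7]    # the packed height-3 BST example above

def left(x):   return 2 * x           # left child position
def right(x):  return 2 * x + 1       # right child position
def parent(x): return x // 2          # floor(x / 2)
```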

## Insert

### To maintain a happy heap we must be packed and dominated

### Packed happens by inserting to the last spot

### For domination, swap with your pop, until things are right.

#### This is called `bubble_up`

### What is the complexity?

## Heap insert (also sort) example
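#### A hedged min-heap sketch of insert via `bubble_up` (1-based array as above; everything beyond the name `bubble_up` is my own minimal implementation):

```python
def bubble_up(heap, i):
    """Swap node i with its parent until the parent dominates (min-heap)."""
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2

def insert(heap, value):
    heap.append(value)                # packed: fill the last spot...
    bubble_up(heap, len(heap) - 1)    # ...then restore domination: O(log n)

heap = [None]                         # position 0 unused (1-based indexing)
for v in [5, 3, 17, 10]:
    insert(heap, v)                   # heap[1] is now the minimum, 3
```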

## Delete-Min/Max

### Packed and Dominated

#### For packing we move the last child into the first spot

#### Now we have to swap down with the dominant child, until things are right.

#### This is called `bubble_down`

### What is the complexity?
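#### A matching min-heap sketch of extract-min via `bubble_down` (again 1-based; the helper structure is my own):

```python
def bubble_down(heap, i):
    """Swap node i down with its dominant (smaller) child until settled."""
    n = len(heap) - 1                 # last used position
    while True:
        smallest = i
        for c in (2 * i, 2 * i + 1):  # left and right children
            if c <= n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def extract_min(heap):
    top = heap[1]
    heap[1] = heap[-1]                # last child into the first spot
    heap.pop()
    if len(heap) > 1:
        bubble_down(heap, 1)          # O(log n)
    return top
```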

## HEAPSORT

### Load the heap (for now \( \mathcal{O}(n \log{n} ) \) ).

### Call extract-min/max \( n \) times (each \( \mathcal{O}(\log{n}) \))

#### Cost is \( \mathcal{O}(n \log{n} ) \)
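#### The two steps above, sketched with Python's stdlib `heapq` (a 0-based min-heap rather than our 1-based array, so extract-min yields ascending order):

```python
import heapq

def heapsort(items):
    heap = list(items)
    heapq.heapify(heap)                # load the heap
    return [heapq.heappop(heap)        # n extract-mins, O(log n) each
            for _ in range(len(items))]
```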

## Even faster loading

### Needs a board drawing but:

### Start at the back of the array and call `bubble_down` at each position

### Creates many mini-heaps but their heights are smaller

#### \( \sum_{h=0}^{\lfloor \log{n} \rfloor} \lceil n / 2^{h+1} \rceil h \leq n \sum h/2^h \leq 2n \)
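#### The same idea as code, a sketch standing in for the board drawing (min-heap, 1-based; `bubble_down` as on the previous slides):

```python
def bubble_down(heap, i):
    """Swap node i down with its dominant (smaller) child until settled."""
    n = len(heap) - 1
    while True:
        smallest = i
        for c in (2 * i, 2 * i + 1):
            if c <= n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def build_heap(values):
    """O(n) construction: bubble_down from the back; leaves are free."""
    heap = [None] + list(values)
    for i in range(len(values) // 2, 0, -1):  # skip the leaves
        bubble_down(heap, i)
    return heap
```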

## Heap Problem 1:

### Build a heap from \( \{ 5, 3, 17, 10, 84, 19, 6, 22, 9 \} \).

## Heap Problem 2:

### Where can the minimum element be in a max-heap?

## Merge Sort!

### AKA my excuse to do cool recursive analysis

## The Idea

### Split your items into two.

### Use the magic of recursion to pretend they are sorted (the base case is a singleton list).

### Until both lists are empty do:

#### Compare the smallest remaining element of each list; the smaller of the two becomes the next element of your solution.
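#### The idea above as a hedged sketch (slicing and the `merge_sort` name are my choices):

```python
def merge_sort(items):
    if len(items) <= 1:                 # base case: singleton (or empty) list
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])      # the magic of recursion
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:         # compare the smallest of each list
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])             # one list emptied; drain the other
    merged.extend(right[j:])
    return merged
```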

## Analysis

### A merge involves \((n_1 + n_2)\) comparisons

### The last merge should have \( n_1 = n_2 = \frac{n}{2}\), so \( n \) comparisons.

### Each of those halves will do a merge of size \( \frac{n}{4} \) and \( \frac{n}{4}\), which gives \( 2\cdot (n/4 + n/4) \) comparisons.

### At depth \( h \) we have \( 2^h \) chunks each of which cost \( \frac{n}{2^h} \) comparisons.

### So at each layer in the recursive tree we have \( n \) comparisons.

#### Total cost: \( \mathcal{O}(n \log{n} ) \).

## Recurrence Relations

### Build from the bottom up by using your memory

### e.g. \(a_n = a_{n-1} + a_{n-2}, a_1 = 1, a_2 = 1 \) (\(a_i\) are Fibonacci numbers)

### \( a_n = n\cdot a_{n-1}, a_1 = 1\), what is \( a_n \)?

### Written recursively: \( f(n) = n \cdot f(n-1) \).
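#### Both recurrences built from the bottom up "using your memory" (iterating from the base cases instead of recursing; the function names are mine):

```python
def fib(n):
    a, b = 1, 1                 # a_1 = a_2 = 1
    for _ in range(n - 2):
        a, b = b, a + b         # a_n = a_{n-1} + a_{n-2}
    return b if n > 1 else a

def factorial(n):
    a = 1                       # a_1 = 1
    for k in range(2, n + 1):
        a = k * a               # a_n = n * a_{n-1}, so a_n = n!
    return a
```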

## In class challenge

### Show that the solution of \(T(n) = T(n-1) + n \) is \( \mathcal{O}(n^2)\).

## Analysis with a time function

### Let \( T(n) \) denote the worst-case time to solve a problem of size \( n\).

### Then in mergesort we have \( T(n) = 2 \cdot T(\frac{n}{2}) + \mathcal{O}(n) \).

### So to find a formula for \( T(n) \) imagine the base case \(T(1) = 1 \)

#### Now \(T(2) = 2\cdot 1 + 2 = 2\cdot 2^1 \), \(T(2^2) = 2(2\cdot 1 + 2) + 2^2 = 3\cdot 2^2 \),

#### \(T(2^3) = 2\left(2(2\cdot 1 + 2) + 2^2\right) + 2^3 = 4\cdot 2^3 \),

#### and in general \( T(2^h) = (h+1)\cdot 2^h \approx h\cdot 2^h \)
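#### A quick sanity check of the pattern, taking the \( \mathcal{O}(n) \) term as exactly \( n \) (my simplifying choice):

```python
def T(n):
    """T(n) = 2*T(n/2) + n with T(1) = 1, unrolled by direct recursion."""
    return 1 if n == 1 else 2 * T(n // 2) + n

# T(2) = 4, T(4) = 12, T(8) = 32, ... matching (h+1) * 2**h ~ h * 2**h
```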

## The Master Theorem!

#### Magically solves divide-and-conquer recurrences for you.

### Suppose \(T(n) = a T(n/b) + f(n) \) with constants \( a \geq 1 \) and \( b > 1 \), then:

#### Too many leaves: if \(f(n) = \mathcal{O}(n^{\log_b{a} - \epsilon})\) then \(T(n) = \Theta(n^{\log_b{a}})\).

#### Combo matches leaves: if \( f(n) = \Theta(n^{\log_b{a}})\) then \(T(n) = \Theta(n^{\log_b{a}} \log{n})\).

#### Expensive Combo: if \(f(n) = \Omega(n^{\log_b{a} + \epsilon})\) then \(T(n) = \Theta(f(n))\).

##### (please note that in the expensive case we also need the regularity condition: \(a f(n/b) \leq c f(n) \) for some constant \( c < 1 \) and all sufficiently large \( n \))

## Example

### Binary Search only uses half of the tree at each step.

### So \(f(n) = \mathcal{O}(1)\) and \(T(n) = 1 \cdot T(n/2) + f(n)\).

### Thus the combination cost matches the number of leaves to examine \( \left( f(n) = \Theta(1) = \Theta(n^{\log_2{1}}) \right) \)

### Conclusion is \(T(n) = \Theta(n^{\log_b{a}} \log{n}) = \Theta(\log{n})\)

## Challenge 1

### Use the Master Theorem to give tight bounds to the following:

#### 1) \(T(n) = 2T(n/4) + 1\)

#### 2) \(T(n) = 2T(n/4) + \sqrt{n}\)

#### 3) \(T(n) = 2T(n/4) + n\)

#### 4) \(T(n) = 2T(n/4) + n^2\)

## Challenge 2

### Devise an algorithm for computing the \( k \) smallest elements from a set of \( n \) integers in \( \mathcal{O}(n + k \log{n} ) \) time.