Sorting and friends

Andy Novocin

Daily Problem


Design a data structure to support the following operations:


insert(x, T) Insert x into the set T


delete(k, T) Delete the \( k^{\textrm{th}} \) smallest element from T


member(x, T) Detect if x is in T


All operations must take \( \mathcal{O}(\log{n})\).

Advanced Problem


Given \(n\) and a prime \( p\) return all integers \( i \geq 0 \) such that \( \binom {n} {i} \neq 0 \mod{p} \).


Can you do it in \( \mathcal{O}(t + \log{n}) \) time where \( t \) is the number of such \( i \)?


Sample output, \( n = 5, p = 2 \) result \( [ 0, 1, 4, 5 ] \).

New HW posted


cs.prof.ninja/hw/2


Due on September 25th (a little shorter)

First test soon


Covering Chapters 1-4 on September 30th.


This is important to your grade.

BIG picture revisited


Skills I want you to have


1) Analysis of solutions to problems


2) Mastery of a large set of useful techniques


3) The ability to apply those techniques to new problems


4) Awareness of a large set of difficult problems


5) The ability to prove that a new problem is difficult


6) Methods for attacking the difficult problems anyhow

Sorting X-treme


Addressing the groans:


Calculus vs Real Analysis vs Measure and Integration


A very efficient tool for your toolbox


Sandbox to introduce important concepts


Part of the canon

Count Comparisons


Design a tournament for exactly 5 teams.


You need to rank all 5 spots (not just champion).


Fewest number of games gets a prize.

Sample on 3

For large values?


Each comparison adds some information


Can be lucky can be unlucky


Binary Tree?

Lower Bounds


So every possible permutation must be a leaf.


Because every possible permutation is an input.


The shortest possible tree has \( 2^h \) leaves.


So \( n! \leq l \leq 2^h \) thus \( h \geq \sum_{i=1}^n \log{i} \)


\( h = \Omega(n\log{n})\)

Calculus reasoning


\( \sum_{i=2}^n \log{i} \geq \int_1^{n} \log{x} dx \)

Sort Roster


Quicksort expected \( \mathcal{O}(n \log{n} ) \)


Mergesort \( \mathcal{O}(n \log{n} ) \)


Heapsort \( \mathcal{O}(n \log{n} ) \)


Insertion Sort \( \mathcal{O}(n^2) \)

Broken with pasta?


Spaghetti Sort


1) Cut spaghetti to the length of \( n_i \)


2) Grab spaghetti and rest it vertically on a table


3) Lower your hand until it touches a noodle


4) Select that noodle first and repeat.

Challenge problems


4-29 Mr. B. C. Dull claims to have developed a new data structure for priority queues that supports Insert, Maximum, and Extract-Max all in \( \mathcal{O}(1) \) time. Prove that he is mistaken.


What is the smallest possible depth of a leaf in a decision tree for a comparison sort?

Priority Queue Search


Selection Sort is n applications of:


Find-Minimum


Delete-Minimum


We had it as \( \mathcal{O}(n^2) \) on an array.


A Priority Queue implemented with Balanced Binary Search Tree yields: \( \mathcal{O}(n \log{n} ) \)


Data structure matters!

Heaps


Not as good as Balanced Binary Search Trees, but simpler, and ideal for priority queues.


A binary tree with the heap property: Each parent dominates its children.


Also if we pack into an array then we save the pointer space (and lookup times)

Storing method


The \( 2^{l} \) nodes of the \( l^{\textrm{th}} \) row are packed into positions: \( 2^{l - 1}, \cdots, 2^{l} - 1 \)


So the node at position \( x \) has left child at position \( 2x \) and right child at position \( 2x + 1 \)


Reversed this means the parent of node \( x \) is at position \( \lceil x/2 \rceil \).


e.g. Packed in this way the height 3 balanced BST (not a heap) is packed as \( \{4, 2, 6, 1, 3, 5, 7 \} \)

Insert


To maintain a happy heap we must be packed and dominated


Packed happens by inserting to the last spot


For domination, swap with your pop, until things are right.


This is called bubble_up


What is the complexity?

Heap insert (also sort) example

Delete-Min/Max


Packed and Dominated


For packing we move the last child into the first spot


Now we have to swap down with the dominant child, until things are right.


This is called bubble_down


What is the complexity?

HEAPSORT


Load the heap (for now \( \mathcal{O}(n \log{n} ) \) ).


Call extract-min/max \( n \) times (each \( \mathcal{O}(\log{n}) \))


Cost is \( \mathcal{O}(n \log{n} ) \)

Even faster loading


Needs a board drawing but:


Start at the back of an array and call bubble-down


Creates many mini-heaps but their heights are smaller


\( \sum_{h=0}^{\lfloor \log{n} \rfloor} \lceil n / 2^{h+1} \rceil h \leq n \sum h/2^h \leq 2n \)

Challenge 1


Devise an algorithm for computing the \( k \) smallest elements from a set of \( n \) integers in \( \mathcal{O}(n + k \log{n} ) \) time.

Challenge 2


Where can the minimum element be in a max-heap?

Challenge 3


Build a heap from \( \{ 5, 3, 17, 10, 84, 19, 6, 22, 9 \} \).