Analyzing Algorithms part 1

Andy Novocin

Daily Problem

The set cover problem: given a collection of subsets \(S = \{S_1, \ldots, S_m\} \), each \( S_i \subseteq U = \{1, 2, \ldots, n \}\), find the smallest sub-collection \( T \subseteq S \) whose union is \( U \).

Find a counter-example to this greedy algorithm: select the largest subset first, then delete its elements from \( U \). Repeat by selecting the subset covering the most still-uncovered elements until everything is covered.

Can you devise an algorithm which will correctly terminate?
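The greedy rule above can be coded in a few lines and run on a hand-built instance. The instance below is my own illustration, not part of the problem statement: the optimal cover uses 2 subsets, but "largest first" grabs a decoy and needs 3.

```python
# Greedy set cover: repeatedly take the subset covering the most
# still-uncovered elements.  It terminates whenever the union of S is U,
# since each round covers at least one new element.
def greedy_cover(universe, subsets):
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not (best & uncovered):   # no subset helps: no cover exists
            return None
        chosen.append(best)
        uncovered -= best
    return chosen

# Hand-built instance (an assumption for illustration): the optimal cover
# is {1,3,5} and {2,4,6} (2 sets), but greedy takes the size-4 decoy
# {1,2,3,4} first and then needs two more sets.
U = {1, 2, 3, 4, 5, 6}
S = [frozenset({1, 3, 5}), frozenset({2, 4, 6}), frozenset({1, 2, 3, 4})]
print(len(greedy_cover(U, S)))  # → 3, one more than optimal
```

So the greedy algorithm always terminates, but it is not always optimal.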

big-Oh Face

Algorithms are ideas, right?

How do I judge them?

Correctness, Efficiency, Simplicity

A Dumb Example:

for i in range(n):
    print(i*i + i)
print(n + n)

How long did that take?

A function of n

n multiplications of numbers \( \leq n \)

n additions of numbers \( \leq n \)

one additional addition of size \( n \)

n+1 calls to print


\( \sum_{i=1}^n 3\log_{2^{53}}{i} + \log_{2^{53}}{n} + (n+1) \cdot \mathcal{C}(\textrm{print}) \)

Even a simple bit of code is difficult to analyze fully

What if there is no specific implementation?

We need a model

Computational Model: RAM

Simple operations take 1 step


Memory is unlimited

Any memory address can be accessed in 1 step

Recursive calls and loops (for, while) are not simple operations and require care

RAM model allows us to count operations on a theoretical machine

It misses many small adjustments

But like Newtonian mechanics, it's useful!

GCD (Euclidean algorithm)

mod and if are simple operations, so each costs 1 step

repeated 1-3 times per recursive call

How many recursive calls can there be?
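A minimal sketch of the recursive Euclidean algorithm, instrumented with a call counter (the counter is my addition for illustration, not part of the classical algorithm):

```python
def gcd(a, b, calls=1):
    """Euclidean algorithm; returns (gcd, number of calls made)."""
    if b == 0:                        # one "if" ...
        return a, calls
    return gcd(b, a % b, calls + 1)   # ... and one "mod" per call

# Consecutive Fibonacci numbers are the classic slow case:
print(gcd(34, 21))  # → (1, 8)
```

Running it on Fibonacci pairs makes the worst-case question concrete: each recursive call only peels off one Fibonacci number.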

Best, Average, Worst cases

Best case for gcd is \( a = x, b = x \); the cost is 3

The average case depends on the input distribution; an IDEONE experiment gives 18.1799 steps on average for random 32-bit integers.
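The average-case figure can be reproduced with a short experiment. This sketch is my own, not the original IDEONE code, and the exact average depends on the seed and sample size:

```python
import random

def gcd_steps(a, b):
    """Count the division (mod) steps of the iterative Euclidean algorithm."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps

random.seed(0)                      # fixed seed so the run is repeatable
trials = 10_000
avg = sum(gcd_steps(random.getrandbits(32), random.getrandbits(32))
          for _ in range(trials)) / trials
print(avg)  # lands in the high teens for random 32-bit inputs
```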

The worst case is tricky, but it is bounded by consecutive Fibonacci numbers (inductive proof)

Approx 3 operations per digit (proof on board, if demanded)

In general, which number is most meaningful?

That's nice but how does the \( \mathcal{O}() \) come into the picture?

Orthogonal to the complexity model, but helpful in thinking it out

It is a way to compare any functions on large inputs

The algorithmic world needs to compare efficiencies

RAM lets us model the cost of an algorithm as a function

big-Oh is used to compare those functions

big-Oh in action

\( 2n^3 + 2013n^2 + 2 = \mathcal{O}(n^3) \)

\( 2n^3 - 2013n^2 + 2 = \mathcal{O}(n^3) \)

\( \sum_{i=0}^{4n} 1519 = \mathcal{O}(n) \)

\( \sum_{i=0}^{4n} 1519i = \mathcal{O}(n^2) \)

\( 12 \cdot 2^{x} \cdot 2^{-x} = \mathcal{O}(1) \)

\( 12 \cdot 2^{x} \cdot x^{2} = \mathcal{O}(x^2 \cdot 2^{x}) \)

Thinking in \(\mathcal{O}\)

Follow the largest term

Ignore constant multipliers

Adding two terms collapses them to the larger one (be careful)

Multiplying two terms combines them (be careful)

Big-Oh, is \(\mathcal{O}()\) many things

Computer scientists are lazy (it's easy to work with)

Computer scientists are transcendent (future \( \geq \) today)

Follow the definition

$$ \mathcal{O}(g(x)) = \{ f(x) \textrm{ s.t. } \exists c, n \textrm{ with } f(x) \leq c \cdot g(x) ~\forall ~x \geq n \} $$

In words: \( \mathcal{O}(g(x)) \) is the set of all functions which are eventually smaller than a constant multiple of \( g(x) \)

\( f(x) = \mathcal{O}(g(x)) \) actually means \( f \in \mathcal{O}(g(x)) \)

Sample Analysis

\( f(n) = 2n^2 - 20n + 4 \)

\(f(n) = \mathcal{O}(n^2) \) because \(2n^2 \geq 2n^2 - 20n + 4\)

\(f(n) = \mathcal{O}(n^3) \) because \( \frac{1}{100} n^3 \geq f(n) \)

\(f(n) \neq \mathcal{O}(n) \) because for any \( c \), \( c \cdot n < n^2 \leq f(n) \) once \( n > \max(c, 20) \)
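The three claims can be sanity-checked numerically against the definition (the witnesses \( c \) below come from the slide; the ranges are my own choices):

```python
f = lambda n: 2*n**2 - 20*n + 4

# f(n) = O(n^2): witness c = 2, holds for all n >= 1
assert all(f(n) <= 2 * n**2 for n in range(1, 10_000))

# f(n) = O(n^3): witness c = 1/100, holds once n >= 200
assert all(f(n) <= n**3 / 100 for n in range(200, 10_000))

# f(n) != O(n): for any fixed c, c*n eventually falls below f(n)
c = 1_000_000
n = c + 20
assert c * n < f(n)
print("all three checks pass")
```

A numeric check is not a proof, but it catches wrong witnesses quickly.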

The comparison family

Find your partner, stretch, and answer this:

Which of \( \omega, \Omega, \Theta, \mathcal{O}, o \) describe the relationship between \( 2^x \) and \( 3^x \)?

Put it all together

Insertion Sort:

for (i = 1; i < n; i++){
  j = i;
  while ((j > 0) && (s[j] < s[j-1])){
    swap(s[j], s[j-1]);
    j = j-1;
  }
}

Count the worst-case swaps.
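A Python translation of the loop, counting swaps so the worst case can be checked against \( \binom{n}{2} = \frac{n(n-1)}{2} \) (a reverse-sorted input forces every comparison to swap):

```python
def insertion_sort(s):
    """Sort s in place; return the number of swaps performed."""
    swaps = 0
    for i in range(1, len(s)):
        j = i
        while j > 0 and s[j] < s[j - 1]:
            s[j], s[j - 1] = s[j - 1], s[j]   # swap
            swaps += 1
            j -= 1
    return swaps

n = 10
print(insertion_sort(list(range(n, 0, -1))))  # → 45, i.e. n(n-1)/2
```

Element \( i \) of a reverse-sorted list must travel past all \( i \) earlier elements, giving \( \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = \mathcal{O}(n^2) \) swaps.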

Project Euler Problem 4

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.

Find the largest palindrome made from the product of two n-digit numbers.

What is the worst-case complexity of the brute-force solution?
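A brute-force sketch for the n-digit case (the obvious structure, not an optimized solution): there are \( \mathcal{O}(10^{2n}) \) pairs, and each palindrome check costs \( \mathcal{O}(n) \) digit comparisons, so the worst case is \( \mathcal{O}(n \cdot 10^{2n}) \).

```python
def largest_palindrome_product(n):
    """Brute force over all pairs of n-digit numbers."""
    lo, hi = 10 ** (n - 1), 10 ** n
    best = 0
    for a in range(lo, hi):
        for b in range(a, hi):        # b >= a avoids duplicate pairs
            p = a * b
            s = str(p)
            if s == s[::-1] and p > best:
                best = p
    return best

print(largest_palindrome_product(2))  # → 9009 = 91 × 99, as in the statement
```

Note the exponential dependence on n: the input is the number of digits, so "brute force over all values" blows up even for modest n.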

Is there time for more?
If YES then Go To Slide #22
If NO then Turn The Page

You are eaten by crocodile aliens from the future protection society.