Analyzing Algorithms part 1
Andy Novocin
Daily Problem
The set cover problem: given a collection of subsets \( S = \{S_1, \ldots, S_m\} \), each \( S_i \subseteq U = \{1, 2, \ldots, n\} \), find the smallest sub-collection \( T \subseteq S \) whose union is \( U \).
Find a counter-example to this greedy algorithm: select the largest subset first, then delete its elements from \( U \). Repeat, each time selecting the subset that covers the most still-uncovered elements, until everything is covered (a minimal sketch of this rule follows below).
Can you devise an algorithm which will correctly terminate?
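A minimal sketch of this greedy rule in Python (the name greedy_cover and the set-of-sets input format are illustrative assumptions, not part of the problem statement):

def greedy_cover(universe, subsets):
    """Greedy heuristic: repeatedly take the subset covering the most
    still-uncovered elements; returns the chosen subsets, which need not
    form a minimum cover."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # pick the subset with the most still-uncovered elements
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen

# Illustrative usage on a tiny instance
U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {4, 5}]
print(greedy_cover(U, S))  # [{1, 2, 3}, {4, 5}]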
big-Oh Face
Algorithms are ideas, right?
How do I judge them?
Correctness, Efficiency, Simplicity
A Dumb Example:
for i in range(n):
    print(i*i + i)
print(n + n)
How long did that take?
A function of n
\( n \) multiplications of numbers \( \leq n \)
\( n \) additions of numbers \( \leq n \)
one additional addition of size \( n \)
\( n+1 \) calls to print
???
\( \sum_{i=1}^n 3\log_{2^{53}}{i} + \log_{2^{53}}{n} + n \cdot \mathcal{C}(\textrm{print}) \)
Even a simple bit of code is difficult to analyze fully
What if there is no specific implementation?
- gcd(\(a, b\)):
  - if \( a < b \) return gcd(\(b, a\))
  - \( r := a \bmod b \)
  - if \( r \neq 0 \) return gcd(\( b, r \))
  - else return \( b \)
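One possible Python rendering of that pseudocode, purely as a sketch (assumes positive integer inputs):

def gcd(a, b):
    """Euclidean algorithm, mirroring the pseudocode above."""
    if a < b:                # keep the larger argument first
        return gcd(b, a)
    r = a % b                # one mod operation
    if r != 0:
        return gcd(b, r)     # recurse on the remainder
    return b

print(gcd(1071, 462))  # 21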
We need a model
Computational Model: RAM
Simple operations take 1 step
+,-,*,==,if,call
Memory is unlimited
Any memory address can be accessed in 1 step
Recursive calls and loops (for, while) are not simple and require care
RAM model allows us to count operations on a theoretical machine
It misses many small adjustments
But like Newtonian mechanics it's useful!
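As an illustration of counting on that theoretical machine, here is a sketch that tallies the simple operations in the earlier loop; the per-line charges (1 step each for the loop test, the multiplication, the addition, and the print) are one reasonable assignment, not the only one:

def ram_cost(n):
    """RAM-model step count for:
         for i in range(n): print(i*i + i)
         print(n + n)
    """
    steps = 0
    for _ in range(n):
        steps += 1   # loop test / increment
        steps += 1   # i * i
        steps += 1   # ... + i
        steps += 1   # print
    steps += 1       # n + n
    steps += 1       # final print
    return steps

print(ram_cost(10))  # 4*10 + 2 = 42 steps under this charging scheme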
GCD (Euclidean algorithm)
mod and if are simple, so each costs 1
repeated 1-3 times per recursive call
How many recursive calls can there be?
Best, Average, Worst cases
Best case for gcd is \( a = x, b = x \): the cost is 3
Average case: well, that depends; an IDEONE experiment shows about 18.1799 on average for random 32-bit integers
Worst case is tricky, but it is bounded by the Fibonacci numbers (inductive proof)
Approximately 3 operations per digit (proof on the board, if demanded)
In general, which number is most meaningful?
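A sketch of the kind of experiment referenced above; the average will vary with the random model (the 18.1799 figure is from the original IDEONE run, not reproduced here), and consecutive Fibonacci numbers are the classic worst-case inputs:

import random

def gcd_calls(a, b, count=1):
    """Number of recursive calls made by the gcd sketch above."""
    if a < b:
        return gcd_calls(b, a, count + 1)
    r = a % b
    if r != 0:
        return gcd_calls(b, r, count + 1)
    return count

# Average over random 32-bit pairs (illustrative only)
pairs = [(random.getrandbits(32) + 1, random.getrandbits(32) + 1)
         for _ in range(10_000)]
print(sum(gcd_calls(a, b) for a, b in pairs) / len(pairs))

# Consecutive Fibonacci numbers, F(45) and F(44), force the deepest recursion
print(gcd_calls(1134903170, 701408733))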
That's nice but how does the \( \mathcal{O}() \) come into the picture?
Orthogonal to the complexity model, but helpful in thinking it out
It is a way to compare any functions on large inputs
The algorithmic world needs to compare efficiencies
RAM lets us model the cost of an algorithm as a function
big-Oh is used to compare those functions
big-Oh in action
\( 2n^3 + 2013n^2 + 2 = \mathcal{O}(n^3) \)
\( 2n^3 - 2013n^2 + 2 = \mathcal{O}(n^3) \)
\( \sum_{i=0}^{4n} 1519 = \mathcal{O}(n) \)
\( \sum_{i=0}^{4n} 1519i = \mathcal{O}(n^2) \)
\( 12 \cdot 2^{x} \cdot 2^{-x} = \mathcal{O}(1) \)
\( 12 \cdot 2^{x} \cdot x^{2} = \mathcal{O}(x^2 \cdot 2^{x}) \)
Thinking in \(\mathcal{O}\)
Follow the largest term
Ignore constant multipliers
Adding two terms keeps only the dominant one: \( \mathcal{O}(f) + \mathcal{O}(g) = \mathcal{O}(\max(f, g)) \) (be careful)
Multiplying two terms multiplies the bounds: \( \mathcal{O}(f) \cdot \mathcal{O}(g) = \mathcal{O}(f \cdot g) \) (be careful)
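A quick numeric sketch of "follow the largest term, ignore constant multipliers": the ratio of the first example above to its bound \( n^3 \) settles near the leading constant 2.

# f(n) = 2n^3 + 2013n^2 + 2 versus n^3: lower-order terms fade as n grows
for n in [10, 100, 1000, 10_000, 100_000]:
    f = 2 * n**3 + 2013 * n**2 + 2
    print(n, f / n**3)   # ratio approaches 2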
Big-Oh is \(\mathcal{O}()\) many things
Computer scientists are lazy (it's easy to work with)
Computer scientists are transcendent (future \( \geq \) today)
Follow the definition
$$ \mathcal{O}(g(x)) = \{ f(x) \textrm{ s.t. } \exists c, n \textrm{ with } f(x) \leq c \cdot g(x) ~\forall ~x \geq n \} $$
In words: \( \mathcal{O}(g(x)) \) is the set of all functions which are eventually smaller than a constant multiple of \( g(x) \)
\( f(x) = \mathcal{O}(g(x)) \) actually means \( f \in \mathcal{O}(g(x)) \)
Sample Analysis
\( f(n) = 2n^2 - 20n + 4 \)
\(f(n) = \mathcal{O}(n^2) \) because \( 2n^2 \geq 2n^2 - 20n + 4 \) for all \( n \geq 1 \)
\(f(n) = \mathcal{O}(n^3) \) because \( \frac{1}{100} n^3 \geq f(n) \) once \( n \geq 200 \)
\(f(n) \neq \mathcal{O}(n) \) because for any \( c \), once \( n > \max(c, 20) \) we have \( c \cdot n < n^2 \leq f(n) \)
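A small numeric sanity check of the constants claimed above, as a sketch over a finite range (it does not replace the inequalities themselves):

def f(n):
    return 2 * n**2 - 20 * n + 4

# c = 2 with g(n) = n^2 works from n = 1 onward
assert all(f(n) <= 2 * n**2 for n in range(1, 10_000))

# c = 1/100 with g(n) = n^3 works once n >= 200
assert all(f(n) <= n**3 / 100 for n in range(200, 10_000))

print("bounds hold on the tested range")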
The comparison family
- For some constant(s) \( C > 0 \) and all large enough \( x \)
- \( f(x) = \omega(g(x)) \) when \( \lim_{x \to \infty} \frac{g(x)}{f(x)} = 0 \)
- \( f(x) = \Omega(g(x)) \) when \( f(x) \geq C \cdot g(x) \)
- \( f(x) = \Theta(g(x)) \) when \( f(x) \geq C_1 \cdot g(x) \) and \( f(x) \leq C_2 \cdot g(x) \)
- \( f(x) = \mathcal{O}(g(x)) \) when \( f(x) \leq C \cdot g(x) \)
- \( f(x) = o(g(x)) \) when \( \lim_{x \to \infty} \frac{f(x)}{g(x)} = 0 \)
Find your partner, stretch, and answer this:
Which of \( \omega, \Omega, \Theta, \mathcal{O}, o \) describe the relationship between \( 2^x \) and \( 3^x \)?
Put it all together
Insertion Sort:
for (i = 1; i < n; i++) {
    j = i;
    while ((j > 0) && (s[j] < s[j-1])) {
        swap(s[j], s[j-1]);
        j = j - 1;
    }
}
Count the worst-case swaps.
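A sketch that counts the swaps on a reverse-sorted input, in Python rather than the C-style pseudocode above (the helper name is illustrative):

def insertion_sort_swaps(s):
    """Insertion sort as above, returning the number of swaps performed."""
    swaps = 0
    for i in range(1, len(s)):
        j = i
        while j > 0 and s[j] < s[j - 1]:
            s[j], s[j - 1] = s[j - 1], s[j]   # the swap
            swaps += 1
            j -= 1
    return swaps

n = 10
print(insertion_sort_swaps(list(range(n, 0, -1))))  # 45 = n*(n-1)/2

On a reverse-sorted input every element is swapped past all of its predecessors, which is where the \( n(n-1)/2 \) count comes from.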
Project Euler Problem 4
A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.
Find the largest palindrome made from the product of two n-digit numbers.
What is the worst-case complexity of the brute-force solution?
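One possible brute-force sketch in Python (function name illustrative): try every pair of n-digit factors and keep the largest palindromic product.

def largest_palindrome_product(n):
    """Brute force over all pairs of n-digit factors."""
    lo, hi = 10 ** (n - 1), 10 ** n - 1
    best = 0
    for a in range(lo, hi + 1):
        for b in range(a, hi + 1):         # b >= a avoids repeating pairs
            p = a * b
            s = str(p)
            if s == s[::-1] and p > best:  # palindrome test on the digits
                best = p
    return best

print(largest_palindrome_product(2))  # 9009, as in the problem statement

Note that the nested loops range over roughly \( 10^n \) values each, so the pair count alone grows like \( 10^{2n} \).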