## Analyzing Algorithms part 1

#### Andy Novocin

## Daily Problem

The **set cover problem**: given a set of subsets \(S = \{S_1, \ldots, S_m\} \) each in \(U = \{1, 2, \ldots, n \}\) find the smallest subset of subsets \( T \subset S \) such that the union is \( U \).

Find a counter-example to the algorithm: Select the largest subset first, then delete those elements from \( U \). Repeat by selecting the subset containing the next most uncovered elements until everything is covered.

### Can you devise an algorithm which will correctly terminate?

## big-Oh Face

### Algorithms are ideas, right?

### How do I judge them?

### Correctness, Efficiency, Simplicity

## A Dumb Example:

```
for i in range(n):
print i*i + i
print n + n
```

### How long did that take?

#### A function of `n`

`n`

multiplications of numbers \( \leq n \)

`n`

additions of numbers \( \leq n \)

#### one additional addition of size \( n \)

`n+1`

calls to `print`

#### ???

#### \( \sum_{i=1}^n 3\log_{2^{53}}{i} + \log_{2^{53}}{n} + n \cdot \mathcal{C}(\textrm{print}) \)

#### Even a simple bit of code is difficult to analyze fully

### What if there is no specific implementation?

- gcd(\(a,b\)):
- if \( a \leq b \) return gcd(\(b, a\))
- \(r := a \mod b\)
- if \( r \neq 0 \) return gcd(\( b, r \))
- else return \(b\)

## We need a model

## Computational Model: RAM

### Simple operations take 1 step

#### +,-,*,==,if,call

### Memory is unlimited

### Any memory address can be accessed in 1 step

### Recursive calls, Loops, For are not simple and require care

### RAM model allows us to count operations on a theoretical machine

### It misses many small adjustments

### But like Newtownian mechanics it's useful!

## GCD (Euclidean algorithm)

`mod`

and `if`

are simple so cost 1

### repeated 1-3 times per recursive call

### How many recursive calls can there be?

## Best, Average, Worst cases

### Best case for gcd is \(a=x, b=x\) cost is 3

### Average case, well, that depends IDEONE experiment showing 18.1799 for 32 bit integers.

### Worst case is tricky, but Fibonacci bounded (inductive proof)

### Approx 3 operations per digit (proof on board, if demanded)

### In general, which number is most meaningful?

### That's nice but how does the \( \mathcal{O}() \) come into the picture?

#### Orthogonal to the complexity model, but helpful in thinking it out

#### It is a way to compare any functions on large inputs

#### The algorithmic world needs to compare efficiencies

#### RAM lets us model the cost of an algorithm as a function

#### big-Oh is used to compare those functions

## big-Oh in action

#### \( 2n^3 + 2013n^2 + 2 = \mathcal{O}(n^3) \)

#### \( 2n^3 - 2013n^2 + 2 = \mathcal{O}(n^3) \)

#### \( \sum_{i=0}^{4n} 1519 = \mathcal{O}(n) \)

#### \( \sum_{i=0}^{4n} 1519i = \mathcal{O}(n^2) \)

#### \( 12 \cdot 2^{x} \cdot 2^{-x} = \mathcal{O}(1) \)

#### \( 12 \cdot 2^{x} \cdot x^{2} = \mathcal{O}(x^2 \cdot 2^{x}) \)

## Thinking in \(\mathcal{O}\)

### Follow the largest term

### Ignore constant multipliers

### Adding two terms collapses them (be careful)

### Multiplying two terms combines them (be careful)

### Big-Oh, is \(\mathcal{O}()\) many things

#### Computer scientists are lazy (it's easy to work with)

#### Computer scientists are transcendent (future \( \geq \) today)

## Follow the definition

$$ \mathcal{O}(g(x)) = \{ f(x) \textrm{ s.t. } \exists c, n \textrm{ with } f(x) \leq c \cdot g(x) ~\forall ~x \geq n \} $$

#### In words: \( \mathcal{O}(g(x)) \) is the set of all functions which are eventually smaller than a constant multiple of \( g(x) \)

#### \( f(x) = \mathcal{O}(g(x)) \) actually means \( f \in \mathcal{O}(g(x)) \)

## Sample Analysis

#### \( f(n) = 2n^2 - 20n + 4 \)

#### \(f(n) = \mathcal{O}(n^2) \) because \(2n^2 \geq 2n^2 - 20n + 4\)

#### \(f(n) = \mathcal{O}(n^3) \) because \( \frac{1}{100} n^3 \geq f(n) \)

#### \(f(n) \neq \mathcal{O}(n) \) because \( c \cdot n < n^2 \) when \( n > c \)

## The comparison family

- For all large enough \( x \)
- \( f(x) = \omega(g(x)) \) when \( \lim_{x \to \infty} \frac{g(x)}{f(x)} = 0 \)
- \( f(x) = \Omega(g(x)) \) when \( f(x) \geq C \cdot g(x) \)
- \( f(x) = \Theta(g(x)) \) when \( f(x) \geq C_1 \cdot g(x) \) and \( f(x) \leq C_2 \cdot g(x) \)
- \( f(x) = \mathcal{O}(g(x)) \) when \( f(x) \leq C \cdot g(x) \)
- \( f(x) = o(g(x)) \) when \( \lim_{x \to \infty} \frac{f(x)}{g(x)} = 0 \)

### Find your partner, stretch, and answer this:

#### Which of \( \omega, \Omega, \Theta, \mathcal{O}, o \) describe the relationship between \( 2^x \) and \( 3^x \)?

## Put it all together

### Insertion Sort:

```
for (i = 1; i < n; i++){
j = i;
while ((j > 0) && (s[j] < s[j-1])){
swap(s[j], s[j-1]);
j = j-1;
}
}
```

#### Count the worst-case `swap`

s.

## Project Euler Problem 4

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.

Find the largest palindrome made from the product of two n-digit numbers.

What is the worst-case complexity of the brute-force solution?