CISC 320 Data Structures and Algorithms

Data Structures and Algorithms wrap-up

Andy Novocin

Best-Day Algorithm(s)

Everyone write down a 1-100 score for their day

Algo 1: Create a group of 3-5 neighbors, pick the highest

Selected person look around and find 3-5 other winners, pick the highest

One last time should do it.

How many comparisons were done?

Worst(Best)-Day Algorithm(s)

Algo 2: Everyone that put less than 50 raise their hand

Less than 25

Repeat until min is found

How do we analyze this?

Average-Day Algorithm(s)

Algo 3: Everyone round your score to the nearest multiple of 25

For value in $ 0, 25, 50, 75, 100 $ we count hands

Do the math.

How much did this cost?

Daily Problem

3-12 Suppose you are given an input set $S$ for $n$ numbers, and a black box that if given any sequence of real numbers and an integer $ k $ instantly and correctly answers whether there is a subset of the input sequence with sum exactly $k$. Show how to use the black box $ \mathcal{O}(n) $ times to find a subset of $ S $ that adds up to $k$.

Sample HW1 Problem solved

2-41 Prove that the binary representation of $ n \geq 1 $ has $ \lfloor \log_{2}{n} \rfloor + 1 $ bits.

Also, if you come to me with a question equivalent to: "Can I get away with less effort?" then I will get preachy!

Scheduler for Programming Groups

Link on Sakai

cs.prof.ninja/programs

Here in slides

More Data Structures

Priority Queue

Another API style data structure

Definition

`Insert(x)` Assume that `x` has a key, `k`

`Find-Min()` and `Find-Max()` return pointer to winner

`Delete-Min()` and `Delete-Max()` remove the highest or lowest.

PQ Usage

How is this different than other DSs?

Whenever you need to work through your options in sensible order.

Scheduling OS tasks, Evaluating job candidates, (Dating)

Heap Preview

Of course, like all API-style structures, can be implemented in many ways.

This is the natural place to use a heap.

A heap is a type of compactified almost search tree

Each node dominates its children so root is the winner

More next week

PQ Challenge

Implement a normal Queue (FIFO) using a Priority Queue

Implement a Stack (LIFO) using a Priority Queue

Hashing!

The three most important algorithms at Yahoo! are hashing, hashing, and hashing.

Udi Manber (while chief scientist at Yahoo!)

As of October 13, 2010, he is also responsible for all of the search products at Google

Key Ideas

Looking up an item in an array by the index is $ \mathcal{O}(1) $

If I only need to work with keys in the set $ \{ 0, 1, \ldots, m \} $ then I can use a length $m+1$ array.

In reality I need to work with many more keys than that.

Can I reliably map a large universe of keys to $ \{ 0, 1, \ldots, m \} $?

Challenge Problem

If all keys are distinct and from the set $ \{ 0, 1, \ldots, m \} $, use a bit-vector to design `insert`, `search`, and `delete` each in $ \mathcal{O}(1)$ time. (Assume no satellite data.)

(Extra Credit: Implement some bit twiddling on your own or play with `std::bitset`)

Hash Functions

To win, we need to map an arbitrary string to an integer $ \leq m $ and $ \geq 0$, deterministically.

I can map any integer to $ 0, \ldots, m $ using $ \mod{m} $ (in code `i % m` )

I can map any character to an integer using `char(c)` (`ord()` in python)

A good hash function might be treating each character like the digit in its representation.

Hash Functions cont.

$$ H(S) = \sum_{i} \alpha^i \cdot \textrm{char}(S[i]) \mod{m} $$

Collisions

Pigeon-hole principle tells me that there must be some values which hash to the same thing.

We deal with this by making a linked list chain out from each value.

Average vs Worst

In the worst case, every single key maps to the same bucket!

In general use each bucket has $ \mathcal{O}(n/m) $ keys with $ n $ input values.

If you design a BAD hashing function then you will see non-uniform distributions.

Avg vs Worst cont.

`Insert` and `Delete` are $ \mathcal{O}(1) $ in the worst-case

`Search` takes $ \mathcal{O}( n /m ) $ on average (practically $ \mathcal{O}(1) $ but $ \mathcal{O}(n) $ in the worst case (overloaded bucket).

The other guys (`Min`, `Successor`, etc) take $ \mathcal{O}(n + m) $

Canonical Data Structures

Array

Linked Lists

Stack

Queue

Priority Queue

Dictionary

Binary Search Tree (Balanced)

Hash Table

What do you choose?

If you were writing a dictionary data structure, what would you pick?

What did Python choose?

Go look at my code!

Domain Specific Structures

Bit Vectors (shifting is cheap)

Compression/Decompression (knowledge is power)

Reversible Hashing (Crypto vs obfuscation)

Sparse Representations (Polys, Mats, Integers)

Understanding Problem 1

You are using the API for a Balanced Binary Search Tree. It is limited to `Min()`, `Successor(p)`, and `Insert(x)`, each of which runs in $ \mathcal{O}(\log{n})$ time. Design a $ \mathcal{O}(n \log{n})$ sorting algorithm.

Understanding Problem 2

Let your hash function be $ h(k) = k \mod{9}$. Using our chained hash table, what is the result after inserting the sequence of keys: $5, 28, 19, 15, 20, 33, 12, 17, 10 $?

Challenge Problem 3

3-5 Find the overhead fraction (the ratio of data space over total space) for each of the following binary tree implementations on $ n $ nodes.

a) All nodes store data, two child pointers, and a parent pointer. The data field requires four bytes, each pointer requires four bytes.

b) Only leaf nodes store data; internal nodes store two child pointers. The data field requires four bytes and each pointer requires four bytes.

Data Structures and Algorithms wrap-up

Andy Novocin

Best-Day Algorithm(s)

Everyone write down a 1-100 score for their day

Algo 1: Create a group of 3-5 neighbors, pick the highest

Selected person look around and find 3-5 other winners, pick the highest

One last time should do it.

How many comparisons were done?

Worst(Best)-Day Algorithm(s)

Algo 2: Everyone that put less than 50 raise their hand

Less than 25

Repeat until min is found

How do we analyze this?

Average-Day Algorithm(s)

Algo 3: Everyone round your score to the nearest multiple of 25

For value in \( 0, 25, 50, 75, 100 \) we count hands

Do the math.

How much did this cost?

Daily Problem

Sample HW1 Problem solved

2-41 Prove that the binary representation of \( n \geq 1 \) has \( \lfloor \log_{2}{n} \rfloor + 1 \) bits.

Also, if you come to me with a question equivalent to: "Can I get away with less effort?" then I will get preachy!

Scheduler for Programming Groups

Link on Sakai

cs.prof.ninja/programs

Here in slides

More Data Structures

Priority Queue

Another API style data structure

Definition

Insert(x) Assume that x has a key, k

Find-Min() and Find-Max() return pointer to winner

Delete-Min() and Delete-Max() remove the highest or lowest.

PQ Usage

How is this different than other DSs?

Whenever you need to work through your options in sensible order.

Scheduling OS tasks, Evaluating job candidates, (Dating)

Heap Preview

Of course, like all API-style structures, can be implemented in many ways.

This is the natural place to use a heap.

A heap is a type of compactified almost search tree

Each node dominates its children so root is the winner

More next week

PQ Challenge

Implement a normal Queue (FIFO) using a Priority Queue

Implement a Stack (LIFO) using a Priority Queue

Hashing!

Key Ideas

Looking up an item in an array by the index is \( \mathcal{O}(1) \)

If I only need to work with keys in the set \( \{ 0, 1, \ldots, m \} \) then I can use a length \(m+1\) array.

In reality I need to work with many more keys than that.

Can I reliably map a large universe of keys to \( \{ 0, 1, \ldots, m \} \)?

Challenge Problem

If all keys are distinct and from the set \( \{ 0, 1, \ldots, m \} \), use a bit-vector to design insert, search, and delete each in \( \mathcal{O}(1)\) time. (Assume no satellite data.)

(Extra Credit: Implement some bit twiddling on your own or play with std::bitset)

Hash Functions

To win, we need to map an arbitrary string to an integer \( \leq m \) and \( \geq 0\), deterministically.

I can map any integer to \( 0, \ldots, m \) using \( \mod{m} \) (in code i % m )

I can map any character to an integer using char(c) (ord() in python)

A good hash function might be treating each character like the digit in its representation.

Hash Functions cont.

Collisions

Pigeon-hole principle tells me that there must be some values which hash to the same thing.

We deal with this by making a linked list chain out from each value.

Average vs Worst

In the worst case, every single key maps to the same bucket!

In general use each bucket has \( \mathcal{O}(n/m) \) keys with \( n \) input values.

If you design a BAD hashing function then you will see non-uniform distributions.

Avg vs Worst cont.

Insert and Delete are \( \mathcal{O}(1) \) in the worst-case

Search takes \( \mathcal{O}( n /m ) \) on average (practically \( \mathcal{O}(1) \) but \( \mathcal{O}(n) \) in the worst case (overloaded bucket).

The other guys (Min, Successor, etc) take \( \mathcal{O}(n + m) \)

Canonical Data Structures

Array

Linked Lists

Stack

Queue

Priority Queue

Dictionary

Binary Search Tree (Balanced)

`Insert(x)` Assume that `x` has a key, `k`

`Find-Min()` and `Find-Max()` return pointer to winner

`Delete-Min()` and `Delete-Max()` remove the highest or lowest.

If all keys are distinct and from the set \( \{ 0, 1, \ldots, m \} \), use a bit-vector to design `insert`, `search`, and `delete` each in \( \mathcal{O}(1)\) time. (Assume no satellite data.)

(Extra Credit: Implement some bit twiddling on your own or play with `std::bitset`)

I can map any integer to \( 0, \ldots, m \) using \( \mod{m} \) (in code `i % m` )

I can map any character to an integer using `char(c)` (`ord()` in python)

`Insert` and `Delete` are \( \mathcal{O}(1) \) in the worst-case

`Search` takes \( \mathcal{O}( n /m ) \) on average (practically \( \mathcal{O}(1) \) but \( \mathcal{O}(n) \) in the worst case (overloaded bucket).

The other guys (`Min`, `Successor`, etc) take \( \mathcal{O}(n + m) \)

You are using the API for a Balanced Binary Search Tree. It is limited to `Min()`, `Successor(p)`, and `Insert(x)`, each of which runs in \( \mathcal{O}(\log{n})\) time. Design a \( \mathcal{O}(n \log{n})\) sorting algorithm.