## Web crawling / Graph traversal

#### Andy Novocin

## Daily Challenge

### Given an undirected graph give an algorithm to convert from:

### a) Adjacency Matrix to Adjacency List

## Daily Challenge cont.

### b) From Adjacency List to Incidence matrix. An Incidence matrix is a \( |V| \times |E| \) matrix where \(M[i,j] = 1\) if vertex \(i\) is part of edge \(j\) (for some edge numbering system).

## Daily Challenge cont.

### c) From incidence matrix to adjacency list.

## Advanced Challenge

### Programming assignment 3 is up and pertinent.

## Pedagogical Reminder

### The only way to progress in a STEM subject is through wrestling with concepts.

### You can fake it for a long time thanks to the internet... slow down and build skill.

## Graph Traversal

### We want to visit every node (every edge).

### How do we avoid returning to old places?

### How do we do this efficiently?

## Here's Johnny!

## Breadcrumbs

### Three states:

#### Undiscovered (starting state)

#### Discovered (seen once from somewhere)

#### Processed (completely drained of data)

## Properties

### A winning algorithm here must be:

#### Systematic

#### Efficient

#### What's wrong with right hand on the wall?

## Today BFS next time DFS

### BFS = Breadth-First-Search

### DFS = Depth-First-Search

#### Same outcome, completely different applications

## Pseudo-code

```
BFS(G,s):
for v in G.V:
v.state = "undiscovered"
v.parent = NULL
s.state = "discovered"
s.d = 0
Q.enqueue(s)
while not Q.isEmpty:
u = Q.dequeue()
//TODO: process u here
for v in u.neighbors:
//TODO: process edge (u,v) here
if v.state == "undiscovered":
v.state = "discovered"
v.parent = u
v.d = u.d + 1
Q.enqueue(v)
u.state = "processed"
```

#### ICU1: BFS(G,a) vs BFS(G,i);

## ICU2

### How many times is each edge visited?

### What is the complexity of BFS?

## As a vehicle for riding the rails

### Many algorithms are just traversal algorithms with extra processing.

### When you discover a vertex feel free to call a vertex sub-routine.

### When you know a vertex and a parent then call an edge sub-routine.

### When you are done with a vertex you can call a clean-up function.

## Subtle Points (parent)

### Parent is not well defined on a graph.

### Our traversal creates a parent to children relationship

### We've converted the graph into a tree.

#### ICU3: Tree from a; Tree from i

## Subtle Points (queue)

### If we merely change our data container from Queue to Stack we get Depth-First-Search.

### We rely on the ordering of FIFO to get some unique properties

## Subtle Points (bit arrays, properties, or other)

### If you choose the wrong way of "coloring" the points you lose speed.

## BFS tree == Shortest Path?

### Work from leaf to root using `parent`

pointers.

### This gives a backwards path.

### I claim it is the shortest path between the two.

## Reason: BFS works radially

### I will process all of the vertices of distance 1.

### The nodes which are discovered after them must not have been distance 1.

### If they are one away from distance 1 then they must be distance 2.

## Limitations

### Only works when you start BFS from one of the two points.

### So far, only works when unweighted.

#### Quick exercise: Use our tree to get shortest_path from \(a\) to \(f\).

## Connected Components

### Can I get from every vertex to every other vertex?

### Visually, this seems simple (undirected).

### Important question (power grids, emergency traffic redirect, google maps)

## Google Maps

## Doable with BFS

```
c = 0
for v in vertices:
if v.state == "undiscovered":
c += 1
BFS(G,v) // component c
```

## Strong/Weak Connections

## ICU4 Can this be two-colored?

## Bipartite Graph Detection

```
for v in vertices:
v.color = "none"
s = vertices[0]
s.color = "color1"
bipartite = True
BFS(G,s) //using extra sub-routines
def process_edge(u,v):
if (u.color == v.color):
bipartite = False
v.color = opposite_color(u.color)
```