Nov 4 : Graph traversals (BFS, DFS)

Directed Graphs
Graph Search

Code discussed in lecture

Code provided by the book and by us

The book provides an implementation of the Adjacency List structure in AdjacencyListGraph.java. It uses linked lists to hold edges and vertices. It also provides some other methods. In particular, it provides degree(v) and toString for the entire graph.

The values in vertices are often names or other identifiers. It can be very helpful to be able to find a particular vertex quickly, given its identifier. (We will use this in implementing the Kevin Bacon game.) Therefore I have modified AdjacencyListGraph.java to use a map instead of a linked list to hold the vertices. The key is the value stored in the element. This only works if no two vertices have the same element value. This code is in AdjacencyListGraphMap.java.

This code supplies the method getVertex(id) which returns a Vertex object given the id for that vertex. This required changes to the instance variable that holds the collection of Vertex objects and to the constructor. It also required changes to insertVertex (must put into the map after verifying that the name is not there already), removeVertex (must remove from the map), and replace on a Vertex object (changing a vertex name means updating the map, also, after verifying that the new name is not there already).

I have also overloaded all of the other methods that take one or more Vertex<V> parameters to take V parameters instead. These are all one-liners - they simply call the original version, passing getVertex(id) where a Vertex<V> parameter is needed.

All of these modifications are at the beginning of the file and are marked by comments at the start and end of the section. Also, the book's net.datastructures code is in net.zip

Go through the main method and demonstrate the use of the graph methods.

Directed Graphs

The book suggests that to deal with directed graphs two methods be added the the Graph interface:

isDirected(e): Test if e is a directed edge.
insertDirectedEdge(v, w, label): Create a directed edge from v to w with value label

I find it useful to add four additional methods:

incidentEdgesIn(w): Return an interable collection of the edges directed into w.
incidentEdgesOut(v): Return an interable collection of the edges directed out of w.
inDegree(v): Return in-degree of v.
outDegree(v): Return the out-degree of v.

How do directed edges change things? For the representations with edge lists edges now just go in one direction. (Unless it is a mixed graph, with both directed and undirected edges.) It is important that the endVertices(e) method returns an array with A[0] as the source and A[1] as the destination, or we would be unable to determine which direction the edge was directed.

You will implement a class DirectedAdjListMap by extending AdjacencyListGraphMap.java. However, this raises a protential problem: if we are extending the undirected graph code, we still have insertEdge, which should insert an undirected edge. So we potentially have a mixed graph. How do we manage to mark that an edge is directed? It would seem that we would have to go into the AdjacencyListGraphMap code to add additional information to the way that the edge is represented.

It turns out that this is a common problem. When searching a graph, how do we determine if an edge or vertex has already been visited? I would be good to have a way to "mark" edges and vertices. A problem that arises in a number of situations is a good candidate for a design pattern, and the design pattern we use to solve this problem is called the Decorator pattern.

The basic idea is that we would like to "decorated" an edge or vertex with some additional information that is relevant to a particular algorithm but does not change the element saved in the position or require changes to the code of the class being used or extended. The way to do this is to add a small map to MyPosition and have it implement the Map interface as well as the Position interface. The book has in interface for this decorated position called a DecorablePosition. Then the class MyPosition in AdjacencyListGraph.java extends HashTableMap and implements DecorablePosition. HashTableMap is the author's implementation of open addressing with linear probing. The default constructor constructs a table of size 3. Why so small? Because we don't expect many "decorations".

How do we use this map to decorate a position? We can see it used in DFS.java, which implements a generic DFS traversal of a graph. They create an object STATUS to use as the key in the map. They also create objects VISITED and UNVISITED to be values associated with the key STATUS. If they needed other decorations they would have created appropriately named keys and values for them as well. They then mark a position p as visited by calling p.put(STATUS, VISITED);, then mark it unvisited by calling p.put(STATUS, UNVISITED);, and they test to see if a position is visited by calling p.get(STATUS) == VISITED. (Note that they could have eliminated UNVISITED by just having get return VISITED or null.)

This is a useful design pattern. The basic idea is to supply a map so that anyone can add any property that they want to the object without changing the object's implementation. You will use it in SA-10.

Graph search

Suppose we want to find a path from one vertex to another in a graph, e.g., a route in a map from a destination to a goal, or a sequence of movies connecting actors to Kevin Bacon. That's the problem of graph search. There are many approaches, depending on the information we have available. When all we have is the graph structure itself (i.e., nothing to help us know or guess how close we are to the goal), we're essentially doing "blind search". In that case, there are two basic deterministic approaches (or we could just search randomly).

Depth-first search is like what you probably do in exploring a maze. You keep going deeper and deeper in the maze, making choices at the various forks in the road, until you hit a dead-end. Then you back up to one of the choice points and make a different choice. [Example in class.] If you're fairy-tale clever, you'll scatter some bread crumbs behind you, so that you can see if you've come back to the same point and are just walking in circles. Then you'd want to do the same thing as if you hit a dead end.

To make this intuitive description a little more precise, we rely on a stack. For DFS we do the following:

Push the start vertex v onto the stack
Repeat until we find the goal vertex or the stack is empty:
  Pop the next vertex v from the stack
  If v has not been visited
    Mark v as visited (and maybe do some processing)
    for all vertices v' that are adjacent to v
      If we haven't already visited v':
        push v' onto the stack

We can keep track which vertices we have visited with a list (or a Set, more abstractly). Or we can decorate the vertex to show that it has been visited, as the book's code does. The book's code also uses recursion instead of the stack. It also adds code to visit edges and process them.

The book also notes that edges can be broken into two groups (for an undirected graph): discovery edges and back edges. They even add another decoration to keep track of edge type. Discovery edges are the first edge to find a vertex. They form a tree, the DFS tree. Back edges go back up the tree to an ancestor, and indicate that a cycle exists in the graph. The book's generic DFS algorithm is a template that allows you to supply methods to be done when a vertex is first visited, when traversing discovery or back edges, when leaving a vertex after visiting it, etc.

For directed graphs it becomes a bit more complicated - there are also forward edges that connect a node to a descendent in the DFS tree and cross edges that connect two nodes, neither of which is the ancestor of the other. The book presents a lot of problems that can be solved by DFS - reachability, strongly connected components, transitive closure, and topological sorting. They don't mention the culmination of "everything that can be done with DFS." That is graph planarity. Bob Tarjan developed a Θ(n + m) time algorithm for determining if a graph is planar using DFS.

This can be a good technique if we just want to find the goal, and don't want to keep too many alternatives around. But for the Kevin Bacon game, we want to find the shortest path from each actor to Kevin Bacon. In that case, instead of searching in a depth-first manner, we search in a breadth-first manner.

Breadth-first search expands uniformly in all "directions", like radiating ripples; the directions here are edges out of a vertex. So it finds all the vertices 1 step away, then all those 2 steps away, etc. Thus we know once we've found the goal, we've got the shortest set of edges from the start vertex to it.

This requires only a small change in our search strategy: rather than using a stack, we use a queue. With a just a few word changes to our algorithm, using a queue instead of a stack, we go from DFS to BFS:

Enqueue the start vertex v onto the queue
Repeat until we find the goal vertex or the queue is empty:
  Dequeue the next vertex v from the queue
  If v has not been visited
    Mark v as visited (and maybe do some processing)
    for all vertices v' that are adjacent to v
      If we haven't already visited v':
        enqueue v' onto the queue

While the above code works, we can optimize it a bit by noting that the first item pushed onto the queue will be the first item off of the queue. Therefore when we first discover a vertex we can mark it when we add it to the queue, knowing that later discoveries of the vertex need not be added to the queue. (This does not work for DFS, because it the most recently added copy of the vertex is the one that comes off of the stack first and thus gets marked and its adjacent vertices pushed onto the stack.) The optimized version is:

Enqueue the start vertex v onto the queue and mark it
Repeat until we find the goal vertex or the queue is empty:
  Dequeue the next vertex v from the queue
    (Maybe do some processing)
    for all vertices v' that are adjacent to v
      If we haven't already visited v':
        mark v' (and maybe do some processing)
        enqueue v' onto the queue

The edge types in BFS are discovery edges or cross edges. No edge goes back to an ancestor, because it would have been visited in the forward direction earlier.

Instead of just keeping track of which vertices we've visited, it's helpful to keep track of how we arrived at each vertex. So instead of a visited list, we build a visited tree, where the parent of a vertex is the vertex from which we discovered it. Note that the parent is unique (since we don't visit a vertex twice), and thus gives a tree. However, we will often use a graph ADT to hold this tree. It gives us a way to do edge labels and to find particular vertices quickly.

We can do a BFS from a start vertex without any goal in mind — just run until we've visited the whole graph (or at least the part of it that is reachable from the start vertex). Then we have a BFS tree that has shortest paths from each reachable vertex to the start vertex. To find such a shortest path, we just find the desired vertex (e.g., an actor) in the tree, and follow edges back to the start vertex (e.g., Kevin Bacon); we can keep track of the edges (movie titles) on the way back. Demo.