Nov 4 : Graph traversals (BFS, DFS)
Code discussed in lecture
Code provided by the book and by us
The book provides an implementation of the Adjacency List structure in
AdjacencyListGraph.java
. It uses
linked lists to hold edges and vertices. It also provides some other methods. In particular,
it provides degree(v)
and toString
for the entire graph.
The values in vertices are often names or other identifiers. It can be very helpful to be
able to find a particular vertex quickly, given its identifier. (We will use this in
implementing the Kevin Bacon game.) Therefore I have modified AdjacencyListGraph.java
to use a map instead of a linked list to hold the vertices. The key is the value stored in the
element. This only works if no two vertices have the same element value. This code is in
AdjacencyListGraphMap.java
.
This code supplies the method getVertex(id)
which returns a Vertex
object given the id for that vertex. This required changes to the instance variable that holds the
collection of Vertex
objects and to the constructor. It also required changes to
insertVertex
(must put
into the map after verifying that the name is not there already), removeVertex
(must remove
from the map), and replace
on a Vertex object (changing a vertex name means updating the map, also,
after verifying that the new name is not there already).
I have also overloaded all of the other methods that take one or more Vertex<V>
parameters
to take V
parameters instead. These are all one-liners - they simply call the original
version, passing getVertex(id)
where a Vertex<V>
parameter is needed.
All of these modifications are at the beginning of the file and are marked by comments at the start
and end of the section. Also, the book's net.datastructures
code is in
net.zip
Go through the main
method and demonstrate the use of the graph methods.
Directed Graphs
The book suggests that to deal with directed graphs two methods be added the the Graph interface:
isDirected(e)
: Test if e is a directed edge.insertDirectedEdge(v, w, label)
: Create a directed edge from v to w with value label
I find it useful to add four additional methods:
incidentEdgesIn(w)
: Return an interable collection of the edges directed into w.incidentEdgesOut(v)
: Return an interable collection of the edges directed out of w.inDegree(v)
: Return in-degree of v.outDegree(v)
: Return the out-degree of v.
How do directed edges change things? For the representations with edge lists edges now just go
in one direction. (Unless it is a mixed graph, with both directed and undirected edges.) It is important
that the endVertices(e)
method returns an array with A[0]
as the source and
A[1]
as the destination, or we would be unable to determine which direction the edge was
directed.
You will implement a class DirectedAdjListMap
by extending
AdjacencyListGraphMap.java
. However,
this raises a protential problem: if we are extending the undirected graph code, we still have
insertEdge
, which should insert an undirected edge. So we potentially have a mixed
graph. How do we manage to
mark that an edge is directed? It would seem that we would have to go into the
AdjacencyListGraphMap
code to add additional information to the way that the edge is
represented.
It turns out that this is a common problem. When searching a graph, how do we determine if an edge or vertex has already been visited? I would be good to have a way to "mark" edges and vertices. A problem that arises in a number of situations is a good candidate for a design pattern, and the design pattern we use to solve this problem is called the Decorator pattern.
The basic idea is that we would like to "decorated" an edge or vertex with some additional
information that is relevant to a particular algorithm but does not change the element saved
in the position or require
changes to the code of the class being used or extended. The way to do this is to add a small map
to MyPosition
and have it implement the Map
interface as well as the
Position
interface. The book has in interface for this decorated position called a
DecorablePosition
.
Then the class MyPosition
in AdjacencyListGraph.java
extends HashTableMap
and implements DecorablePosition
. HashTableMap
is the author's implementation
of open addressing with linear probing. The default constructor constructs a table of size 3. Why so
small? Because we don't expect many "decorations".
How do we use this map to decorate a position? We can see it used in
DFS.java
, which
implements a generic DFS traversal of a graph. They create an object STATUS
to use as the
key in the map. They also create objects VISITED
and UNVISITED
to be values
associated with the key STATUS
. If they needed other decorations they would have created
appropriately named keys and values for them as well. They then mark a position p as visited by
calling p.put(STATUS, VISITED);
, then mark it unvisited by calling
p.put(STATUS, UNVISITED);
, and they test to see if a position is visited by calling
p.get(STATUS) == VISITED
. (Note that they could have eliminated UNVISITED
by just having get
return VISITED
or null
.)
This is a useful design pattern. The basic idea is to supply a map so that anyone can add any property that they want to the object without changing the object's implementation. You will use it in SA-10.
Graph search
Suppose we want to find a path from one vertex to another in a graph, e.g., a route in a map from a destination to a goal, or a sequence of movies connecting actors to Kevin Bacon. That's the problem of graph search. There are many approaches, depending on the information we have available. When all we have is the graph structure itself (i.e., nothing to help us know or guess how close we are to the goal), we're essentially doing "blind search". In that case, there are two basic deterministic approaches (or we could just search randomly).
Depth-first search is like what you probably do in exploring a maze. You keep going deeper and deeper in the maze, making choices at the various forks in the road, until you hit a dead-end. Then you back up to one of the choice points and make a different choice. [Example in class.] If you're fairy-tale clever, you'll scatter some bread crumbs behind you, so that you can see if you've come back to the same point and are just walking in circles. Then you'd want to do the same thing as if you hit a dead end.
To make this intuitive description a little more precise, we rely on a stack. For DFS we do the following:
Push the start vertex v onto the stack
Repeat until we find the goal vertex or the stack is empty:
Pop the next vertex v from the stack
If v has not been visited
Mark v as visited (and maybe do some processing)
for all vertices v' that are adjacent to v
If we haven't already visited v':
push v' onto the stack
We can keep track which vertices we have visited with a list (or a
Set
, more abstractly). Or we can decorate the vertex to
show that it has been visited, as the book's code does. The book's
code also uses recursion instead of the stack. It also adds code
to visit edges and process them.
The book also notes that edges can be broken into two groups (for an undirected graph): discovery edges and back edges. They even add another decoration to keep track of edge type. Discovery edges are the first edge to find a vertex. They form a tree, the DFS tree. Back edges go back up the tree to an ancestor, and indicate that a cycle exists in the graph. The book's generic DFS algorithm is a template that allows you to supply methods to be done when a vertex is first visited, when traversing discovery or back edges, when leaving a vertex after visiting it, etc.
For directed graphs it becomes a bit more complicated - there are also forward edges that connect a node to a descendent in the DFS tree and cross edges that connect two nodes, neither of which is the ancestor of the other. The book presents a lot of problems that can be solved by DFS - reachability, strongly connected components, transitive closure, and topological sorting. They don't mention the culmination of "everything that can be done with DFS." That is graph planarity. Bob Tarjan developed a Θ(n + m) time algorithm for determining if a graph is planar using DFS.
This can be a good technique if we just want to find the goal, and don't want to keep too many alternatives around. But for the Kevin Bacon game, we want to find the shortest path from each actor to Kevin Bacon. In that case, instead of searching in a depth-first manner, we search in a breadth-first manner.
Breadth-first search expands uniformly in all "directions", like radiating ripples; the directions here are edges out of a vertex. So it finds all the vertices 1 step away, then all those 2 steps away, etc. Thus we know once we've found the goal, we've got the shortest set of edges from the start vertex to it.
This requires only a small change in our search strategy: rather than using a stack, we use a queue. With a just a few word changes to our algorithm, using a queue instead of a stack, we go from DFS to BFS:
Enqueue the start vertex v onto the queue
Repeat until we find the goal vertex or the queue is empty:
Dequeue the next vertex v from the queue
If v has not been visited
Mark v as visited (and maybe do some processing)
for all vertices v' that are adjacent to v
If we haven't already visited v':
enqueue v' onto the queue
While the above code works, we can optimize it a bit by noting that the first item pushed onto the queue will be the first item off of the queue. Therefore when we first discover a vertex we can mark it when we add it to the queue, knowing that later discoveries of the vertex need not be added to the queue. (This does not work for DFS, because it the most recently added copy of the vertex is the one that comes off of the stack first and thus gets marked and its adjacent vertices pushed onto the stack.) The optimized version is:
Enqueue the start vertex v onto the queue and mark it
Repeat until we find the goal vertex or the queue is empty:
Dequeue the next vertex v from the queue
(Maybe do some processing)
for all vertices v' that are adjacent to v
If we haven't already visited v':
mark v' (and maybe do some processing)
enqueue v' onto the queue
The edge types in BFS are discovery edges or cross edges. No edge goes back to an ancestor, because it would have been visited in the forward direction earlier.
Instead of just keeping track of which vertices we've visited, it's helpful to keep track of how we arrived at each vertex. So instead of a visited list, we build a visited tree, where the parent of a vertex is the vertex from which we discovered it. Note that the parent is unique (since we don't visit a vertex twice), and thus gives a tree. However, we will often use a graph ADT to hold this tree. It gives us a way to do edge labels and to find particular vertices quickly.
We can do a BFS from a start vertex without any goal in mind — just run until we've visited the whole graph (or at least the part of it that is reachable from the start vertex). Then we have a BFS tree that has shortest paths from each reachable vertex to the start vertex. To find such a shortest path, we just find the desired vertex (e.g., an actor) in the tree, and follow edges back to the start vertex (e.g., Kevin Bacon); we can keep track of the edges (movie titles) on the way back. Demo.