Nov 6: Shortest Paths (Dijkstra's) and A* search
Code discussed in lecture
- Dijkstra.java
- Sequencers
- SequenceItems
- StackSequencer
- QueueSequencer
- DistSequencer
- PQSequenceItem
- AStarSequencer
- AStarSequenceItem
- MazeSrc.zip
Shortest Paths - Dijkstra's Algorithm
BFS works well to find paths with the minimum number of edges, as in the Kevin Bacon Game. However, many times we will have graphs where the edges have weights. An example that we discussed is a map with edge weights being the distance between vertices. How can we find the shortest distance between a pair of vertices? There are several variants of this problem; we will discuss the "single-source shortest path" problem: find the shortest paths from a given start vertex to every other vertex. If there are no negative edge weights, then this can be solved by a generalization of BFS called Dijkstra's Algorithm, named after its inventor, Edsger Dijkstra. The book gives a good explanation of this algorithm. You should read it!
How could we have negative edge weights? Well, one possibility is that the edge weights are the costs to drive a car along the edge. Sometimes people want to fly and are willing to pay to have their car driven to their destination. In this case the cost of driving a car between two vertices can be negative.
If a graph has a cycle whose weight as you go around the cycle is negative then there is no least-cost path. Every time you go around the negative cycle the cost gets less, so you can make the cost as small as you wish. If there are no negative-weight cycles then there are shortest paths, but computing them takes a more complicated algorithm than Dijkstra's. Take CS 31 to learn more about these algorithms.
Dijkstra's algorithm is an example of a greedy algorithm. At each step you take what looks like the best choice, with no look-ahead. Sometimes "greed is good" and you get an optimal algorithm. That happens with Dijkstra's algorithm. Other times you get a good heuristic, but no guarantee of the optimal answer.
So how does Dijkstra's algorithm work? Basically it grows a "cloud" of vertices whose distance from the start vertex is known. At each step it adds one vertex to this cloud. When all vertices are added we know the shortest distance to any vertex. (It is easy to construct a shortest-path tree as with BFS, if you want to know the paths.)
The idea is to keep a priority queue of all of the vertices with the key for each vertex being the best distance to the start vertex found so far. This queue starts with the start vertex having key 0 and every other vertex having key INFINITY. At each step, we:
- Remove the minimum entry from the queue and add its vertex v to the cloud. The correct minimum distance is the key value when the entry was removed from the priority queue.
- Check all vertices v' adjacent to v to see if we have found a shorter way to get to them. The book calls this the "relaxation" step. All you are doing in this step is checking whether the weight of the shortest path to v plus the weight of the edge from v to v' (which together give a valid path from the start to v') is less than the best weight known so far for v'. If it is, update the entry for v' in the priority queue by reducing its key to the new weight. (Note that this requires being able to find the neighbor in the priority queue. The book uses a heap with location-aware positions to achieve this.)
When all vertices have been added to the cloud we are done. Demo on a graph with Hanover, W. Leb, Lebanon, Claremont, Concord, Manchester, and Boston. Have Hanover - Leb be 10, but Hanover - W. Leb and W. Leb - Leb be 4, so the direct Hanover - Leb edge (weight 10) loses to the two-edge path through W. Leb (total weight 8).
For a correctness proof, see the book. Also, look briefly at the code.
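Dijkstra.java is not reproduced in these notes, but here is a minimal sketch of the algorithm in plain Java, under a couple of assumptions: the graph is a hypothetical adjacency map (graph.get(u) maps each neighbor v of u to the weight of edge (u, v)), and since java.util.PriorityQueue has no reduceKey operation, the sketch re-inserts an entry each time a relaxation succeeds and skips stale entries when they are removed, instead of using an adaptable PQ with location-aware entries as the lecture code does.

import java.util.*;

public class DijkstraSketch {
    // Sketch only; see Dijkstra.java for the version discussed in lecture.
    public static Map<String, Integer> shortestDistances(
            Map<String, Map<String, Integer>> graph, String start) {
        Map<String, Integer> dist = new HashMap<>(); // best distance found so far
        Set<String> cloud = new HashSet<>();         // vertices whose distance is known
        PriorityQueue<Map.Entry<String, Integer>> pq =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        dist.put(start, 0);            // start has key 0; missing entries mean INFINITY
        pq.add(Map.entry(start, 0));
        while (!pq.isEmpty()) {
            String u = pq.poll().getKey();           // remove the minimum entry
            if (!cloud.add(u)) continue;             // stale entry: u is already in the cloud
            for (Map.Entry<String, Integer> edge : graph.getOrDefault(u, Map.of()).entrySet()) {
                String v = edge.getKey();
                int alt = dist.get(u) + edge.getValue();  // relaxation step
                if (alt < dist.getOrDefault(v, Integer.MAX_VALUE)) {
                    dist.put(v, alt);                // found a shorter way to v
                    pq.add(Map.entry(v, alt));       // re-insert instead of reduceKey
                }
            }
        }
        return dist;
    }
}

The re-insertion trick can leave up to m entries in the queue, so each queue operation costs O(log m) = O(log n), matching the heap-based bound discussed under Run time below.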
Run time
We add each vertex to the PQ once, remove it as the min once, and relax each edge (and perhaps reduce a key) once. Let n be the number of vertices and m the number of edges. Then the run time is O(n*(insert time + deleteMin time) + m*(reduceKey time)). So how long this takes depends on the time to insert and deleteMin in a priority queue and on the time to reduce a key. The natural way to do this is to use a heap-based PQ as the book does. Then each operation takes O(log n), for a total of O(n log n + m log n).
An alternate approach (that seems pretty stupid) is to keep an unordered list for the PQ. Then insert takes O(1), deleteMin takes Θ(n), and reduceKey takes O(1) (if you have position-aware vertices, so you can find where a vertex is in the list in O(1) time). So this takes O(n² + m) time. This is clearly worse. Except it isn't. If the graph has lots of edges and m = Θ(n²) this is Θ(n²). The "clever" heap approach is Θ(n² log n)!
This was an embarrassment for decades. Eventually a type of priority queue called a Fibonacci Heap was invented, which has O(1) amortized time for reduceKey. So the run time is O(n log n + m), which is O(n²).
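To summarize the arithmetic, plugging each implementation's operation costs into the formula above gives:

\begin{align*}
T &= n\,(T_{\text{insert}} + T_{\text{deleteMin}}) + m\,T_{\text{reduceKey}} \\
\text{binary heap:} \quad & n \cdot O(\log n) + m \cdot O(\log n) = O((n+m)\log n) \\
\text{unsorted list:} \quad & n \cdot O(n) + m \cdot O(1) = O(n^2 + m) \\
\text{Fibonacci heap:} \quad & n \cdot O(\log n) + m \cdot O(1) = O(n \log n + m)
\end{align*}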
Look at the code in Dijkstra.java to see how it uses an adaptable priority queue and the Decorator pattern to implement the algorithm described above.
Search on Implicit Graphs and A*
Add Montpelier and Burlington to map above. Point out that if you are looking for a path from Hanover to Boston it makes no sense to calculate distances to Montpelier and Burlington, although that is exactly what you do in Dijkstra's. How do you know this? You are going in the "wrong direction." But how could Dijkstra's algorithm know this?
If we are looking for the shortest path to a known goal we can use a generalization of Dijkstra's algorithm called A* search to find the shortest path without taking as much time as normal Dijkstra's. The idea is to use an estimate of the distance remaining, and to add it to the distance traveled so far to get the key for the city in the PQ. For this case the normal Euclidean distance between Montpelier and Boston would be something like 130 miles. (Figure this Euclidean distance from GPS coordinates of longitude and latitude.) Adding this to the 45 miles from Hanover to Montpelier would give 175 miles. Therefore Boston (whose distance will be 120 miles or so) would come out of the PQ before Montpelier.
Just as in Dijkstra's algorithm, in A* we have a PQ and a "cloud." The cloud is traditionally called the "closed set." These are vertices where we know the best path, so if we discover them again while searching we don't need to deal with them. The second group is the "open set," corresponding to vertices in the PQ. Just as in Dijkstra's, when we deleteMin from the PQ we move the corresponding vertex to the cloud (closed set) and remove it from the PQ and the open set. We then find incident edges and relax them (update an adjacent vertex's current distance if we have found a shorter path). But the key in the PQ is always distance so far plus the estimate of the remaining distance.
For this to find the shortest path we need two things. First, the estimate of distance remaining must be admissible. This means that we always underestimate the true remaining distance or get it exactly. Because of the triangle inequality the Euclidean distance is an underestimate of the true travel distance. In fact, the ultimate underestimate of the remaining distance is 0. This leads to normal Dijkstra, because then we take the shortest distance traveled next.
The second requirement is that the cost estimate is monotone. That means that if we extend the path (thus increasing the distance traveled so far), the total estimate is never less than the estimate before we extended the path. This is the generalization of "no negative edges." Our Euclidean distance plus distance so far estimate satisfies this, because if moving to an adjacent vertex increases the distance so far by d it cannot reduce the Euclidean distance by more than d. (It would reduce by exactly d if you were moving directly toward the goal.) Thus the sum will not go down. Using just the Euclidean distance (without distance so far) would fail to be monotone, so it would not be guaranteed to give shortest paths.
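Here is a corresponding sketch of A* in plain Java, again under stated assumptions: neighbors.apply(u) is a hypothetical adjacency map from each neighbor of u to the connecting edge's weight, h is the admissible, monotone estimate of the remaining distance, and re-insertion plus a closed-set check stands in for an adaptable PQ.

import java.util.*;
import java.util.function.Function;
import java.util.function.ToIntFunction;

public class AStarSketch {
    // Sketch only; h must be admissible and monotone for correctness.
    public static <V> int shortestPathLength(V start, V goal,
            Function<V, Map<V, Integer>> neighbors, ToIntFunction<V> h) {
        Map<V, Integer> g = new HashMap<>();  // distance traveled so far (open set)
        Set<V> closed = new HashSet<>();      // the "cloud"
        // Each PQ entry pairs a vertex with its frozen key g(v) + h(v):
        // distance so far plus the estimate of the remaining distance.
        PriorityQueue<Map.Entry<V, Integer>> pq =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        g.put(start, 0);
        pq.add(Map.entry(start, h.applyAsInt(start)));
        while (!pq.isEmpty()) {
            V u = pq.poll().getKey();
            if (!closed.add(u)) continue;          // stale duplicate; u already settled
            if (u.equals(goal)) return g.get(u);   // first removal of the goal is optimal
            for (Map.Entry<V, Integer> e : neighbors.apply(u).entrySet()) {
                V v = e.getKey();
                int alt = g.get(u) + e.getValue(); // relax edge u -> v
                if (alt < g.getOrDefault(v, Integer.MAX_VALUE)) {
                    g.put(v, alt);
                    pq.add(Map.entry(v, alt + h.applyAsInt(v)));  // key = g + h
                }
            }
        }
        return -1;   // goal not reachable
    }
}

Note that passing h as the constant 0 function makes this exactly the Dijkstra sketch above, matching the observation that a remaining-distance estimate of 0 gives normal Dijkstra.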
We often deal with problems where the graph is implicit. Consider a maze. There is an underlying graph - the squares in the maze are the vertices and the edges connect adjacent squares. However, converting the maze to a graph is more effort than finding the shortest path, because we would end up converting parts of the maze that we never visit. What is worse is a puzzle like the 15-puzzle (15 numbered tiles and a blank in a 4x4 frame, with the tiles able to slide into the blank's position) or Rubik's Cube. Here the size of the implicit graph is potentially enormous. Therefore we create vertex objects as we need them and find adjacencies on the fly. (For the 15-puzzle two configurations are adjacent if they differ by sliding one tile into the blank position, and for Rubik's Cube two configurations are adjacent if they differ by a twist.)
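As a concrete illustration of finding adjacencies on the fly, here is a sketch for the maze case, assuming a hypothetical passability grid open[r][c] (true if the square is open); this is not the lecture's maze representation.

import java.util.*;

public class MazeNeighbors {
    // On-the-fly adjacency: neighbor lists are computed only when the search
    // reaches a square, so the maze is never converted to a graph up front.
    static List<int[]> neighbors(int row, int col, boolean[][] open) {
        int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};  // the four adjacent squares
        List<int[]> result = new ArrayList<>();
        for (int[] m : moves) {
            int r = row + m[0], c = col + m[1];
            if (r >= 0 && r < open.length && c >= 0 && c < open[r].length && open[r][c])
                result.add(new int[]{r, c});     // in bounds and not a wall
        }
        return result;
    }
}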
Suppose we want to find paths in a maze. We can use DFS, but this can lead to very long paths. (Demo maze5 in the Maze program.) BFS will guarantee finding the shortest path, but this can be very slow. We can improve on DFS by using a heuristic, e.g. the L1 (Manhattan) distance from where we are to the target square. (This means adding the x difference to the y difference between the two squares.) If we use a PQ based on this distance we will tend to "zoom in" on the target. However, this will not guarantee a shortest path. (Again demo the maze program.) Can we do better? Yes. Using A* search we can find the shortest path without taking as much time as BFS. The idea is to add a heuristic like the L1 distance to estimate the distance remaining. For our maze the L1 distance is an admissible estimate - we move either horizontally or vertically in each step, so the number of steps needed is at least the L1 distance. Note that we could also have used the Euclidean distance. That would be a bigger underestimate, because the Euclidean distance will usually be smaller than the L1 distance and will never be larger. Demo on maze5, showing that the square where the paths meet will be discovered from the right first, but will eventually be included in the path from the left.
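The L1 heuristic itself is one line; this sketch uses hypothetical row/column coordinates for the current and target squares.

public class L1Heuristic {
    // L1 (Manhattan) distance to the target: an admissible underestimate,
    // since each step changes the row or the column by exactly one.
    static int l1(int row, int col, int targetRow, int targetCol) {
        return Math.abs(row - targetRow) + Math.abs(col - targetCol);
    }
}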
Note that using a remaining-distance estimate of 0 gives BFS on the implicit maze graph, because the path with the fewest steps so far is always extended next.
Look at how the maze program implements this via Sequencers (generalized priority queues) and SequenceItems (to keep track of squares in the matrix and the path so far). The StackSequencer and QueueSequencer use a stack and queue respectively to add and get next items. The DistSequencer uses a priority queue, where the priority is the L1 distance to the target. This distance gets saved in the key field in the PQSequenceItem and is used when comparing PQSequenceItems in the priority queue.
The AStarSequencer keeps track of the open set via a map from SequenceItems to Entries in the PQ, so that when we find a shorter distance we can find and update the appropriate PQ Entry. It also has an AdaptablePriorityQueue to allow the value of an entry's key to be changed when relaxation finds a better path. The AStarSequenceItem has an instance variable to hold the path length and a method to get it, because that is part of the estimate. Note that SequenceItems are equal if their row and column positions are equal, and that we overrode hashCode to make the hash codes of equal items equal.
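Here is a sketch of that equality contract, with hypothetical row and col fields; it is not the lecture's actual SequenceItem code.

import java.util.Objects;

class SequenceItemSketch {
    private final int row, col;   // position of the square in the maze

    SequenceItemSketch(int row, int col) { this.row = row; this.col = col; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SequenceItemSketch)) return false;
        SequenceItemSketch item = (SequenceItemSketch) other;
        return row == item.row && col == item.col;   // equal iff the same square
    }

    @Override
    public int hashCode() {
        // Equal items must get equal hash codes so they can serve as
        // keys in the open-set map described above.
        return Objects.hash(row, col);
    }
}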
The code for the Maze solver is in MazeSrc.zip.