Nov 1: Graphs and their Representations


Code discussed in lecture

Graphs

A graph represents a set of relationships among things. The vertices represent the things and the edges the relationships. There are hundreds of computer applications where a graph is the appropriate ADT to use. These include road maps (vertices are intersections and edges are roads between them), airline routes (vertices are the airports and edges are scheduled flights between them), and computer networks (vertices are computers and edges represent network connections between them).

One instantiation of a graph is with vertices for people and edges for their actual relationships — friend, partner, sibling, parent/child, etc. So a social network like Facebook can be represented by a graph, with vertices being people who created Facebook pages and edges connecting people who have "friended" one another. Twitter can be represented the same way, with people with Twitter accounts as the vertices and an edge from a to b if a follows b.

One popular example is a graph of actors, with edges for those who have been in a movie together (edge labels hold movie names). A geekier example is a graph of authors of mathematical papers, with edges for coauthors (edge labels for paper titles).

Given such a graph, there are many types of analyses we could do on it. Who's the most "popular" (has the most edges)? Who are mutual acquaintances ("cliques" where all have edges to each other)? Who is a friend-of-a-friend but is not yet a friend?

And of course there is the Kevin Bacon game: someone has been in a movie with someone who has been in a movie ... who has been in a movie with Kevin Bacon — how many steps away are they (conjecture: usually at most 6)? In the geekier version, the center of the universe is Paul Erdos, a prolific author and coauthor, and people are characterized by their Erdos numbers. The highest known finite Erdos number is 13. Remarkably, there are a number of people who have both small Erdos numbers and small Bacon numbers (number = steps away). Dan Kleitman has a total Erdos-Bacon number of 3 (Erdos 1, Bacon 2), but the Bacon number is due to a role as an extra. Danica McKellar has an Erdos-Bacon number of 6, and is both a professional actress (The Wonder Years and West Wing) and wrote a published math paper (as well as supplemental math texts designed for teenage girls: Math Doesn't Suck, Kiss My Math, and Hot X: Algebra Exposed.)

Note that there are two basic styles of graphs - one where the relationship is symmetric and the other where it is asymmetric. Facebook is a symmetric relationship - if a is b's friend, then b is a's friend also. Others are asymmetric. If a follows b on Twitter it does not imply that b follows a. In a graph of people where the relation is "child-of" it is never the case that if a is a child of b that b is a child of a. Unrequited love is another case where the relationship is asymmetric. Symmetric relationships lead to undirected graphs. An edge between a and b goes both ways. Asymmetric relationships lead to directed graphs. An edge from a to b is directed from source (a) to destination (b), and there is no implication that there is also an edge from b to a. There are also mixed graphs, where some edges are undirected and others are directed. A road graph with some one-way streets is an example.

We will first deal with undirected graphs. We will come back to directed graphs next class. The book defines a number of graph terms on pp. 598-602. Please read these!

The book gives a Graph ADT. It has a number of useful methods. (Go through them in class). However, it is missing one thing that is often useful - retrieving a vertex that has a particular element value. The only way to do that here is to iterate through the vertices and test them.
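That missing lookup has to be written as a linear scan over the vertex collection. Here is a minimal sketch in Java; the `Vertex` record and `findVertex` helper are illustrative inventions, not the book's classes (the book's Vertex interface exposes an `element()` accessor in the same spirit).

```java
import java.util.List;

public class VertexSearch {
    // A minimal stand-in for the book's Vertex; it just exposes element().
    record Vertex(String element) {}

    /** Linear scan over the vertices -- O(n), since the ADT offers
     *  no direct lookup by element.  Returns null if no match. */
    static Vertex findVertex(Iterable<Vertex> vertices, String elem) {
        for (Vertex v : vertices)
            if (v.element().equals(elem)) return v;
        return null;
    }

    public static void main(String[] args) {
        List<Vertex> vs = List.of(new Vertex("A"), new Vertex("B"));
        System.out.println(findVertex(vs, "B").element()); // prints B
        System.out.println(findVertex(vs, "Z"));           // prints null
    }
}
```

If this lookup were needed often, a map from element to vertex maintained alongside the graph would make it constant time, at the cost of keeping the map in sync on insertions and removals.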

Graph Representations

Edge List

What data structures can we use to represent a graph? The simplest is an edge list. It can consist of a list of edge objects. Each edge object has a reference to a value or label as its element. It has references to both incident endpoints. (A vertex is incident to an edge if it is an endpoint of that edge.) In the book's representation the edges are saved in a linked list of positions, and the edge knows where it is in that linked list. (This allows easy deletion.) The book also has the vertices saved in a linked list, but there is no easy way to get from a vertex to an edge. Draw figure similar to 13.3.

In general, the operations of insertVertex, insertEdge, and removeEdge will be fast as long as the edges and vertices are kept in linked lists, and replace is always fast. vertices and edges just have to return the list (a list is Iterable), so these are constant time. (Note that the claim on p. 606 that you return an iterator is wrong. This is probably why they think that vertices and edges take linear time - they are copying them or something.) Draw a table with methods as rows and implementations as columns, fill in times, as in Table 13.1.

So which of the other operations are easy and hard? If there are n vertices and m edges, how long will the various operations take? The fast operations are endVertices and opposite. But incidentEdges and areAdjacent may require searching the entire edge list, so are O(m) time. At first glance removeVertex would seem to be fast - just remove it from the linked list. But part of the method is removing all edges incident to the vertex removed, so this will also take O(m) time.
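The edge list structure and its O(m) scans can be sketched as follows. This is a simplification, not the book's implementation: the position-based linked lists that make removal easy are omitted, and reference equality stands in for position identity.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeListGraph {
    // Minimal vertex/edge objects; the book additionally keeps both in
    // linked lists of positions so that removal is easy -- omitted here.
    record Vertex(String element) {}
    record Edge(String element, Vertex u, Vertex v) {}

    final List<Vertex> vertices = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Vertex insertVertex(String elem) {
        Vertex v = new Vertex(elem); vertices.add(v); return v;
    }

    Edge insertEdge(Vertex u, Vertex v, String elem) {
        Edge e = new Edge(elem, u, v); edges.add(e); return e;
    }

    // O(m): the only way to find the incident edges is to scan them all.
    List<Edge> incidentEdges(Vertex v) {
        List<Edge> out = new ArrayList<>();
        for (Edge e : edges)
            if (e.u() == v || e.v() == v) out.add(e);
        return out;
    }

    // Also O(m), for the same reason.
    boolean areAdjacent(Vertex v, Vertex w) {
        for (Edge e : edges)
            if ((e.u() == v && e.v() == w) || (e.u() == w && e.v() == v))
                return true;
        return false;
    }

    public static void main(String[] args) {
        EdgeListGraph g = new EdgeListGraph();
        Vertex a = g.insertVertex("a"), b = g.insertVertex("b"), c = g.insertVertex("c");
        g.insertEdge(a, b, "ab");
        g.insertEdge(b, c, "bc");
        System.out.println(g.areAdjacent(a, b));        // true
        System.out.println(g.areAdjacent(a, c));        // false
        System.out.println(g.incidentEdges(b).size());  // 2
    }
}
```

Note that removeVertex would have to call something like incidentEdges internally to delete the removed vertex's edges, which is why it is O(m) too.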

Adjacency List

The obvious way to speed up the slow operations from the edge list structure is to have some way to get from a vertex to the edges that it is incident to. Traditionally the incident edges to a vertex are stored in a linked list, leading to the name adjacency list. The book points out that the incident edges could be stored in any collection. Update figure to show how this can be done, similar to Fig. 13.4.

How does this change run times? Clearly incidentEdges is now very fast - just return the collection (or a copy of it). To copy it or actually iterate through the adjacency list for vertex v takes time O(degree(v)). Method removeVertex has the same run time. Method areAdjacent(v,w) requires going through the edges incident to one of the vertices and seeing if the other is opposite to it. This can be done in time O(min(degree(v), degree(w))), if there is a fast way of learning the degree of a vertex.
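A sketch of the adjacency list structure, again simplified from the book (incident edges are kept in an ArrayList per vertex rather than a linked list of positions; the class and field names are my own):

```java
import java.util.ArrayList;
import java.util.List;

public class AdjacencyListGraph {
    static class Vertex {
        final String element;
        final List<Edge> incident = new ArrayList<>(); // this vertex's incidence collection
        Vertex(String e) { element = e; }
    }
    static class Edge {
        final String element; final Vertex u, v;
        Edge(String e, Vertex u, Vertex v) { element = e; this.u = u; this.v = v; }
    }

    final List<Vertex> vertices = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Vertex insertVertex(String elem) {
        Vertex v = new Vertex(elem); vertices.add(v); return v;
    }

    Edge insertEdge(Vertex u, Vertex v, String elem) {
        Edge e = new Edge(elem, u, v);
        edges.add(e);
        u.incident.add(e);   // each endpoint records the edge, so
        v.incident.add(e);   // incidentEdges needs no global scan
        return e;
    }

    // O(1) to return the stored collection (O(degree(v)) to copy or iterate it).
    List<Edge> incidentEdges(Vertex v) { return v.incident; }

    // O(min(degree(v), degree(w))): scan the shorter incidence list.
    boolean areAdjacent(Vertex v, Vertex w) {
        Vertex shorter = v.incident.size() <= w.incident.size() ? v : w;
        Vertex other   = (shorter == v) ? w : v;
        for (Edge e : shorter.incident)
            if (e.u == other || e.v == other) return true;
        return false;
    }

    public static void main(String[] args) {
        AdjacencyListGraph g = new AdjacencyListGraph();
        Vertex a = g.insertVertex("a"), b = g.insertVertex("b"), c = g.insertVertex("c");
        g.insertEdge(a, b, "ab");
        g.insertEdge(b, c, "bc");
        System.out.println(g.areAdjacent(a, b));        // true
        System.out.println(g.areAdjacent(a, c));        // false
        System.out.println(g.incidentEdges(b).size());  // 2
    }
}
```

The degree test in areAdjacent is the point of the "fast way of learning the degree" caveat: an ArrayList's size() is constant time, which is what lets us pick the shorter list before scanning.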

Adjacency Matrix

The only method that takes more time than is absolutely needed to update the structure is areAdjacent(v,w). To speed this up we can create an adjacency matrix, as long as the graph has no parallel edges. We combine this with the edge list structure. We number the vertices from 0 to n-1. We create an n x n matrix A where entry A[i][j] holds a reference to the edge from vertex i to vertex j if it exists. Otherwise A[i][j] == null. (Note that for an undirected graph A[i][j] == A[j][i].)

This makes areAdjacent(v,w) a Θ(1) operation - look in the matrix to see if there is an edge at the appropriate entry. However, incidentEdges now takes Θ(n) time, because we have to look through an entire row or column. And adding or removing a vertex requires rebuilding the entire matrix, so takes Θ(n²) time.

The more usual definition of an adjacency matrix has only the matrix. The vertices are implicitly named 0 to n-1 and the entries are boolean. True means an edge exists and false means that no edge exists. If you want edge values (distances, for instance), then the matrix would have either null or a reference to the value. No edge objects exist. In this case enumerating all of the edges is a Θ(n²) time operation, because you have to look at all matrix entries.
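This plain boolean version can be sketched in a few lines (the class name and countEdges helper are illustrative, not from the book):

```java
public class AdjacencyMatrixGraph {
    // Vertices are implicitly numbered 0..n-1; true means an edge exists.
    final boolean[][] adj;
    final int n;

    AdjacencyMatrixGraph(int n) { this.n = n; adj = new boolean[n][n]; }

    // Undirected, so the matrix stays symmetric: adj[i][j] == adj[j][i].
    void insertEdge(int i, int j) { adj[i][j] = true; adj[j][i] = true; }

    // Theta(1): a single array lookup.
    boolean areAdjacent(int i, int j) { return adj[i][j]; }

    // Theta(n^2): every entry must be examined.  For an undirected graph
    // the upper triangle suffices, which halves the work but not the bound.
    int countEdges() {
        int m = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (adj[i][j]) m++;
        return m;
    }

    public static void main(String[] args) {
        AdjacencyMatrixGraph g = new AdjacencyMatrixGraph(4);
        g.insertEdge(0, 1);
        g.insertEdge(1, 2);
        System.out.println(g.areAdjacent(0, 1)); // true
        System.out.println(g.areAdjacent(0, 2)); // false
        System.out.println(g.countEdges());      // 2
    }
}
```

Note the trade-off this makes concrete: constant-time adjacency tests and Θ(n²) space, versus the adjacency list's O(min(degree)) tests and O(n + m) space. For sparse graphs (m much less than n²) the matrix also wastes most of its entries on false.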