Oct 21: Maps, Sets, and Instant Runoff Elections


Code discussed in lecture

Collections

Java has a group of interfaces for holding collections of objects and classes that implement them. We have seen and used List, which is an interface with two Java-provided implementations: ArrayList and LinkedList. Today will look at two other interfaces for holding collections of objects: Set and Map. Each has two Java-provided implementations. Set is implemented by HashSet and TreeSet. Map is implemented by HashMap and TreeMap. We will be looking at their underlying data structures, hash tables and binary search trees, in the next few lectures.

The List Interface

As a review, a number of important methods in the List<E> interface are: If both ArrayList and LinkedList implement this set of operations, why have both? Efficiencies differ. Access operations (set and get) are constant time in an ArrayList, but require time proportional to the distance to the nearest end for a LinkedList. (The LinkedList is a doubly-linked circular linked list, and is smart enough to start at the closest end.) On the other hand, modification operations (add and remove at a given index) require time proportional to the number of items after the index in an ArrayList, because all of these items must be copied. But for a LinkedList they are constant after the time to access the index (distance from nearest end). This means that all Iterator or ListIterator operations are constant time for a LinkedList, but add and remove operations take time proportional to the number of remaining items for an ArrayList.

The Set Interface

A Set differs from a List in that a List has a linear order, while a set does not. Furthermore, the an item can appear multiple times in a List but only once in a Set.

The primary operations on a Set<E> are:

All of these methods are also part of the List interface. So why have a separate interface?

The main reason is implementation efficiency. The contains operation on either a ArrayList or a LinkedList takes O(n) time, and for an ArrayList the remove operation can take O(n) time. For applications like a dictionary for a spell checker this is too slow.

There are two implementations of Set in the Java class library. Both are more efficient for the contains operation than the List implementations.

The first is TreeSet, which uses a data structure called a balanced binary tree to store the data. Think of it as a linked list on which you can do binary search. We will talk about this data structure soon. The important point is that the add, remove, and contains operations are all O(log n) time. It only works on Comparable objects. The iterator is guaranteed to return the items in increasing order by compareTo, and takes O(n) time to iterate through the entire set. Getting the first item from the iterator takes O(log n) time.

The second is HashSet, which uses a data structure called a hash table. We will talk about these next class.

If the hash table is used properly the add, remove, and contains operations are all O(1) time on average (although it is possible that they could take O(n) time if you were extremely unlucky). The iterator returns the items in a somewhat arbitrary order.

As an example of the use of sets, consider the program SetDemo.java. It creates a set consisting of all of the keywords in Java. It then uses an iterator to go through the set and print each of the words. (Note that an iterator on a Set is identical to an iterator on a List.) Finally, it lets the user type words and determines if they are keywords by using contains to see if they are in the set.

The Map Interface

The Map interface describes a data structure that can be thought of as a set where each element has associated data. Each data item is associated with a key. By looking up the key one can get the associated data. A key is typically something like your student ID number, and the associated data might be your student record. Maps are implemented using balance binary trees and hash tables, just like sets.

The primary operations in a Map<K,V> are:

As an example of the use of a map, consider PetSounds.java. This program allows the user to insert animal names as keys and the sounds that they make as the associated data. One can then ask for the sound that a given animal makes, or remove an animal from the map.

Note the way the the print operation works. The code for this is:

  if (animalMap.isEmpty())
    System.out.println("The map is empty");
  else
  {
    System.out.println("The animals and their sounds are:");
	          
    Set<String> animalNames = animalMap.keySet();
    Iterator<String> iter = animalNames.iterator();
	            
    while(iter.hasNext())
    {
      animal = iter.next();
      System.out.println(animal + " says " + animalMap.get(animal));
    }
  }
Note that the first step is to call keySet to get all of the keys in the map. Then an iterator is created for the set, and used to interate through the set, printing each key and the value returned by get for that key.

A More Complex Example - Voting

The method of voting where the candidate with the most votes wins the election has some drawbacks. If two conservatives get in a race against a liberal in a conservative district they may split the conservative vote and the liberal gets elected, even though he is the third choice of the majority of the voters in the election. Also, third parties have a hard time getting established, because voting for a third-party candidate can be "throwing away your vote." If about a third of the 22,000 New Hampshire voters who voted for Nader in the Bush-Gore election had voted for Gore instead he would have won the state and the presidency. Florida would not have mattered.

Some states solve these problems by having a runoff election between the top two candidates if nobody gets a majority of the votes. Unfortunately a runoff election costs time and money. A popular alternative suggestion is the instant runoff election.

In an instant runoff election the voters fill out a ballot with an ordered list of candidates, from most favorable to least favorable. The election takes place in rounds. In the first round each ballot awards a vote to the first candidate on the ballot. If nobody has a majority the candidate with the fewest votes is dropped from the election. (In case of ties we will chose one at random.) Then another round is run. This time the ballot's vote is awarded to the first candidate in its list who has not been eliminated. The bottom candidate is dropped, and the process repeats until someone has a majority. (In fact, it can repeat until there is just one candidate left and get the same result. Once someone has a majority they will never be eliminated.)

How could we write a program to determine the winner of an instant runoff election? The first step is to determine what objects appear in the problem and how they interact with one another. One obvious choice is a ballot. We could say, "Oh, that is just a list" and not create an object for it. But let's take an OO approach and say that there should be a Ballot class.

Another object would be the set of all the ballots in the election. We could just say, "Make a set of lists", but let's make Election a class, also.

A final object that may be less obvious is an object that represents the results of the voting. Let's create a VoteTally class. The alternative is to use a map from candidate names to the number of votes that they received.

We could have a class to keep track of the current set of candidates, but the Set class seems to do everything that we are likely to need. Unless we discover an action that we need to do that the Set class doesn't handle we will just use a Set.

What actions do we need to perform? We first need to get our set of candidates. Note that we can limit this set to candidates who get at least one first-round vote. Others will have zero votes and will be dropped before any of the candidates who got first-round votes. It sounds like the Election is the class that has access to the data to perform this, with help from the Ballot to get the first item on each ballot.

Next we have to run a round of the election. This requires going through all of the ballots, determining who the vote should go to, and increasing that candidate's tally by 1. The Ballot class has the data to determine who should get the vote. The Election class has the ballots. The VoteTally class should update itself by adding a vote for the candidate.

After running a round we have to find the candidate with the fewest votes. The VoteTally class has the information to do this. But what if there is a tie? Well, maybe we find a list of candidates who share the lowest vote total. We pick one at random and eliminate that candidate from the current candidate set.

We have to repeat running a round of the election and eliminating the candidate with fewest votes until we have only one candidate left. This does not seem to be appropriate for any class. A method in the InstantRunoff class can do this.

So what sorts of things do we want to be able to do with a Ballot object?

There are many other possible things we could do with a ballot. Get all of the candidates in order is one possibility. So we could supply an iterator. A toString could be useful. A way of getting the number of candidates on the ballot could be useful. But for now we will do the minimum. We can always come back later to add new methods.

What should we do with an Election object? We need to create it plus perform the jobs mentioned above.

What about the VoteTally object?

The code in Ballot.java, Election.java, and VoteTally.java do these operations. The class InstantRunoffOO.java supplies the method to loop through the rounds and the main method for testing. Look at code briefly.

An alternate approach is InstantRunoffProc.java. This does the same thing, but through fixed data structures and static methods. It is less code, which is a plus. There are longer lists of parameters as all of the data must be passed around "bare". We see data declarations like: List<List<String>> ballots. These are not easy to read and take getting used to. In short, there is no data encapsulation, which is a minus.

On a program this short encapsulation and data hiding aren't that important. On the other hand, I originally had a Set of ballots instead of a List. The Set of Ballot objects in InstantRunoffOO.java wasn't a problem, because Ballot did not override equals. But in InstantRunoffProc.java it was a problem, because two ballots with choices, "Romney Huntsman" were entered into an election but Romney only got one vote. The two ArrayList objects ended up being equal, so only one was kept in the set. Changing from Set<List<String>> to List<List<String>> required five changes spread out over four methods. Finding all of the appropriate changes in a much bigger program (and avoiding changes where the Set<List<String>> wasn't dealing with ballots and may have been correct as it was) would have been tedious and error-prone. In contrast, making the change in InstantRunoffOO.java required two changes in Ballot.java: the declaration of the instance variable and its initialization in the constructor. I would have only needed to make those two changes even if the program had millions of lines.

Scanner class

Note the use of the Scanner class in the programs examined today. You can open it on an input stream usually System.in) or even on a String! Then you can read any type of data. next reads the next "token" as a String. (Tokens are like words, things separated by white space. But you can also change the separator. The class is very flexible!) You can also call nextLine, nextInt, nextLong, nextDouble, nextFloat, nextBoolean, nextBigInteger, nextBigDecimal, nextShort, and nextByte. It will read characters from the input and covert them to the corresponding type. There is also a "has" version of each of these that returns true if the next thing in the input can be converted to the corresponding type (hasNext, hasNextLine, hasNextInt, etc.).