The Way of the Java/Linked lists
References in objects
reference embedded reference reference!embedded node cargo
In the last chapter we saw that the instance variables of an object can be arrays, and I mentioned that they can be objects, too.
One of the more interesting possibilities is that an object can contain a reference to another object of the same type. There is a common data structure, the list, that takes advantage of this feature.
Lists are made up of nodes, where each node contains a reference to the next node in the list. In addition, each node usually contains a unit of data called the cargo. In our first example, the cargo will be a single integer, but later we will write a generic list that can contain objects of any type.
generic data structure data structure!generic
The Node class
Node class class!Node
As usual when we write a new class, we'll start with the instance variables, one or two constructors and toString so that we can test the basic mechanism of creating and displaying the new type.
verbatim public class Node
int cargo; Node next;
public Node () cargo = 0; next = null;
public Node (int cargo, Node next) this.cargo = cargo; this.next = next;
public String toString () return cargo + "";
The declarations of the instance variables follow naturally from the specification, and the rest follows mechanically from the instance variables. The expression cargo + "" is an awkward but concise way to convert an integer to a String.
To test the implementation so far, we would put something like this in main:
Node node = new Node (1, null); System.out.println (node);
The result is simply
verbatim 1 verbatim
To make it interesting, we need a list with more than one node!
Node node1 = new Node (1, null); Node node2 = new Node (2, null); Node node3 = new Node (3, null);
This code creates three nodes, but we don't have a list yet because the nodes are not linked. The state diagram looks like this:
To link up the nodes, we have to make the first node refer to the second and the second node refer to the third.
node1.next = node2; node2.next = node3; node3.next = null;
The reference of the third node is null, which indicates that it is the end of the list. Now the state diagram looks like:
Now we know how to create nodes and link them into lists. What might be less clear at this point is why.
Lists as collections
The thing that makes lists useful is that they are a way of assembling multiple objects into a single entity, sometimes called a collection. In the example, the first node of the list serves as a reference to the entire list.
list!printing list!as parameter
If we want to pass the list as a parameter, all we have to pass is a reference to the first node. For example, the method printList takes a single node as an argument. Starting with the head of the list, it prints each node until it gets to the end (indicated by the null reference).
public static void printList (Node list) Node node = list;
while (node != null) System.out.print (node); node = node.next; System.out.println ();
To invoke this method we just have to pass a reference to the first node:
Inside printList we have a reference to the first node of the list, but there is no variable that refers to the other nodes. We have to use the next value from each node to get to the next node.
This diagram shows the value of list and the values that node takes on:
This way of moving through a list is called a traversal, just like the similar pattern of moving through the elements of an array. It is common to use a loop variable like node to refer to each of the nodes in the list in succession.
loop variable list!traversal traverse
The output of this method is
verbatim 123 verbatim
By convention, lists are printed in parentheses with commas between the elements, as in (1, 2, 3). As an exercise, modify printList so that it generates output in this format.
As another exercise, rewrite printList using a for loop instead of a while loop.
Lists and recursion
list!traverse recursively traverse
Recursion and lists go together like fava beans and a nice Chianti. For example, here is a recursive algorithm for printing a list backwards:
fava beans Chianti
Separate the list into two pieces: the first node (called the head) and the rest (called the tail).
Print the tail backwards.
Print the head.
Of course, Step 2, the recursive call, assumes that we have a way of printing a list backwards. But if we assume that the recursive call works---the leap of faith---then we can convince ourselves that this algorithm works.
leap of faith list!printing backwards
All we need is a base case, and a way of proving that for any list we will eventually get to the base case. A natural choice for the base case is a list with a single element, but an even better choice is the empty list, represented by null.
public static void printBackward (Node list) if (list == null) return;
Node head = list; Node tail = list.next;
printBackward (tail); System.out.print (head);
The first line handles the base case by doing nothing. The next two lines split the list into head and tail. The last two lines print the list.
We invoke this method exactly as we invoked printList:
The result is a backwards list.
Can we prove that this method will always terminate? In other words, will it always reach the base case? In fact, the answer is no. There are some lists that will make this method crash.
infinite list list!infinite loop!in list list!loop
There is nothing to prevent a node from referring back to an earlier node in the list, including itself. For example, this figure shows a list with two nodes, one of which refers to itself.
If we invoke printList on this list, it will loop forever. If we invoke printBackward it will recurse infinitely. This sort of behavior makes infinite lists difficult to work with.
Nevertheless, they are occasionally useful. For example, we might represent a number as a list of digits and use an infinite list to represent a repeating fraction.
Regardless, it is problematic that we cannot prove that printList and printBackward terminate. The best we can do is the hypothetical statement, ``If the list contains no loops, then these methods will terminate. This sort of claim is called a precondition. It imposes a constraint on one of the parameters and describes the behavior of the method if the constraint is satisfied. We will see more examples soon.
The fundamental ambiguity theorem
ambiguity!fundamental theorem theorem!fundamental ambiguity
There is a part of printBackward that might have raised an eyebrow:
Node head = list; Node tail = list.next;
After the first assignment, head and list have the same type and the same value. So why did I create a new variable?
The reason is that the two variables play different roles. We think of head as a reference to a single node, and we think of list as a reference to the first node of a list. These ``roles are not part of the program; they are in the mind of the programmer.
The second assignment creates a new reference to the second node in the list, but in this case we think of it as a list. So, even though head and tail have the same type, they play different roles.
This ambiguity is useful, but it can make programs with lists difficult to read. I often use variable names like node and list to document how I intend to use a variable, and sometimes I create additional variables to disambiguate.
I could have written printBackward without head and tail, but I think it makes it harder to understand:
public static void printBackward (Node list) if (list == null) return;
printBackward (list.next); System.out.print (list);
Looking at the two function calls, we have to remember that printBackward treats its argument as a list and print treats its argument as a single object.
Always keep in mind the fundamental ambiguity theorem:
quote A variable that refers to a node might treat the node as a single object or as the first in a list of nodes. quote
Object methods for nodes
method!object node!object method
You might have wondered why printList and printBackward are class methods. I have made the claim that anything that can be done with class methods can also be done with object methods; it's just a question of which form is cleaner.
In this case there is a legitimate reason to choose class methods. It is legal to send null as an argument to a class method, but it is not legal to invoke an object method on a null object.
verbatim Node node = null; printList (node); // legal node.printList (); // NullPointerException verbatim
This limitation makes it awkward to write list-manipulating code in a clean, object-oriented style. A little later we will see a way to get around this, though.
list!modifying modifying lists
Obviously one way to modify a list is to change the cargo of one of the nodes, but the more interesting operations are the ones that add, remove, or reorder the nodes.
As an example, we'll write a method that removes the second node in the list and returns a reference to the removed node.
public static Node removeSecond (Node list) Node first = list; Node second = list.next;
// make the first node refer to the third first.next = second.next;
// separate the second node from the rest of the list second.next = null; return second;
Again, I am using temporary variables to make the code more readable. Here is how to use this method.
printList (node1); Node removed = removeSecond (node1); printList (removed); printList (node1);
The output is
verbatim (1, 2, 3) the original list (2) the removed node (1, 3) the modified list verbatim
Here is a state diagram showing the effect of this operation.
What happens if we invoke this method and pass a list with only one element (a singleton)? What happens if we pass the empty list as an argument? Is there a precondition for this method?
Wrappers and helpers
wrapper methods method!wrapper helper method method!helper
For some list operations it is useful to divide the labor into two methods. For example, to print a list backwards in the conventional list format, (3, 2, 1) we can use the printBackwards method to print 3, 2, but we need a separate method to print the parentheses and the first node. We'll call it printBackwardNicely.
public static void printBackwardNicely (Node list) System.out.print ("(");
if (list != null) Node head = list; Node tail = list.next; printBackward (tail); System.out.print (head); System.out.println (")");
Again, it is a good idea to check methods like this to see if they work with special cases like an empty list or a singleton.
Elsewhere in the program, when we use this method, we will invoke printBackwardNicely directly and it will invoke printBackward on our behalf. In that sense, printBackwardNicely acts as a wrapper, and it uses printBackward as a helper.
The LinkedList class
There are a number of subtle problems with the way we have been implementing lists. In a reversal of cause and effect, I will propose an alternative implementation first and then explain what problems it solves.
First, we will create a new class called LinkedList. Its instance variables are an integer that contains the length of the list and a reference to the first node in the list. LinkedList objects serve as handles for manipulating lists of Node objects.
verbatim public class LinkedList
int length; Node head;
public LinkedList () length = 0; head = null;
One nice thing about the LinkedList class is that it gives us a natural place to put wrapper functions like printBackwardNicely, which we can make an object method in the LinkedList class.
public void printBackward () System.out.print ("(");
if (head != null) Node tail = head.next; Node.printBackward (tail); System.out.print (head); System.out.println (")");
Just to make things confusing, I renamed printBackwardNicely. Now there are two methods named printBackward: one in the Node class (the helper) and one in the LinkedList class (the wrapper). In order for the wrapper to invoke the helper, it has to identify the class explicitly (Node.printBackward).
So, one of the benefits of the LinkedList class is that it provides a nice place to put wrapper functions. Another is that it makes it easier to add or remove the first element of a list. For example, addFirst is an object method for LinkedLists; it takes an int as an argument and puts it at the beginning of the list.
public void addFirst (int i) Node node = new Node (i, head); head = node; length++;
As always, to check code like this it is a good idea to think about the special cases. For example, what happens if the list is initially empty?
invariant object invariant list!well-formed
Some lists are ``well-formed; others are not. For example, if a list contains a loop, it will cause many of our methods to crash, so we might want to require that lists contain no loops. Another requirement is that the length value in the LinkedList object should be equal to the actual number of nodes in the list.
Requirements like this are called invariants because, ideally, they should be true of every object all the time. Specifying invariants for objects is a useful programming practice because it makes it easier to prove the correctness of code, check the integrity of data structures, and detect errors.
One thing that is sometimes confusing about invariants is that there are some times when they are violated. For example, in the middle of addFirst, after we have added the node, but before we have incremented length, the invariant is violated. This kind of violation is acceptable; in fact, it is often impossible to modify an object without violating an invariant for at least a little while. Normally the requirement is that every method that violates an invariant must restore the invariant.
If there is any significant stretch of code in which the invariant is violated, it is important for the comments to make that clear, so that no operations are performed that depend on the invariant.
list node cargo link generic data structure precondition invariant wrapper
[list:] A data structure that implements a collection using a sequence of linked nodes.
[node:] An element of a list, usually implemented as an object that contains a reference to another object of the same type.
[cargo:] An item of data contained in a node.
[link:] An object reference embedded in an object.
[generic data structure:] A kind of data structure that can contain data of any type.
[precondition:] An assertion that must be true in order for a method to work correctly.
[invariant:] An assertion that should be true of an object at all times (except maybe while the object is being modified).
[wrapper method:] A method that acts as a middle-man between a caller and a helper method, often offering an interface that is cleaner than the helper method's.