Java Programming/Collection Classes

From Wikibooks, the open-content textbooks collection

Jump to: navigation, search
Navigate Collection topic: ( v d e )
Development stage: 75% (as of Jul 01, 2006) Collection or Map
Development stage: 75% (as of Jul 01, 2006) Collection
Development stage: 75% (as of Jul 01, 2006) Map
Development stage: 75% (as of Jul 01, 2006) Set or List or Queue
Development stage: 75% (as of Jul 01, 2006) Set
Development stage: 75% (as of Jul 01, 2006) List
Development stage: 75% (as of Jul 01, 2006) Queue
Development stage: 75% (as of Jul 23, 2006) Map Classes
Development stage: 75% (as of Jul 23, 2006) Thread Safe Collections
Development stage: 75% (as of Jul 23, 2006) Classes Diagram (UML)

Collections are a group of objects bound together by a common characteristic. Java has various built-in classes to support the collection of objects, either of the same type or general. The most basic construct to work with is the array of which you will come to know of in a section later in this chapter.

Contents

[edit] Introduction to Collections

The most basic collection interface is called Collection. This interface gives the user a generic usage of a collection.

import   java.util.Collection;   // Interface
import java.util.ArrayList;    // Implementation
import java.util.HashSet;      // Implementation
...
Collection coll1 = new ArrayList();
Collection coll2 = new HashSet();
...
< Use coll1 & coll2 >

In the above there are two collections. The usage of the collections are the same, the implementations are different. If the existing collection implementations do not meet your needs, you can write your version of the implementation. Your version of the implementation just needs to implement the same java.util.Collection interface, then you can switch to using your implementation and the code that is using the collection does not need to be changed.

import java.util.Collection;
import com.yourcomp.util.YourCollectionImpl;
...
Collection coll1 = new YourCollectionImpl();
Collection coll2 = new YourCollectionImpl();
...
< Use coll1 & coll2 >

The Java JDK collection implementations are quite powerful and good, so it is unlikely that you will need to write your own.

All collections contain object references. Because of that, if the object is changed after it was put in the collection, the object that is 'in' the collection also 'changes'. The object is not really in the collection, only the object reference is. It is not guaranteed that the objects 'inside' the collections won't change. This is an issue only if you put an actively used object in the collection. In that case when you are adding an object that could change any time you need to make a copy or clone of the object. A new object will be created and its reference will be put in the collection. In that case there will be no object references outside of the collection, so the objects 'inside' the collection can only be changed if we take out an object reference from the collection.

Aside from the java.util.Collection interface, the Java JDK has the java.util.Map interface as well. This defines key value mappings. Implementations of the Map interface do not contain collections of objects. Instead they contain collections of key->value mappings.

import java.util.Map;
import java.util.Hashtable;
...
Map map = new Hashtable();
...
map.put( key, value );

All collections need to have the same basic operations. Those are:

  • Adding element(s) to the collection
  • Removing element(s) from the collection
  • Obtaining the number of elements in the collection
  • Listing the contents of the collection, (Iterating through the collection)

Before selecting a particular collection implementation, ask the following question:

  • Can my collection contain the same elements, i.e. are duplicates allowed?
  • Can my collection contain the null element?
  • Should the collection maintain the order of the elements? Is the order important in any way?
  • How do you want to access an element? By index, key or just with an iterator?
  • Does the collection need to be synchronized?
  • From a performance perspective, which one needs to be faster, updates or reads?
  • From a usage perspective, which operation will be more frequent, updates or reads?

Once you know your needs, you can select an exsisting implementation. But first decide if you need a 'Collection', or a 'Map'.

Arrays

[edit] Generics

Since JDK version 1.5 an enhancement to the type system of the Java language has been added. It is called Generics. Most often Generics will be used with the collection classes.

parameterized type <E> 

All collection implementations since 1.5 now have one 'parameterized type <E>' added. The 'E' refers to an Element type. When a collection is created the actual 'Element type' will replace the E.

Objects put into a collection are upcasted to Object class. It means that you need to cast the object reference back when you get an element out from the collection. It also means that you need to know the type of the object when you taking it out. For this reason we usually add objects of the same type to a collection. If a collection contains different types of objects, we will have difficulty finding out the type of the objects obtained from a collection at run time.

With the use of the 'parameterized type <E>', the 'Element-type' that can be put into the collection can be specified when the collection object is created.

Collection<Integer> ageList = new ArrayList<Integer>();
ageList.add( new Integer(46) );     // --- Integer can be added
ageList.add( "50" );  // --- Compilation error, ageList can have only Integers inside

ageList is a collection that can contain only Integer objects as elements. No casting is required when we take out an element.

Integer age = ageList.get(0);

For more information about Generics, please go to Java Programming/Generics.

[edit] Collection or Map

The Java JDK contains two high level Interfaces:

  • java.util.Collection
  • java.util.Map

Implementations for those interfaces are not interchangeable. Collections are used for collecting Java objects. Maps are used for mapping key/value pairs.

[edit] Collection

Use the Collection interface if you need to keep related (usually the same type of) objects together in a collection where you can:

  • Search for a particular element
  • List the elements
  • Maintain and/or change the order of the elements by using the collection basic operations (Add, Remove, Update,..)
  • Access the elements by an index number

The advantages of using the Collection interface are:

  • Gives a generic usage, as we talked about above, it is easy to switch implementation
  • It makes it easy to convert one type of collection to an other.

The Collection interface defines the following basic operations:

The methods above return true if the collection has changed due to the operation. Note that in addAll() we can add any type of collection. This is the beauty of using the Collection interface. You can have a LinkedList and just call the addAll(list) method, passing in a list. You can pass in a Vector, an ArrayList, a HashSet, a TreeSet, a YourImpOfCollection, ... All those different type of collections will be magically converted to a LinkList.

Lets have a closer look at this magic. The conversion is easy because the Collection interface defines a standard way of looping through the elements. The following code is a possible implementation of addAll() method of the LinkList.

import java.util.Collection
import java.util.Iterator
...
public String addAll( Collection coll )
{
   int sizeBefore = _linkList.size();
   Iterator iter = coll.iterator();
   while( iter.hasNext() )
   {
      _linkList.add( iter.next() );
   }
   if ( sizeBefore > _linkList.size();
   {
      return true;
   }
   else
   {
      return false;
   }    
}

The above code just iterates through the passed in collection and adds the elements to the link list. You do not have to do that, since that is already defined. What you might need to code for is to loop through a 'Customer' collection:

import java.util.Collection
import java.util.Iterator
import java.yourcompany.Customer
...
public String printCustomerNames( Collection customerColl )
{
   StringBuffer buf = new StringBuffer();

   Iterator iter = customerColl.iterator();
   while( iter.hasNext() )
   {
      Customer cust = (Customer) iter.next();
      buf.append( cust.getName() );
      buf.append( "\n" );
   }
  return buf.toString();
}

Notice two things:

  • The above code will work for all type of collections.
  • We have to know the type of objects inside the collection, because we call a method on it.

[edit] Map

Figure 1:Map Interfaces

A map, sometimes also called an Associated Array or a Dictionary, can be thought of as an array where the index need not be an integer.

Use the Map interface if you need to keep related objects together in a Map where you can:

  • Access an element by a key object
  • Map one object to other
java.util.Map<K,V> 
maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value. The Map interface provides three collection views, which allow a map's contents to be viewed as a set of keys, collection of values, or set of key-value mappings. The key is usually a non-mutable object. The value object however can be a mutable object.
java.util.SortedMap<K,V> 
same as the Map interface, plus the keys in the Map are sorted.

[edit] Set or List or Queue

Figure 2:

There is no direct implementation for the java.util.Collection interface. The Collection interface has three sub interfaces. Those are:

java.util.Set<E> 
contains unique elements, so duplicates not allowed. It is similar to a mathematical Set.
java.util.List<E> 
elements are put in the list in a certain order, and can be accessed by an index. Duplicates are allowed, the same element can be added to a list.
java.util.SortedSet<E> 
same as the Set interface; plus the elements in the SortedSet are sorted
java.util.Queue<E> 
queues provide additional insertion, extraction, and inspection operations. There are FIFO (first in, first out) and LIFO (last in, first out) queues.
java.util.BlockingQueue<E> 
waits for the queue to become non-empty when retrieving an element, and waits for space to become available in the queue when storing an element. Best used for producer-consumer queues.

[edit] Set

The basic implementation of the Set interface is the HashSet.

Java collection set implementations.jpg
java.util.TreeSet<E>
Elements are sorted, not synchronized. null not allowed
java.util.HashSet<E> 
Not synchronized. Allows the null elements
java.util.CopyOnWriteArraySet<E> 
Thread safe, a fresh copy is created during modification operation. Add, update, delete are expensive.
java.util.EnumSet<E extends Enum<E>> 
All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created. Enum sets are represented internally as bit vectors.
java.util.LinkedHashSet<E> 
Same as HashSet, plus defines the iteration ordering, which is the order in which elements were inserted into the set.

Set cannot have duplicates in it. You may wonder how duplicates are detected when we are adding an object to the Set. We have to see if that object exsits in the Set or not. It is not enough to check the object references, the objects' values have to be checked as well.

To do that, fortunately, each java object has the boolean equal(Object obj);, method available inherited from Object. You need to override it. That method will be called by the Set implementation to compare the two objects to see if they are equal or not.

There is a problem, though. What if I put two different type of objects to the Set. I put an Apple and an Orange. They can not be compared. Calling the equal() method would cause a ClassCastException. There are two solutions to this:

  • Solution one : Override the int hashCode() method and return the same values for the same type of objects and return different values for different type of objects. The equal() method is used to compare objects only with the same value of hashCode. So before an object is added, the Set implementation needs to:
    • find all the objects in the Set that has the same hashCode as the candidate object hashCode
    • and for those, call the equal() methods passing in the candidate object
    • if any of them returns true, the object is not added to the Set.
  • Solution two : Create a super class for the Apple and Orange, let's call it Fruit class. Put Fruits in the Set. You need to do the following:
    • Do not override the equals() and hashCode() methods in the Apple and Orange classes
    • Create appleEquals() method in the Apple class, and create orangeEquals() method in the Orange class
    • Override the hashCode() method in the Fruit class and return the same value, so the equals() is called by the Set implementation
    • Override the equals() method in the Fruit class for something like this.
public boolean equals( Object obj )
{
    boolean ret = false; 
    if ( this instanceof Apple &&
          obj instanceof Apple )
    { 
        ret = this.appleEquals(obj);
    }
    else if ( this instanceof Orange &&
              obj instanceof Orange )
    {
        ret = this.orangeEquals(obj);  
    }
    else
    {
        // --- Can not compare Orange to Apple ---
        ret = false;
    }
  return ret;
}

Note:

  • only the objects that have the same hashCode will be compared.
  • you are responsible to override the equal() and hashCode() methods. The default implementations in Object won't work.
  • Only override the hashCode() method if you want to eliminate value duplicates.
  • Do not override the hashCode() method if you know that the values of your objects are different, or if you only want to prevent adding the exactly same object.
  • Beware that the hashCode() may be used in other collection implementaions, like in a Hashtable to find an object fast. Overriding the default hashCode() method may affect performance there.
  • the default hashCodes are unique for each object created, so if you decide not to override the hashCode() method, there is no point overriding the equal() method, as it won't be called.


[edit] SortedSet

The SortedSet interface extends the Set Interface. All elements in the SortedSet must implement the Comparable Interface, futher more all elements must be mutually comparable.

Note that the ordering maintained by a sorted set must be consistent with equals if the sorted set is to correctly implement the Set interface. This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal.

The SortedSet interface has additional methods due to the sorted nature of the 'Set'. Those are:

  • E first(); -- returns the first element
  • E last(); -- returns the last element
  • SortedSet headSet(E toElement); -- returns from the first, to the exclusive toElement
  • SortedSet tailSet(E fromElement); -- returns from the inclusive fromElement to the end
  • SortedSet subSet(E fromElement, E toElement); -- returns elements range from fromElement, inclusive, to toElement, exclusive. (If fromElement and toElement are equal, the returned sorted set is empty.)

[edit] List

The List has the following implemenations:

Java collection list implementations.jpg
java.util.Vector<E> 
Syncronized, use in multiple thread access, otherwise use ArrayList
java.util.Stack<E> 
It extends class Vector with five operations that allow a vector to be treated as a stack. It represents a last-in-first-out (LIFO) stack of objects.
java.util.ArrayList<E> 
Non-syncronized, use in single thread environment, otherwise use Vector
java.util.LinkedList<E> 
Non-syncronized, update operation is faster than other lists, easy to use for stacks, queues, double-ended queues.
javax.management.AtributeList<E> 
Represents a list of values for attributes of an MBean. The methods used for the insertion of Attribute objects in the AttributeList overrides the corresponding methods in the superclass ArrayList. This is needed in order to insure that the objects contained in the AttributeList are only Attribute objects.
javax.management.relation.RoleList<E> 
A RoleList represents a list of roles (Role objects). It is used as parameter when creating a relation, and when trying to set several roles in a relation (via 'setRoles()' method). It is returned as part of a RoleResult, to provide roles successfully retrieved.
javax.management.relation.RoleUnresolvedList<E> 
A RoleUnresolvedList represents a list of RoleUnresolved objects, representing roles not retrieved from a relation due to a problem encountered when trying to access (read or write to roles).


The basic implemenation of the List interface is the ArrayList. The ArrayList is not syncronized, not thread safe. Vector is syncronized, and thread safe. Vector is slower, because of the extra overhead to make it thread safe. When only one thread is accessing the list, use the ArrayList. Whenever you insert or remove an element from the list, there are extra overhead to reindex the list. When you have a large list, and you have lots of insert and remove, consider using the LinkedList.

The name LinkList implies a special data structure where the elements/nodes are connected by pointers. To remove an element from the link list the pointers need to be rearranged.

 Head               Node 1                   Node 2                     Node n
  ______
 | Size |          _________________        _______________            _____________
 |______|         |      | point   |       |      | point  |          |      |      |  
 | First|-------->| Data | to next |------>| Data | to next|-- ... -->| Data | null |
 | elem |         |______|_________|       |______|________|          |______|______|
 |______|                                                                 |
 | Last |----------------------------------------------------------------- 
 |_elem_|
  

After removing Node 2:

 Head               Node 1                   Node 2                     Node n
  ______                                 ___________________...________
 | Size |          _________________    |   _______________           _|____________
 |_- 1__|         |      | point   |    |  |      | point  |          |      |      |  
 | First|-------->| Data | to next |----   | Data | to next|          | Data | null |
 | elem |         |______|_________|       |______|________|          |______|______|
 |______|                                                                 |
 | Last |----------------------------------------------------------------- 
 |_elem_|

[edit] Queue

The Queue interface adds the following operations to the Collection interface:

E element() Retrieves, but does not remove, the head of this queue. This method differs from the peek method only in that it throws an exception if this queue is empty
boolean offer(E o) Inserts the specified element into this queue, if possible.
E peek() Retrieves, but does not remove, the head of this queue, returning null if this queue is empty
E poll() Retrieves and removes the head of this queue, or null if this queue is empty
E remove() Retrieves and removes the head of this queue. This method differs from the poll method in that it throws an exception if this queue is empty.


java.util.PriorityQueue<E> 
orders elements according to an order/priority specified at construction time, null element is not allowed.
java.util.concorrent.ArrayBlockingQueue<E> 
orders elements FIFO; syncronized, thread safe.
java.util.concorrent.SyncronousQueue<E> 
each put must wait for a take, and vice versa, does not have any internal capacity, not even a capacity of one, an element is only present when you try to take it; you cannot add an element (using any method) unless another thread is trying to remove it
Java collection queue implementations.jpg

[edit] Map Classes

The Map interface has the following implementations:

Java map implementation.jpg
java.util.TreeMap<E>
guarantees that the map will be in ascending key order, sorted according to the natural order for the key's class, not-syncronized.
java.util.HashMap<E> 
is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls
java.util.concurrent.ConcurrentHashMap 
same Hashtable, plus retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove).
java.util.Hashtable<E> 
Syncronized, null can not be used as key
java.util.WeakHashMap<E> 
entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. Nonsyncronized.
java.util.LinkHashMap<E> 
This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map.
java.util.IdentityHashMap 
This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2). (In normal Map implementations (like HashMap) two keys k1 and k2 are considered equal if and only if (k1==null ? k2==null : k1.equals(k2)).) Not-syncronized.
java.util.EnumMap 
All of the keys in an enum map must come from a single enum type that is specified, explicitly or implicitly, when the map is created. Enum maps are represented internally as arrays. This representation is extremely compact and efficient. Not-syncronized.

[edit] Thread Safe Collections

It is also called Concurrent Collections. Most of the popular collection classes have implementations for both single thread and multiple thread environments. The non-syncronized implementations are always faster. You can use the non-syncronized implementations in multiple thread environments, when you make sure that only one thread updating the collection an any given time.

A new Java JDK package was introduced at Java 1.5, that is java.util.concurrent. This package supplies a few Collection implementations designed for use in multithreaded environments.

The following table list all the syncronized collection classes:

syncronized non-syncronized
List java.util.Vector java.util.ArrayList
java.util.Stack
java.util.LinkList
java.util.concurrent.CopyOnWriteArrayList
Set java.util.TreeSet
java.util.HashSet
java.util.LinkHashSet
java.util.concurrent.CopyOnWriteArraySet
Map java.util.TreeMap
java.util.Hashtable

java.util.concurrent.ConcurrentHashMap

java.util.HashMap
java.util.LinkHashMap
java.util.IdentityHashMap
java.util.EnumMap

[edit] Classes Diagram (UML)

The following UML class diagram shows the Collection interfaces and their implementations.

Java collection implementation.jpg



The following UML class diagram shows the Map interfaces and their implementations.

Java map implementation.jpg