facebook google plus twitter
Webucator's Free Java Tutorial

Lesson: Collections

Welcome to our free Java tutorial. This tutorial is based on Webucator's Introduction to Java Training course.

In this lesson, you will learn about the collections API.

Lesson Goals

  • Learn about the concepts and classes in the Collections API.
  • Learn how to work with Lists, Sets, and Maps.
  • Learn how to use Iterators.
  • Learn how to define and use Comparable and Comparator classes.
  • Learn about generic classes.


The Java Collections API is a set of classes and interfaces designed to store multiple objects.

There are a variety of classes that store objects in different ways:

  • Lists store objects in a specific order.
  • Sets reject duplicates of any objects already in the collection.
  • Maps store objects in association with a key, which is later used to look up and retrieve the object (note that if an item with a duplicate key is put into a map, the new item will replace the old item).

The basic distinctions are defined in several interfaces in the java.util package:

  • Collection is the most basic generally useful interface that most, but not all, collection classes implement.
    • It specifies a number of useful methods, such as those to add and remove elements, ascertain the size of the collection, determine if a specific object is contained in the collection, or return an Iterator that can be used to loop through all elements of the collection (more on this later).
    • Methods include: add(Object o), remove(Object o), contains(Object o), iterator().
    • Note that removing an element removes an object that compares as equal to a specified object, rather than by position.
    • Collection extends the Iterable interface, which only requires one method, which supplies an object used to iterate through the collection.
  • The List interface adds the ability to insert and delete at a specified index within the collection, or retrieve from a specific position.
  • The Set interface expects that implementing classes will modify the add methods to prevent duplicates, and also that any constructors will prevent duplicates (the add method returns a boolean that states whether the operation succeeded or not; i.e., if the object could be added because it was not already there).
    • Sets do not support any indexed retrieval.
  • The Map interface does not extend Collection, because its adding, removing, and retrieving methods use a key value to identify an element.
    • Methods include: get(Object key), put(Object key, Object value), remove(Object key), containsKey(Object key), containsValue(Object value), keySet(), entrySet().
    • Maps do not support any indexed retrieval.

In addition to the List, Set, and Map categories, there are also several inferences you can make from some of the collection class names:

  • A name beginning with Hash uses a hashing and mapping strategy internally, although the key values usually have no meaning (the hashing approach attempts to provide an efficient and approximately equal lookup time for any element).
  • A name beginning with Linked uses a linking strategy to preserve the order of insertion and is optimized for insertion/deletion as opposed to appending or iterating.
  • A name beginning with Tree uses a binary tree to impose an ordering scheme, either the natural order of the elements (as specified by the Comparable interface implemented by many classes including String and the numeric wrapper classes) or an order dictated by a special helper class object (a Comparator).

Several additional interfaces are used to define useful helper classes:

  • Enumeration provides a pair of methods that enable you to loop through a collection in a standardized manner, regardless of the type of collection (you test if there are more elements, and, if so, retrieve the next element).
    • This interface is less frequently used now that the following interface has been added to the API.
    • But, many of the elements in J2EE (servlets in particular) predate the Iterator concept and were specified to use enumerations, so they will still appear in new code.
  • Iterator is an improved version that allows an object to be removed from the collection through the iterator.

The following diagrams show some of the classes and interfaces that are available:

Collections API Class Hierarchy - Lists and Sets

Collections API Class Hierarchy - Maps

Using the Collection Classes

The following are several examples of collection classes:

Vector stores objects in a linear list, in order of addition. Some additional notes on Vector are:

  • You can insert and delete objects at any location.
  • All methods are synchronized, so that the class is thread-safe.
  • Vector predates the collections framework, but was marked to implement List when the collections API was developed
    • As a result, there are many duplicate methods, like add and addElement, since the names chosen for Collection methods didn't all match existing method names in Vector.
  • Vector was used extensively in the past, and therefore you will see it in a lot of code, but most recent code uses ArrayList instead.

ArrayList, like Vector, stores objects in a linear list, in order of addition. Methods are not synchronized, so that the class is not inherently thread-safe (but there are now tools in the Collections API to provide a thread-safe wrapper for any collection, which is why Vector has fallen into disuse).

TreeSet stores objects in a linear sequence, sorted by a comparison, with no duplicates. TreeSet also:

  • Stores Comparable items or uses a separate Comparator object to determine ordering.
  • Uses a balanced tree approach to manage the ordering and retrieval.

Note that there is no tree list collection, because the concepts of insertion order and natural order are incompatible. Since sets reject duplicates, any comparison algorithm should include a guaranteed tiebreaker (for example, to store employees in last name, first name order: to allow for two Joe Smiths, we should include the employee id as the final level of comparison).

TreeMap stores objects in a Map, where any subcollection or iterator obtained will be sorted by the key values. Note that:

  • Keys must be Comparable items or have a separate Comparator object.
  • For maps, an iterator is not directly available. You must get either the key set or entry set, from which an iterator is available.

Hashtable stores objects in a Map. Note also that:

  • All methods are synchronized, so that the class is thread-safe.
  • Hashtable predates the collections framework, but was marked to implement Map when the collections API was developed.
  • Like Vector, Hashtablehas some duplicate methods.
  • Hashtable extends an obsolete class called Dictionary.

HashSet uses hashing strategy to manage a Set. Note also that:

  • HashSet rejects duplicate entries.
  • Entries are managed by an internal HashMap that uses the entry hashCode values as the keys.
  • Methods are not synchronized.

Using the Iterator Interface

Iterators provide a standard way to loop through all items in a collection, regardless of the type of collection. Note that:

  • All collection classes implement an iterator() method that returns the object's iterator (declared as Iterator iterator()).
  • This method is specified by the Iterable interface, which is the base interface for Collection.
  • For maps, the collection itself is not Iterable, but the set of keys and the set of entries are.
  • You can use a for-each loop for any Iterable object.

Without an iterator, you could still use an indexed loop to walk through a list using the size() method to find the upper limit, and the get(int index) method to retrieve an item, but other types of collections may not have the concept of an index, so an iterator is a better choice.

There are two key methods supplied by an iterator:

  1. boolean hasNext(), which returns true until there are no more elements.
  2. Object next(), which retrieves the next element and also moves the iterator's internal pointer to it.

Some collections allow you to remove an element through the iterator. The remove method is marked in the docs as optional by the definition of interfaces it has to be present, but the optional status indicates that it may be implemented to merely throw an UnsupportedOperationException.

In any case, if the collection is modified from a route other than via the iterator (perhaps by using remove(int index), or even just using the add method), a ConcurrentModificationException is thrown.

The Enumeration class is an older approach to this concept; it has methods hasMoreElements() and nextElement().

Code Sample:

import java.util.*;

public class CollectionsTest {
  public static void main(String[] args) {
    List l = new ArrayList();
    Map m = new TreeMap();
    Set s = new TreeSet();
    l.add(new Integer(1));
    l.add(new Integer(4));
    l.add(new Integer(3));
    l.add(new Integer(2));
    l.add(new Integer(3));
    m.put(new Integer(1), "A");
    m.put(new Integer(4), "B");
    m.put(new Integer(3), "C");
    m.put(new Integer(2), "D");
    m.put(new Integer(3), "E");
    System.out.println("Adding to Set");
    System.out.println("Adding 1: " + s.add(new Integer(1)));
    System.out.println("Adding 4: " + s.add(new Integer(4)));
    System.out.println("Adding 3: " + s.add(new Integer(3)));
    System.out.println("Adding 2: " + s.add(new Integer(2)));
    System.out.println("Adding 3: " + s.add(new Integer(3)));
    Iterator i = l.iterator();
    while (i.hasNext()) System.out.println(i.next());
    System.out.println("Map using keys");
    i = m.keySet().iterator();
    while (i.hasNext()) System.out.println(m.get(i.next()));
    System.out.println("Map using entries");
    i = m.entrySet().iterator();
    while (i.hasNext()) System.out.println(i.next());
    i = s.iterator();
    while (i.hasNext()) System.out.println(i.next());

This program demonstrates the three types of collections. As is commonly done, the variables are typed as the most basic interfaces (List, Set, and Map).

We attempt to add the same sequence of values to each: 1, 4, 3, 2, and 3; then:

  • They are deliberately out of sequence and contain a duplicate.
  • For the Map, the numbers are the keys, and single-letter strings are used for the associated values.

In the output, first note the true and false values resulting from attempting to add the values to the Set. Also note in the Map listing, which value associated with the key 3 is retained.

An iterator is then obtained for each and the series printed out (for the Map, we try two different approaches: iterating through the keys and retrieving the associated entries, and iterating directly through the set of entries).

Note the order of the values for each, and also which of the duplicates was kept or not. Note also that:

  • Tthe List object stores all the objects and iterates through them in order of addition.
  • The Map object stores by key; since a TreeMap is used, its iterator returns the elements sorted in order by the keys.
  • Since one key is duplicated, the later of the two values stored under that key is kept.
  • The Set object rejects the duplicate, so the first item entered is kept (although in this case it would be hard to tell from the iterator listing alone which one was actually stored).

Creating Collectible Classes

hashCode and equals

Both the equals(Object) method and hashCode() methods are used by methods in the Collections API. Note also that:

  • The Map classes use them to determine if a key is already present.
  • The Set classes use them to determine if an object is already in the set.
  • All collections classes use them in contains and related methods.

Oracle specifies that the behavior of hashCode should be "consistent with equals", meaning that two objects that compare as equal should return the same hash code. The reverse is not required; two objects with the same hash code might be unequal, since hash codes provide "bins" for storing objects.

This is critical when writing collectible classes. For example, the implementation of HashSet, which should reject duplicate entries, compares a candidate entry's hashcode against that of each object currently in the collection. It will only call equals if it finds a matching hash code. If no hash code matches, it assumes that the candidate object must not match any already present in the set.

Comparable and Comparator

Sorted collections can sort elements in two ways:

  1. By the natural order of the elements - objects that implement the Comparable interface have a natural order.
  2. By using a third-party class that implements Comparator.

Comparable specifies one method: compareTo(Object o) returns a negative integer, zero, or a positive integer as this object is less than, equal to, or greater than the specified object.

Comparator specifies two methods:

  1. compare(Object a, Object b) returns a negative integer, zero, or a positive integer if a is less than, equal to, or greater than b.
  2. equals(Object o), as in Object, is used not to compare the stored values for value, but to determine if two comparator classes can be considered equal. However, note that the behavior of the compare method should be consistent with equals, as Oracle's documentation advises; that is, compare(a, b) should return 0 when a.equals(b) returns true

As an example: TreeSet uses a tree structure to store items. Tthe tree part of the name is just for identifying the algorithm used for storage; you cannot make use of any of the node-related behaviors from the outside. There are several forms of constructors, most notably:

  • TreeSet() uses the natural order of the items, so they must implement the Comparable interface (and the items should be mutually comparable, to avoid casting exceptions being thrown during a comparison).
  • TreeSet(Comparator c) - uses the specified Comparator to compare the items for ordering (and, again, the mutually comparable caveat applies).

Code Sample:

import java.util.*;

public class UseComparable {
  public static void main(String[] args)
                     throws java.io.IOException {
    String[] names = { "Sue", "Bill", "Tom", "Dave", "Andy",
                       "Mary", "Beth", "Bill", "Mike" };
    TreeSet sl = new TreeSet(Arrays.asList(names));
    Iterator it = sl.iterator();
    while (it.hasNext()) {

Since String implements Comparable, the names will appear in alphabetical order when we iterate through the set.

The Arrays class provides useful methods for working with arrays, some of which will return a collection backed by the array (the Collections classes contain useful methods for working with collections, some of which perform the reverse operation).

Creating a Comparable Class

Classes that implement the Comparable interface may be stored in ordered collections. They must have a int compareTo(Object other) method to implement the interface. Note the following:

  • It is said that these objects have a natural order as determined by this method.
  • The function returns a negative value for this object considered less than the object passed as parameter other, a positive value for this object considered greater than the object passed as parameter other, and 0 if they are equal.
  • Since this capability is built into the object, it is important that it the results of the compareTo method match with the results of the equals and hashCode methods.

Creating and Using a Comparator Class

Classes that implement the Comparator interface may be also used with ordered collections, but the collection must be constructed with an explicit reference to an instance of the Comparator. The Comparator is a separate class that will compare two instances of your class to determine the ordering.

The interface specifies int compare(Object a, Object b). The function returns a negative value for an object considered less than the object b, a positive value for b considered greater than a, and 0 if they are equal.

It is still important that the results of the compareTo method match with the results of the objects' equals method. Note that you should usually implement this method to avoid "ties" for objects that would not be considered equal. For example, for two different employees who coincidentally have the same name would be returned in an indeterminate order.

Code Sample:

import java.util.*;

public class UseComparator {
  public static void main(String[] args) throws java.io.IOException {
    String[] names = { "Sue", "Bill", "Tom", "Dave", "Andy",
                       "Mary", "Beth", "Bill", "Mike" };

    TreeSet s2 = new TreeSet(new ReverseComparator());
    Iterator it = s2.iterator();
    while (it.hasNext()) {

class ReverseComparator implements Comparator {
  public int compare(Object o1, Object o2) {
    if (o1 instanceof String && o2 instanceof String)
      return -((String)o1).compareTo((String)o2);
    else throw new ClassCastException("Objects are not Strings");

The compare method of our comparator makes use of the existing compareTo method in String and simply inverts the result. Note that the objects being compared are tested to make sure both are String objects. If the test fails, the method throws a ClassCastException.

Code Sample:

import java.util.*;

public class UseComparableAndComparator {
  public static void main(String[] args) throws java.io.IOException {
    String[] names = { "Sue", "Bill", "Tom", "Dave", "Andy",
                       "Mary", "Beth", "Bill", "Mike" };

    TreeSet sl = new TreeSet(Arrays.asList(names));
    Iterator it = sl.iterator();
    while (it.hasNext()) {

    TreeSet s2 = new TreeSet(new ReverseComparator());
    it = s2.iterator();
    while (it.hasNext()) {

Since all the collections store are references, it will not use a lot of memory to store the same references in different collections. This creates an analog to a set of table indexes in a database

Collections in Java 5.0: Generics

Java 5.0 added the concept of Generics, which allow data types to be parameterized for a class

In earlier versions of Java, the collection methods to store objects all received a parameter whose type was Object. Therefore, the methods to retrieve elements were typed to return Object. To use a retrieved element, you had to typecast the returned object back to whatever it actually was (and somehow you had to know what it actually was).

The collections in Java 5.0 use a special, new syntax where the type of object is stated in angle brackets after the collection class name.

Instead of ArrayList, there is now ArrayList<E>, where the E can be replaced by any type. Within the class, method parameters and return values can be parameterized with the same type.

public class ArrayList<E> {
	. . .
	public void add(int index, E element) { ... }
	. . .
	public E get(int index) {  }
	. . .

For example, an ArrayList of String objects would be ArrayList<String>.

Code Sample:

import java.util.*;

public class GenericCollectionsTest {
  public static void main(String[] args) {

    List<String> ls = new ArrayList<String>();
    // using iterator
    StringBuffer result = new StringBuffer();   
    Iterator<String> is = ls.iterator();
    while (is.hasNext()) 
      result.append(is.next().toUpperCase()).append(' ');
    // using for-each loop
    result = new StringBuffer();
    for (String s : ls) result.append(s.toLowerCase()).append(' ');

    // old way
    List l = new ArrayList();
    // using iterator
    result = new StringBuffer();
    Iterator i = l.iterator();
    while (i.hasNext()) 
      result.append(((String)i.next()).toUpperCase()).append(' ');
    // using for-each loop
    result = new StringBuffer();
    for (Object o : l) 
      result.append(((String)o).toLowerCase()).append(' ');


As you can see, the objects retrieved from the ArrayList are already typed as being String objects. Note the following:

  • We need to have them as String objects in order to call toUpperCase, whereas our previous examples only printed the objects, so being typed as Object was okay.
  • Without generics, we must typecast each retrieved element in order to call toUpperCase.
  • To use an iterator, we would declare the variable to use the same type of element as the collection we draw from, as in Iterator<String>.

Bounded Types

A type parameter may set bounds on the type used, by setting an upper limit (in inheritance diagram terms) on the class used. The extends keyword is used to mean that the class must either be an instance of the specified boundary class, or extend it, or, if it is an interface, implement it:

public class EmployeeLocator<T extends Employee> { . . . }
public class CheckPrinter<T extends Payable> { . . . }

In the first case, the class may be parameterized with Employee, or any class that extends Employee. In the second, the class may be parameterized with Payable or any type that implements the Payable interface.

Extending Generic Classes and Implementing Generic Interfaces

When extending a generic class or implementing a generic interface, you can maintain the generic type, as in public class ArrayList<T> implements List<T>. In this case, types are still stated in terms of T.

You can lock in the generic type: public class EmployeeList extends ArrayList<Employee> or public class StringList implements java.util.List<String>. In these cases, methods would use the fixed type. For example, if you overrode add(E) in ArrayList<E> in the above EmployeeList, it would be add(Employee).

Generic Methods

Methods may be generic, whether or not they are in a generic class. The syntax is somewhat ugly, since it requires listing the type variable before the return type and requires that at least one parameter to the method be of the generic type (that is how the compiler knows what the type is).

public <T> T chooseRandomItem(T[] items) {
	Random r = new Random();
	return items[r.nextInt(items.length)];

The above method is parameterized with type T. The type for T is established by whatever type of array is passed in; if we pass in a String array, then T is String. The method will then randomly pick one to return.

The type may be bounded with extends.

Variations on Generics - Wildcards

In the documentation for a collections class, you may see some strange type parameters for methods or constructors, such as:

public boolean containsAll(Collection<?> c)
public boolean addAll(Collection<? extends E> c) 
public TreeSet(Comparator<? super E> comparator)

The question mark is a wildcard, indicating that the actual type is unknown. But, we at least know limits in the second two cases. The extends keyword in this usage actually means "is, extends, or implements" (which is the same criteria the instanceof operator applies). The super keyword means essentially the opposite: that the type parameter of the other class is, or is more basic than, this class's type. The usages with extends and super are called bounded wildcards.

This syntax only occurs when the variable is itself a generic class. The wildcards then state how that class's generic type relates to this class's type.

Why this is necessary leads down a long and winding path. To start, consider the following:

List<Employee> exEmps = new ArrayList<ExemptEmployee>();

This seems reasonable at first glance, but then consider if this line followed:

exEmps.add(new ContractEmployee());

Perfectly legal as far as the compiler is concerned, since ContractEmploye fits within the Employee type that the exEmps variable requires, but now we have a contract employee in a list instance that is supposed to hold only exempt employees. So, an instance of a class parameterized with a derived class is not an instance of that class parameterized with the base class, even though individual instances of the derived class can be used in the base-parameterized generic class; e.g., our List<Employee> can add individual exempt employees.

For the first wildcard case above, public boolean containsAll(Collection<?> c), it does no harm for us to see if our collection contains all the elements of some other collection that may contain an entirely unrelated type (but few, if any, of the items would compare as equal). Note that the contains method accepts an Object parameter, not an E, for this same reason.

The extends term in public boolean addAll(Collection<? extends E> c)means that the unknown class is, extends, or implements the listed type. For instance, we could add all the elements of an ArrayList<ExemptEmployee> to an ArrayList<Employee>. That makes sense, since we could add individual exempt employees to a basic employee collection. But, since we don't actually know what the parameterized type of the incoming collection is (it is the ? class), we cannot call any methods on that object that depend on its parameterized type. So we can't add to that collection; we can only read from it.

The super term is seen less often; it means that the parameterized type of the incoming collection must be of the same type or a more basic type. The TreeSet constructor can accept a Comparator for its actual type, or any type more basic (e.g., a Comparator<Object> can be used for a TreeSet of anything, since its compare method will accept any type of data). It is, however, likely that many "acceptable" comparators will end up throwing a ClassCastException at runtime if they can't actually compare the types involved. So, for example, if we had a Comparator<Employee> class that compared employee ids, we might still wish to use it in a TreeSet<ExemptEmployee>, where it would be perfectly valid (in fact, it would be an annoyance to have to write a special comparator for every employee type, if all the comparators did was compare the ids).

Payroll Using Generics

Duration: 15 to 25 minutes.

We can modify our payroll application to use generic lists instead of arrays for our employees, invoices, and payables.

  1. Make the Employee[] variable a List<Employee>, and populate it with a new ArrayList<Employee>.
  2. Since lists have no fixed size, you will need to change the first for loop, perhaps to a fixed number of iterations.
  3. Modify the lines that add employees from using array indices to using the add method.
  4. Turn the Invoice[] into a List<Invoice> populated with a Vector<Invoice>, similar to what we did in we did in step 1.
  5. Modify the lines that populate that list.
  6. Create an ArrayList<Payable> in a List<Payable> payables variable.
  7. Create an ArrayList<Payable> in a List<Payable> payments variable.
  8. Then call that list's addAll method once and pass in the employee list.
  9. Then add all the invoices the same way.
  10. Since we might not want to modify CheckPrinter, you can generate a Payable array for it with:
    CheckPrinter.printChecks( payments.toArray(new Payable[] { }) ); 
    The parameter is a "dummy" array used to tell the generic method what type of array to create. Note in the Collection documentation that this method is typed with T rather than the E used in the rest of the class. This method has its own local type, which is determined by the type of the array passed in.


import employees.*;
import vendors.*;
import util.*;
import finance.*;
import java.util.*;

public class Payroll {	
	public static void main(String[] args) {
		List<Employee> e = new ArrayList<Employee>();
		Employee empl = null;
		String fName = null;
		String lName = null;
		int dept = 0;
		double payRate = 0.0;
		double hours = 0.0;

		for (int i = 0; i < 5; i++) {
			try {
 				char type = 
					KeyboardReader.getPromptedChar("Enter type: E, N, or C: ");
				if (type != 'e' && type != 'E' && 
						type != 'n' && type != 'N' && 
						type != 'c' && type != 'C') {
					System.out.println("Please enter a valid type");
				fName = KeyboardReader.getPromptedString("Enter first name: ");
				lName = KeyboardReader.getPromptedString("Enter last name: ");
	      dept = KeyboardReader.getPromptedInt(
	      			"Enter department: ", "Department must be numeric",
	      			new DeptValidator(), "Valid departments are 1 - 5");
				do {
					payRate = KeyboardReader.getPromptedDouble("Enter pay rate: ", 
									"Pay rate must be numeric");
					if (payRate < 0.0) System.out.println("Pay rate must be >= 0");
				} while (payRate < 0.0);
				switch (type) {
					case 'e':
					case 'E': 
						empl = new ExemptEmployee(fName, lName, dept, payRate);
					case 'n':
					case 'N': 
						do {
							hours = KeyboardReader.getPromptedDouble("Enter hours: ");
							if (hours < 0.0) 
								System.out.println("Hours must be >= 0");
						} while (hours < 0.0);
						empl = new NonexemptEmployee(fName,lName,dept,payRate,hours);
					case 'c':
					case 'C': 
						do {
							hours = KeyboardReader.getPromptedDouble("Enter hours: ");
							if (hours < 0.0) 
								System.out.println("Hours must be >= 0");
						} while (hours < 0.0);
						empl = new ContractEmployee(fName,lName,dept,payRate,hours);
			} catch (InvalidValueException ex) {
					i--;	//failed, so back up counter to repeat this employee
		System.out.println("Exempt Employees");
		System.out.println("====== =========");
		for (Employee emp : e) {
			if (emp instanceof ExemptEmployee) {
		System.out.println("Nonexempt Employees");
		System.out.println("======== =========");
		for (Employee emp : e) {
			if (emp instanceof NonexemptEmployee) {
		System.out.println("Contract Employees");
		System.out.println("======== =========");
		for (Employee emp : e) {
			if (emp instanceof ContractEmployee) {

		List<Invoice> inv = new Vector<Invoice>();
		inv.add(new Invoice("ABC Co.", 456.78));
		inv.add(new Invoice("XYZ Co.", 1234.56));
		inv.add(new Invoice("Hello, Inc.", 999.99));
		inv.add(new Invoice("World, Ltd.", 0.43));

		List<Payable> payments = new ArrayList<Payable>();

		CheckPrinter.printChecks( payments.toArray(new Payable[] { }) );

Notice that we didn't have to change the for-each loops; they work equally well for Iterable objects, like lists, as they do for arrays.

Also, the addAll method accepts any Collection, so it can take an ArrayList or a Vector equally well. And, since it accepts Collection<? extends E>, it can take collections of either Employee or Invoice (both implement the Payable type used for E in the the receiving collection).