Published on ONJava.com (http://www.onjava.com/)

The Hidden Gems of Jakarta Commons, Part 1

by Timothy M. O'Brien

If you are not familiar with the Jakarta Commons, you have likely reinvented a few wheels. Before you write any more generic frameworks or utilities, grok the Commons. It will save you serious time. Too many people write a StringUtils class that duplicates methods available in Commons Lang's StringUtils, or developers unknowingly recreate the utilities in Commons Collections even though commons-collections.jar is already available in the classpath. Seriously, take a break. Check out the Commons Collections API and then go back to your task; I promise you'll find something simple that will save you a week over the next year. If people just took some time to look at Jakarta Commons, we would have much less code duplication--we'd start making good on the real promise of reuse. I've seen it happen; somebody digs into Commons BeanUtils or Commons Collections and invariably they have an "Oh, if I had only known about this, I wouldn't have written 10,000 lines of code" moment. There are still parts of Jakarta Commons that remain a mystery to most; for instance, many have yet to hear of Commons CLI or Commons Configuration, and most have yet to notice the valuable functors package in Commons Collections. In this series, I emphasize some of the less-appreciated tools and utilities in the Jakarta Commons.

In this first part of the series, I explore XML rule set definitions in the Commons Digester, functors available in Commons Collections, and an interesting application, Commons JXPath, to query a List of objects. Jakarta Commons contains utilities that aim to help you solve problems at the lowest level of programming: iterating over collections, parsing XML, and selecting objects from a List. I would encourage you to spend some time focusing on these small utilities, as learning about the Jakarta Commons will save you a substantial amount of time. It isn't simply about using Commons Digester to parse XML or using CollectionUtils to filter a collection with a Predicate. You will start to see benefits once you realize how to combine the power of these utilities and how to relate Commons projects to your own applications; once this happens, you will come to see commons-lang.jar, commons-beanutils.jar, and commons-digester.jar as just as indispensable to any system as the JVM itself.

Related Reading

Jakarta Commons Cookbook
By Timothy M. O'Brien

If you are interested in learning more about the Jakarta Commons, check out the Jakarta Commons Cookbook. This book is full of recipes that will get you hooked on the Commons, and tells you how to use Jakarta Commons in concert with other small open source components such as Velocity, FreeMarker, Lucene, and Jakarta Slide. In this book, I introduce a wide array of tools from Jakarta Commons from using simple utilities in Commons Lang to combining Commons Digester, Commons Collections, and Jakarta Lucene to search the works of William Shakespeare. I hope this series and the Jakarta Commons Cookbook provide you with some interesting solutions for low-level programming problems.

1. XML-Based Rule Sets for Commons Digester

Commons Digester 1.6 provides one of the easiest ways to turn XML into objects. Digester has already been introduced on the O'Reilly network in two articles: "Learning and Using Jakarta Digester," by Philipp K. Janert, and "Using the Jakarta Commons, Part 2," by Vikram Goyal. Both articles demonstrate the use of XML rule sets, but this idea of defining rule sets in XML has not caught on. Most sightings of the Digester appear to define rule sets programmatically, in compiled code. You should avoid hard-coding Digester rule sets in compiled Java code when you have the opportunity to store such mapping information in an external file or a classpath resource. Externalizing a Digester rule set makes it easier to adapt to an evolving XML document structure or an evolving object model.

To demonstrate the difference between defining rule sets in XML and defining rule sets in compiled code, consider a system to parse XML to a Person bean with three properties--id, name, and age, as defined in the following class:

package org.test;

public class Person {
  private String id;
  private String name;
  private int age;

  public Person() {}

  public String getId() { return id; }
  public void setId(String id) { 
    this.id = id;
  }

  public String getName() { return name; }
  public void setName(String name) {
    this.name = name;
  }

  public int getAge() { return age; }
  public void setAge(int age) {
    this.age = age;
  }
}
Assume that your application needs to parse an XML file containing multiple person elements. The following XML file, data.xml, contains three person elements that you would like to parse into Person objects:

<people>
  <person id="1">
    <name>Tom Higgins</name>
  </person>
  <person id="2">
    <name>Barney Smith</name>
  </person>
  <person id="3">
    <name>Susan Shields</name>
  </person>
</people>

You expect the structure and content of this XML file to change over the next few months, and you would prefer not to hard-code the structure of the XML document in compiled Java code. To do this, you need to define Digester rules in an XML file that is loaded as a resource from the classpath. The following XML document, person-rules.xml, maps the person element to the Person bean:

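<digester-rules>
  <pattern value="people/person">
    <object-create-rule classname="org.test.Person"/>
    <set-next-rule methodname="add" 
                   paramtype="java.lang.Object"/>
    <set-properties-rule/>
    <bean-property-setter-rule pattern="name"/>
    <bean-property-setter-rule pattern="age"/>
  </pattern>
</digester-rules>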

All this does is instruct the Digester to create a new instance of Person every time it encounters a person element, call add() to add this Person to an ArrayList, set any bean properties that match attributes on the person element, and set the name and age properties from the sub-elements name and age. You've seen the Person class, the XML document to be parsed, and the Digester rule definitions in XML form. Now you need to create an instance of Digester with the rules defined in person-rules.xml. The following code creates a Digester by passing the URL of the person-rules.xml resource to the DigesterLoader. Since the person-rules.xml file is a classpath resource in the same package as the class parsing the XML, the URL is obtained with a call to getClass().getResource(). The DigesterLoader then parses the rule definitions and adds these rules to the newly created Digester:

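import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.digester.Digester;
import org.apache.commons.digester.xmlrules.DigesterLoader;

// Configure Digester from XML ruleset
URL rules = getClass().getResource("person-rules.xml");
Digester digester = 
    DigesterLoader.createDigester( rules );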

// Push empty List onto Digester's Stack
List people = new ArrayList();
digester.push( people );

// Parse the XML document
InputStream input = new FileInputStream( "data.xml" );
digester.parse( input );

Once the Digester has parsed the XML in data.xml, three Person objects should be in the people ArrayList.

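The alternative to defining Digester rules in XML is to add them using the convenience methods on a Digester instance. Most articles and examples start with this method, adding rules using the addObjectCreate() and addBeanPropertySetter() methods on Digester. The following code adds the same rules that were defined in person-rules.xml:

import org.apache.commons.digester.Digester;

Digester digester = new Digester();
digester.addObjectCreate( "people/person", 
                          Person.class );
digester.addSetNext( "people/person", 
                     "add", 
                     "java.lang.Object" );
digester.addSetProperties( "people/person" );
digester.addBeanPropertySetter( "people/person/name" );
digester.addBeanPropertySetter( "people/person/age" );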

If you have ever found yourself working at an organization with 2500-line classes to parse a huge XML document with SAX, or a whole collection of classes to work with DOM or JDOM, you understand that XML parsing is more complex than it needs to be, in the majority of cases. If you are building a highly efficient system with strict speed and memory requirements, you need the speed of a SAX parser. If you need the complexity of the DOM Level 3, use a parser like Apache Xerces. But if you are simply trying to parse a few XML documents into objects, take a look at Commons Digester, and define your rule set in an XML file.

Any time you can move this type of configuration outside of compiled code, you should. I would encourage you to define your digester rules in an XML file loaded either from the file system or the classpath. Doing so will make it easier to adapt your program to changes in the XML document and changes in your object model. For more information on defining Digester rules in an XML file, see Section 6.2 of the Jakarta Commons Cookbook, "Turning XML Documents into Objects."
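As a minimal sketch of the "file system or classpath" choice, the helper below resolves a rule file name to a URL, preferring a copy on disk and falling back to a classpath resource. RuleLocator and its locate() method are hypothetical names for this illustration, not part of Commons Digester; it uses only JDK classes.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of Commons Digester: resolves a rule file
// name to a URL, preferring an editable copy on disk over the classpath.
class RuleLocator {

    static URL locate(String name, Class<?> anchor) {
        File onDisk = new File(name);
        if (onDisk.isFile()) {
            try {
                // A file on disk can be changed without rebuilding the application.
                return onDisk.toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalStateException("Cannot build URL for " + onDisk, e);
            }
        }
        // Fall back to a resource packaged alongside the anchor class;
        // getResource() returns null when nothing is found.
        return anchor.getResource(name);
    }
}
```

The resulting URL could then be handed to DigesterLoader.createDigester(), for example with RuleLocator.locate( "person-rules.xml", getClass() ). A null result means the rule file was found in neither place and should be handled before creating the Digester.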

2. Functors in Commons Collections

Functors are an interesting part of Commons Collections 3.1 for two reasons: they haven't received the attention they warrant, and they have the potential to change the way you approach programming. Functor is just a fancy name for an object that encapsulates a function--a "functional object." And while they are certainly not the same thing, if you have ever used function pointers in C or C++, you'll understand the power of functors. A functor is an object--a Predicate, a Closure, or a Transformer. Predicates evaluate objects and return a boolean, Transformers evaluate objects and return new objects, and Closures accept objects and execute code. Functors can be combined into composite functors that model loops, logical expressions, and control structures, and functors can also be used to filter and operate upon items in a collection.
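To make the three roles concrete, here is a small sketch that uses one of each. The nested interfaces are local stand-ins that copy the single-method shape of the Commons Collections interfaces (evaluate, transform, execute) so that the example compiles with the JDK alone; the FunctorRoles class and its word-list task are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FunctorRoles {
    // Local stand-ins with the same single-method shape as the Commons
    // Collections 3.1 interfaces, so the sketch compiles without the jar.
    interface Predicate   { boolean evaluate(Object input); }
    interface Transformer { Object transform(Object input); }
    interface Closure     { void execute(Object input); }

    // Keeps the short words, upper-cases them, and records the result.
    static List<Object> shortWordsUpperCased(List<String> words) {
        final List<Object> recorded = new ArrayList<Object>();

        // Predicate: answers a yes/no question about an object.
        Predicate isShort = new Predicate() {
            public boolean evaluate(Object input) {
                return ((String) input).length() <= 3;
            }
        };
        // Transformer: turns one object into another.
        Transformer toUpperCase = new Transformer() {
            public Object transform(Object input) {
                return ((String) input).toUpperCase();
            }
        };
        // Closure: acts on an object and returns nothing.
        Closure record = new Closure() {
            public void execute(Object input) {
                recorded.add(input);
            }
        };

        for (String word : words) {
            if (isShort.evaluate(word)) {
                record.execute(toUpperCase.transform(word));
            }
        }
        return recorded;
    }
}
```

Calling FunctorRoles.shortWordsUpperCased() with the words cat, horse, and ox records CAT and OX. With the real library, the same three objects would implement org.apache.commons.collections.Predicate, Transformer, and Closure.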

Explaining functors in an article as short as this may be impossible, so to "jump start" your introduction to functors, I will solve the same problem both with and without functors. In this example, Student objects from an ArrayList are sorted into two List instances if they meet certain criteria; students with straight-A grades are added to an honorRollStudents list, and students with Ds and Fs are added to a problemStudents list. After the students are separated, the system will iterate through each list, giving the honor-roll students an award and scheduling a meeting with parents of problem students. The following code implements this process without the use of functors:

List allStudents = getAllStudents();

// Create 2 ArrayLists to hold honorRoll students
// and problem students
List honorRollStudents = new ArrayList();
List problemStudents = new ArrayList();

// Iterate through all students.  Put the
// honorRoll students in one List and the
// problem students in another.
Iterator allStudentsIter = allStudents.iterator();
while( allStudentsIter.hasNext() ) {
  Student s = (Student) allStudentsIter.next();

  if( s.getGrade().equals( "A" ) ) {
    honorRollStudents.add( s );
  } else if( s.getGrade().equals( "B" ) && 
             s.getAttendance() == PERFECT) {
    honorRollStudents.add( s );
  } else if( s.getGrade().equals( "D" ) || 
             s.getGrade().equals( "F" ) ) {
    problemStudents.add( s );
  } else if( s.getStatus() == SUSPENDED ) {
    problemStudents.add( s );
  }
}

// For all honorRoll students, add an award and
// save to the Database.
Iterator honorRollIter = 
    honorRollStudents.iterator();
while( honorRollIter.hasNext() ) {
  Student s = (Student) honorRollIter.next();
  // Add an award to student record
  s.addAward( "honor roll", 2005 );
  Database.saveStudent( s );
}

// For all problem students, add a note and 
// save to the database.
Iterator problemIter = problemStudents.iterator();
while( problemIter.hasNext() ) {
  Student s = (Student) problemIter.next();

  // Flag student for special attention
  s.addNote( "talk to student", 2005 );
  s.addNote( "meeting with parents", 2005 );
  Database.saveStudent( s );
}

The previous example is very procedural; the only way to figure out what happens to a Student object is to step through each line of code. The first half of this example is decision logic that applies tests to each Student object and classifies students based on performance and attendance. The second half of this example operates on the Student objects and saves the result to the database. A 50-line method body like the previous example is how most systems begin--manageable procedural complexity. But problems start to appear when the requirements start to shift. As soon as that decision logic changes, you will need to start adding more clauses to the logical expressions in the first half of the previous example. For example, what happens to your logical expression if a student is classified as a problem if he has a B and perfect attendance, but attended detention more than five times? Or what happens to the second half, when a student can be on the honor roll only if they were not a problem last year? When exceptions and requirement changes start to affect procedural code, manageable complexity turns into unmaintainable spaghetti code.

Step back from the previous example and consider what that code was doing. It was looking at every object in a List, applying a criterion, and, if that criterion was satisfied, acting upon an object. A critical improvement that could be made to the previous example is the decoupling of the criteria from the code that acts upon an object. The following two code excerpts solve the previous problem in a very different way. First, the criteria for the honor roll and problem students are modeled by two Predicate objects, and the code that acts upon honor roll and problem students is modeled by two Closure objects. These four objects are defined below:

import org.apache.commons.collections.Closure;
import org.apache.commons.collections.Predicate;

// Anonymous Predicate that decides if a student 
// has made the honor roll.
Predicate isHonorRoll = new Predicate() {
  public boolean evaluate(Object object) {
    Student s = (Student) object;

    return( ( s.getGrade().equals( "A" ) ) ||
            ( s.getGrade().equals( "B" ) && 
              s.getAttendance() == PERFECT ) );
  }
};

// Anonymous Predicate that decides if a student
// has a problem.
Predicate isProblem = new Predicate() {
  public boolean evaluate(Object object) {
    Student s = (Student) object;

    return ( ( s.getGrade().equals( "D" ) || 
               s.getGrade().equals( "F" ) ) ||
             s.getStatus() == SUSPENDED );
  }
};

// Anonymous Closure that adds a student to the 
// honor roll
Closure addToHonorRoll = new Closure() {
  public void execute(Object object) {
    Student s = (Student) object;
    // Add an award to student record
    s.addAward( "honor roll", 2005 );
    Database.saveStudent( s );
  }
};

// Anonymous Closure flags a student for attention
Closure flagForAttention = new Closure() {
  public void execute(Object object) {
    Student s = (Student) object;
    // Flag student for special attention
    s.addNote( "talk to student", 2005 );
    s.addNote( "meeting with parents", 2005 );
    Database.saveStudent( s );
  }
};

The four anonymous implementations of Predicate and Closure are separated from the system as a whole. flagForAttention has no knowledge of what the criteria are for a problem student, and the isProblem Predicate only knows how to identify a problem student. What is needed is a way to marry the right Predicate with the right Closure, and this is shown in the following example.

import org.apache.commons.collections.ClosureUtils;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.collections.functors.NOPClosure;

Map predicateMap = new HashMap();

predicateMap.put( isHonorRoll, addToHonorRoll );
predicateMap.put( isProblem, flagForAttention );
predicateMap.put( null, ClosureUtils.nopClosure() );

Closure processStudents = 
    ClosureUtils.switchClosure( predicateMap );

CollectionUtils.forAllDo( allStudents, processStudents );

In the previous code, the predicateMap matches Predicates to Closures; if a Student satisfies the Predicate in the key, it will be passed to the Closure in the value. By supplying a NOPClosure value and a null key, we will pass Student objects that satisfy neither Predicate to a "do nothing" or "no operation" NOPClosure created by a call to ClosureUtils. A SwitchClosure, processStudents, is created from the predicateMap, and the processStudents Closure is applied to every Student object in allStudents using CollectionUtils.forAllDo(). This is a very different approach; notice that you are not iterating through any lists. Instead, you set rules and consequences and CollectionUtils and SwitchClosure take care of the execution.
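The dispatch that a switch-style closure performs can be sketched in a few lines. This is a simplified re-implementation for illustration, not the library's source: the SwitchSketch class, its nested interfaces, and the number-labelling task are all assumptions. It treats the null key as the default case and lets the first matching Predicate win.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SwitchSketch {
    // Local stand-ins shaped like the Commons Collections interfaces.
    interface Predicate { boolean evaluate(Object input); }
    interface Closure   { void execute(Object input); }

    // Builds a Closure that runs the Closure of the first matching
    // Predicate; the entry stored under the null key is the default.
    static Closure switchClosure(final Map<Predicate, Closure> cases) {
        return new Closure() {
            public void execute(Object input) {
                for (Map.Entry<Predicate, Closure> entry : cases.entrySet()) {
                    Predicate test = entry.getKey();
                    if (test != null && test.evaluate(input)) {
                        entry.getValue().execute(input);
                        return; // first match wins
                    }
                }
                Closure fallback = cases.get(null);
                if (fallback != null) {
                    fallback.execute(input);
                }
            }
        };
    }

    // Closure that appends "label:input" to the given list.
    static Closure label(final String label, final List<String> out) {
        return new Closure() {
            public void execute(Object input) {
                out.add(label + ":" + input);
            }
        };
    }

    static List<String> classify(List<Integer> numbers) {
        List<String> out = new ArrayList<String>();
        // LinkedHashMap keeps insertion order, so case order is predictable.
        Map<Predicate, Closure> cases = new LinkedHashMap<Predicate, Closure>();
        cases.put(new Predicate() {
            public boolean evaluate(Object input) {
                return ((Integer) input).intValue() < 0;
            }
        }, label("negative", out));
        cases.put(new Predicate() {
            public boolean evaluate(Object input) {
                return ((Integer) input).intValue() % 2 == 0;
            }
        }, label("even", out));
        cases.put(null, label("other", out));

        Closure dispatch = switchClosure(cases);
        for (Integer n : numbers) {
            dispatch.execute(n);
        }
        return out;
    }
}
```

One design point worth noting: when two Predicates can match the same object, the order of the cases matters. The sketch uses a LinkedHashMap so that the order is the insertion order; a plain HashMap gives no such guarantee.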

When you separate criteria using Predicates and actions using Closures, your code is less procedural and much easier to test. The isHonorRoll Predicate can be unit tested in isolation from the addToHonorRoll Closure, and both can be tested by supplying a mock instance of the Student class. The second example also demonstrates CollectionUtils.forAllDo(), which applies a Closure to every element in a Collection. You may have noticed that using functors did not reduce the line count; in fact, the use of functors increased the line count. But the real benefit from functors is the modularity and encapsulation of criteria and actions. If your method length tends towards hundreds of lines, consider a less procedural, more object-oriented approach--use a functor.
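Here is a rough sketch of what testing a criterion in isolation might look like. StubStudent is a hypothetical stand-in with only the two properties the Predicate reads (attendance is reduced to a boolean for brevity), and the checks are plain Java rather than any particular test framework.

```java
class HonorRollCheck {
    // Local stand-in shaped like the Commons Collections Predicate.
    interface Predicate { boolean evaluate(Object input); }

    // Hypothetical stub with only the two properties the predicate reads.
    static class StubStudent {
        private final String grade;
        private final boolean perfectAttendance;

        StubStudent(String grade, boolean perfectAttendance) {
            this.grade = grade;
            this.perfectAttendance = perfectAttendance;
        }
        String getGrade() { return grade; }
        boolean hasPerfectAttendance() { return perfectAttendance; }
    }

    // Same rule as the article's isHonorRoll: an A, or a B with
    // perfect attendance.
    static final Predicate IS_HONOR_ROLL = new Predicate() {
        public boolean evaluate(Object input) {
            StubStudent s = (StubStudent) input;
            return "A".equals(s.getGrade())
                || ("B".equals(s.getGrade()) && s.hasPerfectAttendance());
        }
    };

    // Fails loudly when the predicate disagrees with the expectation.
    static void expect(boolean expected, String grade, boolean perfect) {
        boolean actual = IS_HONOR_ROLL.evaluate(new StubStudent(grade, perfect));
        if (actual != expected) {
            throw new AssertionError(
                "grade " + grade + ", perfect attendance " + perfect);
        }
    }

    public static void main(String[] args) {
        expect(true,  "A", false);
        expect(true,  "B", true);
        expect(false, "B", false);
        expect(false, "D", true);
        System.out.println("isHonorRoll checks passed");
    }
}
```

No database and no Closure is involved; the criterion is exercised on its own, which is the point of separating it from the action.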

Chapter 4, "Functors," in the Jakarta Commons Cookbook introduces functors available in Commons Collections, and Chapter 5, "Collections," shows you how to use functors with the Java Collections API. All of the functors--Closure, Predicate, and Transformer--can be combined into composite functors that can be used to model any kind of logic. switch, while, and for structures can be modeled with SwitchClosure, WhileClosure, and ForClosure. Compound logical expressions can be constructed from multiple Predicates using OrPredicate, AndPredicate, AllPredicate, and NonePredicate, among others. Commons BeanUtils also contains functor implementations that are used to apply functors to bean properties--BeanPredicate, BeanComparator, and BeanPropertyValueChangeClosure. Functors are a different way of thinking about low-level application architecture, and they could very well change your approach to coding.
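Composite predicates are easy to picture once you write one by hand. The sketch below builds and, or, and not combinators over a local Predicate stand-in; the PredicateCombinators class and its helper predicates are invented for illustration and are much simpler than the library's AndPredicate and OrPredicate classes.

```java
class PredicateCombinators {
    // Local stand-in shaped like the Commons Collections Predicate.
    interface Predicate { boolean evaluate(Object input); }

    static Predicate and(final Predicate left, final Predicate right) {
        return new Predicate() {
            public boolean evaluate(Object input) {
                return left.evaluate(input) && right.evaluate(input);
            }
        };
    }

    static Predicate or(final Predicate left, final Predicate right) {
        return new Predicate() {
            public boolean evaluate(Object input) {
                return left.evaluate(input) || right.evaluate(input);
            }
        };
    }

    static Predicate not(final Predicate inner) {
        return new Predicate() {
            public boolean evaluate(Object input) {
                return !inner.evaluate(input);
            }
        };
    }

    // Two simple leaf predicates over String inputs.
    static Predicate longerThan(final int length) {
        return new Predicate() {
            public boolean evaluate(Object input) {
                return ((String) input).length() > length;
            }
        };
    }

    static Predicate startsWith(final String prefix) {
        return new Predicate() {
            public boolean evaluate(Object input) {
                return ((String) input).startsWith(prefix);
            }
        };
    }

    // (longer than 3 AND starts with "a") OR NOT longer than 1
    static final Predicate RULE =
        or(and(longerThan(3), startsWith("a")), not(longerThan(1)));
}
```

The compound rule is itself a Predicate, so it can be passed anywhere a simple one is accepted, and each leaf can still be tested on its own.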

3. Using XPath Syntax to Query Objects and Collections

Commons JXPath is a surprising (non-standard) use of an XML standard. XPath has been around for some time as a way to select a node or node set in an XSL style sheet. If you've worked with XML, you are probably familiar with the syntax /foo/bar that selects the bar sub-elements of the foo document element. Jakarta Commons JXPath adds an interesting twist: you can use JXPath to select objects from beans and collections, among other object types such as servlet contexts and DOM Document objects. Consider a List of Person objects. Each Person object has a bean property of the type Job, and each Job object has a salary property of the type int. Person objects also have a country property, which is a two-letter country code. Using JXPath, it is easy to select all Person objects with a US country and a Job that pays more than one million dollars. Here is some code to set up a List of beans to filter with JXPath:

// Person's constructor sets firstName and country
Person person1 = new Person( "Tim", "US" );
Person person2 = new Person( "John", "US" );
Person person3 = new Person( "Al",  "US" );
Person person4 = new Person( "Tony", "GB" );

// Job's constructor sets name and salary
person1.setJob( new Job( "Developer", 40000 ) );
person2.setJob( new Job( "Senator", 150000 ) );
person3.setJob( new Job( "Comedian", 3400302 ) );
person4.setJob( new Job( "Minister", 2000000 ) );

Person[] personArr = 
  new Person[] { person1, person2, 
                 person3, person4 };

List people = Arrays.asList( personArr );

The people List contains four Person beans: Tim, John, Al, and Tony. Tim is a developer who makes $40,000, John is a Senator who makes $150,000, Al is a comedian who walks home with $3.4 million, and Tony is a prime minister who makes 2 million euros. Our task is simple: iterate over this List and print the name of every Person who is a U.S. citizen making over one million dollars. Assume that people is an ArrayList of Person objects, and take a look at the solution without the benefit of JXPath:

Iterator peopleIter = people.iterator();
while( peopleIter.hasNext() ) {
  Person person = (Person) peopleIter.next();

  if( person.getCountry() != null &&
      person.getCountry().equals( "US" ) &&
      person.getJob() != null &&
      person.getJob().getSalary() > 1000000 ) {
        System.out.println( person.getFirstName() + " " +
                            person.getLastName() );
  }
}

The previous example is heavy, and somewhat error-prone. To find the matching Person objects, you first need to iterate over each Person and test the country property of each. If the country property is not null and it has the correct value, then you must test the job property to find out if it is non-null and has a salary property greater than 1000000. The line count of the previous example can be dramatically reduced with Java 1.5's for syntax, but, even with Java 1.5, you still need to perform two comparisons at two different levels.
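For comparison, a Java 1.5 version of the same filter might look like the following sketch. The nested Person and Job classes are hypothetical minimal beans with just the properties the loop reads, and the method collects first names instead of printing them. The iterator and the cast are gone, but the two levels of comparison remain.

```java
import java.util.ArrayList;
import java.util.List;

class RichUsFilter {
    // Hypothetical minimal beans; only the properties the filter reads.
    static class Job {
        private final int salary;
        Job(int salary) { this.salary = salary; }
        int getSalary() { return salary; }
    }

    static class Person {
        private final String firstName;
        private final String country;
        private final Job job;

        Person(String firstName, String country, Job job) {
            this.firstName = firstName;
            this.country = country;
            this.job = job;
        }
        String getFirstName() { return firstName; }
        String getCountry() { return country; }
        Job getJob() { return job; }
    }

    // Returns the first names of people in the US earning over one million.
    static List<String> richUsNames(List<Person> people) {
        List<String> names = new ArrayList<String>();
        for (Person person : people) {
            // First level: the Person's own property.
            if ("US".equals(person.getCountry())
                    // Second level: a property of the nested Job, if any.
                    && person.getJob() != null
                    && person.getJob().getSalary() > 1000000) {
                names.add(person.getFirstName());
            }
        }
        return names;
    }
}
```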

What if you had to write a number of these queries against a set of Person objects stored in memory? What if your application had to display all of the Person objects in England named Tony? Or, what if you had to print the name of every Job with a salary less than 20,000? If you were storing these objects in a relational database, you could solve this by writing a SQL query, but if you are dealing with objects in memory, you don't have this luxury. While XPath was primarily meant for XML, you could use it to write "queries" against a collection of objects, treating objects as elements and bean properties as sub-elements. Yes, this is a strange application of XPath, but take a look at how the following example performs three different queries against people, an ArrayList of Person objects.

import org.apache.commons.jxpath.JXPathContext;

public List queryCollection(String xpath,
                            Collection col) {
    List results = new ArrayList();

    JXPathContext context = 
        JXPathContext.newContext( col );
    Iterator matching = 
        context.iterate( xpath );

    while( matching.hasNext() ) {
        results.add( matching.next() );
    }
    return results;
}

String query1 =
   ".[@country = 'US']/job[@salary > 1000000]/..";  
String query2 =
   ".[@country = 'GB' and @name = 'Tony']";  
String query3 = 
   "./job/name";

List richUsPeople = 
    queryCollection( query1, people );
List britishTony = 
    queryCollection( query2, people );
List jobNames = 
    queryCollection( query3, people );

The method queryCollection() takes an XPath expression and applies it to a Collection. XPath expressions are evaluated against a JXPathContext, which is created by calling JXPathContext.newContext() and passing in the Collection to be queried. Calling context.iterate() then applies the XPath expression to each item in the Collection, returning an Iterator with every matching "node" (or in this case, "object"). The first query performed by the previous example, query1, is the same query from the original example implemented without JXPath. query2 selects all Person objects with a country property of GB and a name property of Tony, and query3 selects a List of String objects, the name property of all of the Job objects.
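If the bracketed predicate syntax is unfamiliar, it may help to try it on ordinary XML first. The following sketch uses the JDK's javax.xml.xpath package, not JXPath, against a small invented document that mirrors three of the people above; the XmlXPathSketch class and the XML layout are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlXPathSketch {
    // Invented sample document: country is an attribute, name and
    // salary are sub-elements.
    static final String XML =
        "<people>"
      + "<person country='US'><name>Tim</name><salary>40000</salary></person>"
      + "<person country='US'><name>Al</name><salary>3400302</salary></person>"
      + "<person country='GB'><name>Tony</name><salary>2000000</salary></person>"
      + "</people>";

    // Evaluates the expression against XML and returns the text of
    // every selected node.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(XML)),
                XPathConstants.NODESET);
            List<String> result = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```

Calling XmlXPathSketch.select( "/people/person[@country='US' and salary > 1000000]/name" ) selects the single name element containing Al. JXPath applies the same style of expression to beans and collections instead of elements.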

When I first saw Commons JXPath, it struck me as a bad idea. Why apply XPath expressions to objects? Something about it didn't feel right. But this unexpected use of XPath as a query language for a collection of beans has come in handy for me more than a few times in the past few years. If you find yourself looping through lists to find matching elements, consider using JXPath. For more information, see Chapter 12, "Searching and Filtering," of Jakarta Commons Cookbook, which discusses Commons JXPath and Jakarta Lucene paired with Commons Digester.

And There's More

Stay tuned to this exploration of the far reaches of the Jakarta Commons. In the next part of this series, I'll introduce some related tools and utilities: set operations in Commons Collections, using Predicate objects with collections, configuring an application with Commons Configuration, and using Commons Betwixt to read and write XML. There is much to be gained from the Jakarta Commons that cannot be conveyed in a few thousand words, and I would encourage you to take a look at the Jakarta Commons Cookbook. Many of these utilities may, at first glance, seem somewhat trivial, but the power of Jakarta Commons lies in how these tools can be combined with each other and integrated into your own systems.

Timothy M. O'Brien is a developer and entrepreneur living in Chicago, IL. He spends his days programming in Java, Python, and Ruby.


Copyright © 2009 O'Reilly Media, Inc.