ONJava.com -- The Independent Source for Enterprise Java

XML Basics for Java Developers, Part 2

SAX model builder

Now let's get down to business and write our builder tool. The SAXModelBuilder we create in this section receives SAX events from parsing an XML file and constructs classes corresponding to the names of the tags. Our model builder is simple, but it handles the most common structures: elements with text or simple element data. We handle attributes by passing them to the model class, allowing it to map them to fixed identifiers (e.g., Animal.MAMMAL). Here is the code:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import java.lang.reflect.*;

public class SAXModelBuilder extends DefaultHandler
{
    Stack stack = new Stack(  );
    SimpleElement element;   // last element popped; the root once parsing ends

    public void startElement(
        String namespace, String localname, String qname, Attributes atts )
      throws SAXException
    {
        // Try to instantiate a model class named after the tag
        SimpleElement element = null;
        try {
            element = (SimpleElement)Class.forName(qname).newInstance(  );
        } catch ( Exception e ) {/*No class for element*/}
        if ( element == null )
            element = new SimpleElement(  );
        for(int i=0; i<atts.getLength(  ); i++)
            element.setAttributeValue( atts.getQName(i), atts.getValue(i) );
        stack.push( element );
    }

    public void endElement( String namespace, String localname, String qname)
      throws SAXException
    {
        // Pop the finished element and apply it to its parent, if any
        element = (SimpleElement)stack.pop(  );
        if ( !stack.empty(  ) )
            try {
                setProperty( qname, stack.peek(  ), element );
            } catch ( Exception e ) { throw new SAXException( "Error: "+e ); }
    }

    public void characters(char[] ch, int start, int len ) {
        String text = new String( ch, start, len );
        ((SimpleElement)(stack.peek(  ))).addText( text );
    }

    void setProperty( String name, Object target, Object value )
      throws SAXException
    {
        Method method = null;
        // First try add<Name>/set<Name> taking the child element type
        try {
            method = target.getClass(  ).getMethod(
                "add"+name, new Class[] { value.getClass(  ) } );
        } catch ( NoSuchMethodException e ) { }
        if ( method == null ) try {
            method = target.getClass(  ).getMethod(
                "set"+name, new Class[] { value.getClass(  ) } );
        } catch ( NoSuchMethodException e ) { }
        // Then fall back to add<Name>/set<Name> taking the child's text
        if ( method == null ) try {
            value = ((SimpleElement)value).getText(  );
            method = target.getClass(  ).getMethod(
                "add"+name, new Class[] { String.class } );
        } catch ( NoSuchMethodException e ) { }
        try {
            if ( method == null )
                method = target.getClass(  ).getMethod(
                    "set"+name, new Class[] { String.class } );
            method.invoke( target, new Object [] { value } );
        } catch ( Exception e ) { throw new SAXException( e.toString(  ) ); }
    }

    public SimpleElement getModel(  ) { return element; }
}

The SAXModelBuilder extends DefaultHandler to help us implement the ContentHandler interface. We use the startElement(), endElement(), and characters() methods to receive information from the document.

Because SAX events follow the structure of the XML document, we use a simple stack to keep track of which object we are currently parsing. At the start of each element, the model builder attempts to create an instance of a class with the same name and push it onto the top of the stack. Each nested opening tag creates a new object on the stack until we encounter a closing tag. Upon reaching the end of an element, we pop the current object off the stack and attempt to apply its value to its parent (the enclosing element), which is the new top of the stack. The final closing tag leaves the stack empty, but we save the last popped object in the element field, which getModel() returns.
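To see the push-and-pop rhythm on its own, here is a minimal sketch that only records what happens to the stack, without building any model objects. The class name StackTraceHandler, its trace() helper, and the sample markup are our own inventions for illustration, not part of the model builder.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs stack activity instead of building objects.
class StackTraceHandler extends DefaultHandler {
    private final Deque<String> stack = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String ns, String local, String qname, Attributes atts) {
        // Every opening tag pushes a new entry
        stack.push(qname);
        log.add("push " + qname + " depth=" + stack.size());
    }

    @Override
    public void endElement(String ns, String local, String qname) {
        // Every closing tag pops the current entry; the new top is its parent
        String child = stack.pop();
        String parent = stack.isEmpty() ? "(none)" : stack.peek();
        log.add("pop " + child + " -> parent " + parent);
    }

    // Parses an XML string and returns the recorded stack activity
    static List<String> trace(String xml) throws Exception {
        StackTraceHandler handler = new StackTraceHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Inventory><Animal><Name>Cocoa</Name></Animal></Inventory>";
        for (String line : trace(xml))
            System.out.println(line);
    }
}
```

The last line logged is the pop of the root element with no parent left, which is exactly the moment at which SAXModelBuilder has nothing to call setProperty() on and simply keeps the popped object.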

Our setProperty() method uses reflection and the standard JavaBeans naming conventions to look for the appropriate property "setter" method to apply a value to its parent object. First we check for a method named add<Property> or set<Property>, accepting an argument of the child element type (for example, the addAnimal( Animal animal ) method of our Inventory object). Failing that, we look for an "add" or "set" method accepting a String argument and use it to apply any text content of the child object. This convenience saves us from having to create trivial classes for properties containing only text.
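The lookup order is easier to follow in isolation. The sketch below is a simplified stand-in, not the builder's actual method: findSetter() is a hypothetical helper, and the tiny Animal and Inventory classes are assumptions made up for the demo. It tries add<Name> and then set<Name> for one argument type; calling it first with the child's class and then with String.class reproduces the order described above.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class SetterLookupDemo {
    // Hypothetical model classes, reduced to one property each
    public static class Animal {
        private String name;
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static class Inventory {
        private final List<Animal> animals = new ArrayList<>();
        public void addAnimal(Animal a) { animals.add(a); }
        public List<Animal> getAnimals() { return animals; }
    }

    // Returns the public add<name> or set<name> method taking argType,
    // preferring "add"; returns null if neither exists.
    static Method findSetter(Class<?> target, String name, Class<?> argType) {
        for (String prefix : new String[] { "add", "set" }) {
            try {
                return target.getMethod(prefix + name, argType);
            } catch (NoSuchMethodException e) { /* try the next prefix */ }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Inventory inventory = new Inventory();
        Animal cocoa = new Animal();
        // Text-only property: resolved as setName(String)
        findSetter(Animal.class, "Name", String.class).invoke(cocoa, "Cocoa");
        // Element-typed property: resolved as addAnimal(Animal)
        findSetter(Inventory.class, "Animal", Animal.class).invoke(inventory, cocoa);
        System.out.println(inventory.getAnimals().get(0).getName());
    }
}
```

Note that getMethod() matches parameter types exactly, so a method declared as addAnimal( Object ) would not be found when we ask for addAnimal( Animal ).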

The common base class SimpleElement helps us in two ways. First, it provides a method allowing us to pass attributes to the model class. Next, we use SimpleElement as a placeholder when no class exists for an element, allowing us to store the text of the tag.
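For orientation, here is roughly what such a base class could look like, judging only from the three methods the builder calls on it: setAttributeValue(), addText(), and getText(). Treat it as a sketch under assumptions. The no-op default for attributes, the animalClass attribute name, and the value of the MAMMAL constant are our guesses for illustration.

```java
// A minimal sketch of the base class, inferred from how SAXModelBuilder uses it.
class SimpleElement {
    private final StringBuilder text = new StringBuilder();

    // SAX may deliver an element's text in several chunks, so we append
    public void addText(String s) { text.append(s); }

    public String getText() { return text.toString(); }

    // Assumption: the base class ignores attributes; subclasses override this
    public void setAttributeValue(String name, String value) { }
}

// Hypothetical model class mapping an attribute to a fixed identifier
class Animal extends SimpleElement {
    public static final int MAMMAL = 1;   // assumed constant value

    private int animalClass;

    @Override
    public void setAttributeValue(String name, String value) {
        // "animalClass" is an assumed attribute name
        if (name.equals("animalClass") && value.equals("mammal"))
            animalClass = MAMMAL;
    }

    public int getAnimalClass() { return animalClass; }
}
```

Because addText() appends rather than replaces, an element whose text arrives through several characters() calls still ends up with its complete content.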

Test drive

Finally, we can test-drive the model builder with the following class, TestModelBuilder, which calls the SAX parser, setting an instance of our SAXModelBuilder as the content handler. The test class then prints some of the information parsed from the zooinventory.xml file:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class TestModelBuilder
{
    public static void main( String [] args ) throws Exception
    {
        SAXParserFactory factory = SAXParserFactory.newInstance(  );
        SAXParser saxParser = factory.newSAXParser(  );
        XMLReader parser = saxParser.getXMLReader(  );
        SAXModelBuilder mb = new SAXModelBuilder(  );
        parser.setContentHandler( mb );
        parser.parse( "zooinventory.xml" );

        // The root element of the document becomes the top-level model object
        Inventory inventory = (Inventory)mb.getModel(  );
        System.out.println("Animals = "+inventory.getAnimals(  ));
        Animal cocoa = (Animal)(inventory.getAnimals(  ).get(1));
        FoodRecipe recipe = cocoa.getFoodRecipe(  );
        System.out.println( "Recipe = "+recipe );
    }
}

The output should look like this:

Animals = [Song Fang(Giant Panda), Cocoa(Gorilla)]
Recipe = Gorilla Chow: [Fruit, Shoots, Leaves]

In the following sections we'll generate the equivalent output using different tools.

Limitations and possibilities

To make our model builder more complete, we could use more robust naming conventions for our tags and model classes (taking into account packages and mixed capitalization, etc.). But more generally, we might not want to name our model classes strictly based on tag names. And, of course, there is the problem of taking our model and going the other way, using it to generate an XML document. Furthermore, as we've said, writing the model classes is tedious and error-prone. All this is a good indication that this area is ripe for autogeneration of classes. We'll discuss tools that do that a bit later in the chapter.
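As one illustration of what a more robust naming convention might involve, the sketch below turns a tag name into a fully qualified class name. Everything here is hypothetical: the TagNames class, the choice of hyphen, underscore, and dot as word separators, and the package prefix are assumptions, not something the model builder does today.

```java
// Hypothetical helper: maps tag names such as "food-recipe" to class names
// such as "zoo.model.FoodRecipe".
class TagNames {
    static String toClassName(String pkg, String tag) {
        StringBuilder name = new StringBuilder();
        // Capitalize each word of the tag and join the words together
        for (String part : tag.split("[-_.]")) {
            if (part.isEmpty()) continue;
            name.append(Character.toUpperCase(part.charAt(0)))
                .append(part.substring(1));
        }
        return pkg.isEmpty() ? name.toString() : pkg + "." + name;
    }

    public static void main(String[] args) {
        System.out.println(toClassName("zoo.model", "food-recipe"));
        System.out.println(toClassName("", "animal"));
    }
}
```

A builder could pass the result to Class.forName() instead of the raw tag name, at the cost of committing to one particular convention.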


XMLEncoder and XMLDecoder

Java 1.4 introduced a tool for serializing JavaBeans classes to XML. The java.beans package's XMLEncoder and XMLDecoder classes are analogous to java.io's ObjectOutputStream and ObjectInputStream. Instead of using the native Java serialization format, they store the object state in a high-level XML format. We say that they are analogous, but the XML encoder is not a general replacement for Java object serialization. Instead, it is specialized to work with objects that follow the JavaBeans design patterns, and it can store and recover only the state that an object exposes through its public bean properties (that is, through getters and setters).

In memory, the XMLEncoder attempts to construct a copy of the graph of beans that you are serializing, using only public constructors and JavaBean properties. As it works, it writes out these steps as "instructions" in an XML format. Later, the XMLDecoder executes these instructions and produces the result. The primary advantage of this process is that it is highly resilient to changes in the class implementation. While standard Java object serialization can accommodate many kinds of "compatible changes" in classes, it requires some help from the developer to get it right. Because the XMLEncoder uses only public APIs and writes instructions in simple XML, it is expected that this form of serialization will be the most robust way to store the state of JavaBeans. The process is referred to as "long-term persistence" for JavaBeans.
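A small round trip shows the idea. This is a sketch, assuming a made-up Pet bean with two properties; the class names and the in-memory byte array are ours, chosen so the example needs no files. The bean is public and has a public no-argument constructor, which is what the encoder relies on.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class BeanRoundTrip {
    // Hypothetical bean: public class, public no-arg constructor,
    // state exposed only through getters and setters
    public static class Pet {
        private String name;
        private int age;
        public Pet() { }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    // Writes the bean as XML "instructions" into a byte array
    static byte[] encode(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
        return out.toByteArray();
    }

    // Replays the XML instructions to rebuild the bean
    static Object decode(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Pet pet = new Pet();
        pet.setName("Cocoa");
        pet.setAge(7);
        byte[] xml = encode(pet);
        System.out.println(new String(xml, "UTF-8"));
        Pet copy = (Pet) decode(xml);
        System.out.println(copy.getName() + ", " + copy.getAge());
    }
}
```

Only properties whose values differ from those of a freshly constructed bean need to be written, which is how the encoder keeps its output from being even longer than it is.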

Give it a whirl. You can use the model-builder example to create the beans and compare the output to our original XML. You can add this bit to our TestModelBuilder class, which will populate the beans for you to write:

import java.beans.XMLEncoder;

XMLEncoder xmle = new XMLEncoder( System.out );
xmle.writeObject( inventory );
xmle.close(  );


Further thoughts

It might seem at first like this would obviate the need for our SAXModelBuilder example. Why not simply write our XML in the format that XMLDecoder understands and use it to build our model? Well, although XMLEncoder is very efficient at eliminating redundancy, you can see that its output is still very verbose (about four times as large as our original XML) and not very human-friendly. Although it's possible to write it by hand, this XML format wasn't really designed for that. Finally, although XMLEncoder can be customized for how it handles specific object types, it suffers from the same problem that our model builder does in that "binding" (the namespace of tags) is determined strictly by our Java class names. As we've said before, what is really needed is a more general tool to generate classes or to map our own classes to XML and back.

Learn DOM in the next installment.
