ONJava.com -- The Independent Source for Enterprise Java

O'Reilly Book Excerpts: Learning Java, 2nd Edition

XML Basics for Java Developers, Part 3

by Patrick Niemeyer and Jonathan Knudsen

In part three of this series of book excerpts on XML basics for Java developers, taken from Learning Java, 2nd Edition, learn about the Document Object Model (DOM).

DOM

In the last section, we used SAX to parse an XML document and build a Java object model representing it. In that case, we created specific Java types for each of our complex elements. If we were planning to use our model extensively in an application, this technique would give us a great deal of flexibility. But often it is sufficient (and much easier) to use a "generic" model that simply represents the content of the XML in a neutral form. The Document Object Model (DOM) is just that. The DOM API parses an XML document into a full, memory-resident representation built from types such as Element and Attr, along with their text values.

As we saw in our zoo example, once you have an object model, using the data is a breeze. So a generic DOM would seem like an appealing solution, especially when working mainly with text. The only catch is that DOM didn't evolve first as a Java API, and it doesn't map well to Java. DOM is very complete and provides access to every facet of the original XML document, but it's so generic (and language-neutral) that it's cumbersome to use in Java. In our example, we'll start by making a couple of helper methods to smooth things over. Later, we'll also mention a native Java alternative to DOM called JDOM that is more pleasant to use.

The DOM API

The core DOM classes belong to the org.w3c.dom package. The result of parsing an XML document with DOM is a Document object from this package (see Figure 23-1). The Document is a factory and a container for a hierarchical collection of Node objects, representing the document structure. A node has a parent and may have children, which can be traversed using its getChildNodes(), getFirstChild(), or getLastChild() methods. A node may also have "attributes" associated with it; these are returned by getAttributes() as a NamedNodeMap, a map of nodes keyed by name.
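The traversal methods described above can be sketched as a small recursive walk. This is only an illustration: the class name WalkNodes, its helper methods, and the sample XML are made up for this example and are not part of the book's zoo code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WalkNodes {
    // Parse an XML string into a DOM Document (hypothetical helper).
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Append this node's name, indented by depth, then recurse into its children.
    static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
            walk(children.item(i), depth + 1, out);
    }

    // Build an indented outline starting at the document's root element.
    static String outline(String xml) {
        StringBuilder out = new StringBuilder();
        walk(parse(xml).getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        System.out.print(outline("<a><b>hi</b></a>"));
    }
}
```

Note that the text inside an element shows up as a child node of its own, named "#text", rather than as a property of the element.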

Figure 23-1. The parsed DOM

Subtypes of Node--Element, Text, and Attr--represent elements, text, and attributes in XML. Some types of nodes have a text "value." For example, the value of a Text node is the character data it holds, which is typically the text content of its parent element. The same is true of an attribute, CDATA section, or comment node. An Element node, by contrast, has no value of its own; its text lives in its child Text nodes. The value of a node can be accessed by the getNodeValue() and setNodeValue() methods.
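As a rough illustration of node values, the sketch below parses a tiny document and reads getNodeValue() from several node types. The class name NodeValues and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeValues {
    // Parse an XML string and return its root element (hypothetical helper).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        Element root = parseRoot("<note id=\"7\">hello<!--hidden--></note>");

        Node text = root.getFirstChild();         // the Text node
        Node comment = root.getLastChild();       // the Comment node
        Node attr = root.getAttributeNode("id");  // the Attr node

        System.out.println(root.getNodeValue());     // an Element has no value: null
        System.out.println(text.getNodeValue());     // hello
        System.out.println(comment.getNodeValue());  // hidden
        System.out.println(attr.getNodeValue());     // 7
    }
}
```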

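The Element node provides "random" access to the elements beneath it through its getElementsByTagName() method, which returns a NodeList (a simple collection type). Note that the list includes all matching descendants of the element, in document order, not just its immediate children. You can also fetch an attribute value by name from the Element using the getAttribute() method.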
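Here is a minimal sketch of both calls. The class name TagLookup, the helper methods, and the sample XML are hypothetical; only getElementsByTagName() and getAttribute() come from the DOM API itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagLookup {
    // Parse an XML string and return its root element (hypothetical helper).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Collect the "name" attribute of every element with the given tag under root.
    static List<String> namesOf(Element root, String tag) {
        NodeList matches = root.getElementsByTagName(tag);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            Element match = (Element) matches.item(i);
            names.add(match.getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch; one animal is nested deeper.
        Element root = parseRoot(
            "<zoo><animal name=\"lion\"/><cage><animal name=\"owl\"/></cage></zoo>");
        System.out.println(namesOf(root, "animal")); // [lion, owl]
    }
}
```

The nested animal is found too, because the search covers the whole subtree. A missing attribute comes back from getAttribute() as an empty string rather than null.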

The javax.xml.parsers package contains a factory for DOM parsers, just as it does for SAX parsers. An instance of DocumentBuilderFactory can be used to create a DocumentBuilder object, whose parse() method reads an XML file and produces a Document result.
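The factory, builder, and document steps can be sketched as follows. The class name ParseWithDom and its methods are hypothetical, and the example reads from an in-memory stream so that it runs without a file on disk; DocumentBuilder also offers a parse(File) overload for the file case.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseWithDom {
    // Step through the factory, the builder, and finally the parsed Document.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(in);
    }

    // Convenience wrapper: parse a string and report the root element's name.
    static String rootName(String xml) {
        try {
            InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            return parse(in).getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        System.out.println(rootName("<inventory><animal/></inventory>")); // inventory
    }
}
```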

Test-Driving DOM

Let's use DOM to parse our zoo inventory and print the same information as our model-builder example. Using DOM saves us from having to create all those model classes and makes our example much shorter. But before we even begin, we're going to make a couple of utility methods to save us a great deal of pain. The following class, DOMUtil, covers two very common operations on an element: retrieving a simple (singular) child element by name and retrieving the text of a simple child element by name. Here is the code:

import org.w3c.dom.*;

public class DOMUtil
{
   // Return the first element named 'name' found beneath 'element'.
   // getElementsByTagName() searches all descendants, in document order.
   public static Element getFirstElement( Element element, String name ) {
      NodeList nl = element.getElementsByTagName( name );
      if ( nl.getLength() < 1 )
         throw new RuntimeException(
            "Element: "+element.getNodeName()+" does not contain: "+name);
      return (Element)nl.item(0);
   }

   // Return the text of the first element named 'name' beneath 'node'.
   public static String getSimpleElementText( Element node, String name ) 
   {
      Element namedElement = getFirstElement( node, name );
      return getSimpleElementText( namedElement );
   }

   // Concatenate the values of the element's immediate Text children.
   // Text inside nested child elements is not included.
   public static String getSimpleElementText( Element node ) 
   {
      StringBuffer sb = new StringBuffer();
      NodeList children = node.getChildNodes();
      for(int i=0; i<children.getLength(); i++) {
         Node child = children.item(i);
         if ( child instanceof Text )
            sb.append( child.getNodeValue() );
      }
      return sb.toString();
   }
}
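One design point worth noting: because getFirstElement() relies on getElementsByTagName(), it returns the first matching descendant, which may sit deeper in the tree than an immediate child of the same name. If strictly child-only matching is wanted, a variant could walk the siblings instead. The sketch below is an assumption of how that might look; the class ChildOnlyLookup and its methods are hypothetical and not part of the book's DOMUtil.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ChildOnlyLookup {
    // Return the first immediate child element with the given name, or null.
    static Element firstChildElement(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name))
                return (Element) n;
        }
        return null;
    }

    // Parse an XML string and return its root element (hypothetical helper).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch: a nested <name> comes first.
        Element root = parseRoot(
            "<zoo><cage><name>inner</name></cage><name>outer</name></zoo>");

        // Descendant search finds the nested element first.
        System.out.println(
            root.getElementsByTagName("name").item(0).getTextContent()); // inner

        // Child-only search skips it.
        System.out.println(firstChildElement(root, "name").getTextContent()); // outer
    }
}
```

For simple documents in which element names are not reused at different depths, the two approaches return the same element.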
