XML is now used frequently to model business data in large-scale applications. A common Java application task is to parse XML to retrieve its data. The Document Object Model (DOM) defines a set of interfaces for navigating and manipulating the content and structure of XML and HTML documents.
After reading this article you will be able to create a representation of an XML document in your Java program and traverse that representation in two different ways. You will be able to traverse a horizontal representation of the XML document, and you will be able to traverse a tree, or hierarchical, representation of the XML document.
The DOM defines interfaces that allow programmers to navigate XML and HTML documents and also to manipulate their content and structure. The DOM is a specification of interfaces; it's not an implementation. Vendors are left to come up with their own implementation of DOM. Sun Microsystems has some DOM support in its Java XML Processing API. Other vendors that provide support are IBM, Oracle, and the Apache Software Foundation.
The DOM has several levels. Level 0 was the first requirements document; it defined functionality similar to that used in Netscape Navigator 3.0 and Internet Explorer 3.0. On October 1, 1998, the DOM Level 1 Recommendation was released. It provides functionality to navigate and manipulate the structure and content of XML and HTML documents. DOM Level 2 is a set of specifications that add to the functionality defined in DOM Level 1. The following list describes the parts of DOM Level 2.
DOM Level 2 Core Recommendation -- Further defines ways to navigate and manipulate the content and structure of XML and HTML documents. This recommendation is built on the DOM Level 1 Recommendation, filling in some of the gaps.
DOM Level 2 Views Recommendation specifies an interface to provide programmers the functionality to view alternate presentations of the XML or HTML document. The interfaces defined in this recommendation are optional, but if a vendor chooses to implement them, they must also implement the DOM Level 2 Core Recommendation.
DOM Level 2 Style Recommendation specifies an interface to provide programmers the ability to dynamically access and manipulate style sheets. The interfaces defined in this recommendation are optional, but if a vendor chooses to implement the interfaces, they must also implement the DOM Level 2 Core Recommendation.
DOM Level 2 Events Recommendation specifies a set of interfaces to provide a generic event system to programmers. The interfaces defined in this recommendation are optional, but if a vendor chooses to implement them, they must also provide support for the DOM Level 2 Core Recommendation.
DOM Level 2 Traversal-Range Recommendation specifies a set of interfaces that allow programmers to traverse a representation of the XML document. There is also a set of interfaces defined to manipulate ranges of the XML document. The interfaces defined in this recommendation are optional, but if a vendor chooses to implement them, they must also implement the DOM Level 2 Core Recommendation.
DOM Level 2 HTML Working Draft provides a set of interfaces that allow programmers to work on HTML documents. The interfaces defined in this working draft are optional, but if a vendor chooses to implement them, they must also implement the DOM Level 2 Core Recommendation.
This article explores the traversal of DOM representations of XML documents from within Java applications. The Apache Software Foundation has implemented the optional interfaces of the DOM Level 2 Traversal-Range Recommendation in its Xerces project. You can download the Xerces JAR, which contains the files you'll need. Xerces also supports the optional interfaces defined in the DOM Level 2 Events Recommendation. Make sure to place the xerces.jar file in your system CLASSPATH so that both the Java compiler and the Java runtime can locate the Xerces classes.
You also need a JDK to compile and run your Java programs. You can download a JDK from Sun Microsystems.
To learn to parse and traverse XML you'll need an example XML document. Listing 1 is the DTD for the example XML document. It's a simple representation of a bank. In this example a bank has clients, employees, and a branch identification number.
Listing 1: bank.dtd -- A DTD that defines the parts of a bank:
<!ELEMENT bank (client+, employee+, branchID)>
<!ELEMENT client (clientName, homeAddress, homePhone, account+)>
<!ELEMENT branchID (#PCDATA)>
<!ELEMENT clientName (#PCDATA)>
<!ELEMENT homeAddress (#PCDATA)>
<!ELEMENT homePhone (#PCDATA)>
<!ELEMENT account (type, accountNumber)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT accountNumber (#PCDATA)>
<!ELEMENT employee (empID, empName, workAddress, workPhone, salary)>
<!ELEMENT empID (#PCDATA)>
<!ELEMENT empName (#PCDATA)>
<!ELEMENT workAddress (#PCDATA)>
<!ELEMENT workPhone (#PCDATA)>
<!ELEMENT salary (#PCDATA)>
Listing 2, an instance of our DTD, represents a bank with two clients and two employees.
Listing 2: bank.xml -- A simple XML document representing a view of a bank.
<?xml version="1.0"?>
<!DOCTYPE bank SYSTEM "bank.dtd" >
<bank>
  <client>
    <clientName>Bill Clinton</clientName>
    <homeAddress>Nashua, NH</homeAddress>
    <homePhone>555/555-8975</homePhone>
    <account>
      <type>Checking</type>
      <accountNumber>111222333</accountNumber>
    </account>
    <account>
      <type>Savings</type>
      <accountNumber>777888999</accountNumber>
    </account>
  </client>
  <client>
    <clientName>Al Gore</clientName>
    <homeAddress>Washington, DC</homeAddress>
    <homePhone>555/555-4256</homePhone>
    <account>
      <type>Savings</type>
      <accountNumber>444777888</accountNumber>
    </account>
  </client>
  <employee>
    <empID>2105</empID>
    <empName>Ronald Reagan</empName>
    <workAddress>Nashua, NH</workAddress>
    <workPhone>555/555-1245</workPhone>
    <salary>60000</salary>
  </employee>
  <employee>
    <empID>77</empID>
    <empName>Jimmy Carter</empName>
    <workAddress>Denver, CO</workAddress>
    <workPhone>555/555-1235</workPhone>
    <salary>250000</salary>
  </employee>
  <branchID>78963</branchID>
</bank>
All of the following code uses the Apache Software Foundation's implementation of the DOM Level 2 Traversal-Range Recommendation, which is part of the Xerces project.
The Traversal-Range Recommendation defines two interfaces for traversing XML documents. The NodeIterator interface declares methods for traversing a flat representation of an XML document, and the TreeWalker interface declares methods that let programmers traverse an XML document as a tree structure. A third interface defined in the Traversal-Range Recommendation, NodeFilter, determines which nodes are included in the logical view of the document; that is, it filters the nodes of the XML document.
The create methods for NodeIterator and TreeWalker take a whatToShow flag that lets the programmer choose which types of nodes to include in the logical view of the document. Any node that is not visible is skipped over as if it did not exist in the document.
Writing a Filter
Filters are used to determine which nodes should be incorporated into the logical view of the document. The following is the IDL definition of the NodeFilter interface.
// Introduced in DOM Level 2:
interface NodeFilter {
// Constants returned by acceptNode
const short FILTER_ACCEPT = 1;
const short FILTER_REJECT = 2;
const short FILTER_SKIP = 3;
// Constants for whatToShow
const unsigned long SHOW_ALL = 0xFFFFFFFF;
const unsigned long SHOW_ELEMENT = 0x00000001;
const unsigned long SHOW_ATTRIBUTE = 0x00000002;
const unsigned long SHOW_TEXT = 0x00000004;
const unsigned long SHOW_CDATA_SECTION = 0x00000008;
const unsigned long SHOW_ENTITY_REFERENCE = 0x00000010;
const unsigned long SHOW_ENTITY = 0x00000020;
const unsigned long SHOW_PROCESSING_INSTRUCTION = 0x00000040;
const unsigned long SHOW_COMMENT = 0x00000080;
const unsigned long SHOW_DOCUMENT = 0x00000100;
const unsigned long SHOW_DOCUMENT_TYPE = 0x00000200;
const unsigned long SHOW_DOCUMENT_FRAGMENT = 0x00000400;
const unsigned long SHOW_NOTATION = 0x00000800;
short acceptNode(in Node n);
};
The NodeFilter interface contains one method, acceptNode(), which determines whether a node should be accepted or rejected. The acceptNode() method may return one of three values:
FILTER_ACCEPT -- The current node is included in the logical view of the document.
FILTER_SKIP -- The current node is not included, but its children are still considered for inclusion.
FILTER_REJECT -- The current node is not included, and its children are not considered for inclusion either. For a NodeIterator, which presents a flat view, FILTER_REJECT has the same effect as FILTER_SKIP; the difference matters only for a TreeWalker.
The whatToShow constants defined in NodeFilter are passed to the create methods of NodeIterator and TreeWalker, as we will see shortly. But first let's write a filter. To do so, you simply write a class that implements the NodeFilter interface and its acceptNode() method. Listing 3 is a simple filter that accepts all element nodes and skips everything else.
Listing 3: A filter that accepts all element nodes and skips others.
class AllElements implements NodeFilter
{
    public short acceptNode (Node n)
    {
        if (n.getNodeType() == Node.ELEMENT_NODE)
            return FILTER_ACCEPT;
        return FILTER_SKIP;
    }
}
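A filter and the whatToShow flag work as two separate layers: node types excluded by whatToShow are hidden before the filter is consulted, and the filter then decides about the nodes that remain. The sketch below makes that visible on the small A-F document used later in this article. It assumes a current JDK (Java 8 or later) and uses its built-in JAXP parser together with the standard org.w3c.dom.traversal package, not the Xerces classes used in this article's listings. Because NodeFilter has a single method there, Listing 3's logic is written as a lambda. The class ElementFilterDemo and its members are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

public class ElementFilterDemo {
    // The simple A-F document, written without whitespace between tags.
    static final String XML =
        "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>";

    // Same logic as Listing 3's AllElements, written as a lambda.
    static final NodeFilter ALL_ELEMENTS = n ->
        n.getNodeType() == Node.ELEMENT_NODE ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP;

    // Returns the names of the nodes a NodeIterator visits, separated by spaces.
    static String names(int whatToShow, NodeFilter filter) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                    doc.getDocumentElement(), whatToShow, filter, true);
            StringBuilder sb = new StringBuilder();
            for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(n.getNodeName());
            }
            it.detach();
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The filter sees every node type and skips the non-elements.
        System.out.println(names(NodeFilter.SHOW_ALL, ALL_ELEMENTS)); // A B C D E F
        // whatToShow hides non-elements before any filter would run.
        System.out.println(names(NodeFilter.SHOW_ELEMENT, null));     // A B C D E F
        // With neither restriction, text nodes appear under the name "#text".
        System.out.println(names(NodeFilter.SHOW_ALL, null));         // A B #text C D #text E #text F #text
    }
}
```

Both of the first two calls produce the same view. Using whatToShow is usually the cheaper choice when the decision depends only on node type, and a filter is needed when it depends on anything else, such as a node's name or content.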
The Node interface defines the following fields to
help determine the type of node you're dealing with.
Constant | Meaning
ATTRIBUTE_NODE | The node is an Attr.
CDATA_SECTION_NODE | The node is a CDATASection.
COMMENT_NODE | The node is a Comment.
DOCUMENT_FRAGMENT_NODE | The node is a DocumentFragment.
DOCUMENT_NODE | The node is a Document.
DOCUMENT_TYPE_NODE | The node is a DocumentType.
ELEMENT_NODE | The node is an Element.
ENTITY_NODE | The node is an Entity.
ENTITY_REFERENCE_NODE | The node is an EntityReference.
NOTATION_NODE | The node is a Notation.
PROCESSING_INSTRUCTION_NODE | The node is a ProcessingInstruction.
TEXT_NODE | The node is a Text node.
You can use a switch statement in your filter to determine what nodes to accept, skip, or reject. Listing 4 shows an example of a filter that uses a switch statement.
Listing 4: MyFilter.java -- A filter that uses a switch statement to determine what nodes to accept, skip, or reject.
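import org.w3c.dom.Node;
import org.apache.xerces.domx.traversal.NodeFilter;

class MyFilter implements NodeFilter
{
    public short acceptNode(Node n)
    {
        short s = n.getNodeType();
        switch (s) {
            case Node.ATTRIBUTE_NODE:
                //attributes are never children of other nodes, so this case
                //is reached only when an Attr is the traversal root
                return FILTER_REJECT;
            case Node.CDATA_SECTION_NODE:
                return FILTER_SKIP;
            case Node.COMMENT_NODE:
            case Node.ELEMENT_NODE:
                //keep comments and elements in the logical view
                return FILTER_ACCEPT;
            default:
                //skip everything else, but still consider its children
                return FILTER_SKIP;
        }
    }
}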
The NodeIterator interface
The NodeIterator interface provides methods to traverse a flat, or horizontal, representation of an XML document. For example, take a very simple XML document:
<A>
  <B>text1</B>
  <C>
    <D>child of C</D>
    <E>another child of C</E>
  </C>
  <F>moreText</F>
</A>
Considering element nodes only, a flat representation of this simple XML document is
A B C D E F
When you have a flattened XML representation, you traverse it by asking for the next node and the previous node.
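The following minimal sketch walks the small document in both directions. It assumes a current JDK, whose built-in JAXP parser returns a Document that also implements the standard org.w3c.dom.traversal.DocumentTraversal interface, instead of the Xerces DOMParser used in this article's listings; FlatViewDemo is an illustrative name. Whitespace between tags is left out of the inline string, and SHOW_ELEMENT would hide text nodes anyway.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

public class FlatViewDemo {
    static final String XML =
        "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>";

    // Returns {forward, backward} element-name sequences for the A-F document.
    static String[] traverse() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            // Elements only, no filter, entity references expanded.
            NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                    doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
            StringBuilder forward = new StringBuilder();
            for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
                forward.append(n.getNodeName()).append(' ');
            }
            // The iterator is now positioned after F, so previousNode() returns F first.
            StringBuilder backward = new StringBuilder();
            for (Node n = it.previousNode(); n != null; n = it.previousNode()) {
                backward.append(n.getNodeName()).append(' ');
            }
            it.detach();
            return new String[] { forward.toString().trim(), backward.toString().trim() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] views = traverse();
        System.out.println(views[0]); // A B C D E F
        System.out.println(views[1]); // F E D C B A
    }
}
```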
What is the horizontal version of the bank.xml document?
bank client clientName homeAddress homePhone account type
accountNumber account type accountNumber client clientName homeAddress
homePhone account type accountNumber employee empID empName workAddress
workPhone salary employee empID empName workAddress workPhone salary
branchID
To traverse an XML document you need an object that implements the Document interface. The Apache Software Foundation created an implementation class for the Document interface, DocumentImpl. To get an object of type DocumentImpl, you create a DOMParser and then parse the XML file you want to traverse. The DOMParser has a method, getDocument(), that returns the parsed document, which you can cast to DocumentImpl. The following code creates the DocumentImpl object:
DOMParser parser = new DOMParser();
parser.parse("bank.xml");
DocumentImpl document = (DocumentImpl) parser.getDocument();
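For comparison, a current JDK can produce a traversable Document without the Xerces-specific DOMParser and DocumentImpl classes. This sketch assumes the JDK's built-in JAXP parser, whose Document implementation also implements org.w3c.dom.traversal.DocumentTraversal, the interface that declares createNodeIterator() and createTreeWalker(). The class name JaxpParseDemo is illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.traversal.DocumentTraversal;
import org.xml.sax.InputSource;

public class JaxpParseDemo {
    // Parses XML text into a DOM Document with the JDK's built-in JAXP parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<bank><branchID>78963</branchID></bank>");
        // getDocumentElement() returns the root element directly.
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName());               // bank
        // The traversal factory methods live on DocumentTraversal.
        System.out.println(doc instanceof DocumentTraversal); // true
    }
}
```

To parse a file instead, pass a java.io.File such as new File("bank.xml") to DocumentBuilder.parse(). Because bank.xml declares an external DTD, the parser will normally try to read bank.dtd as well, so keep the two files together.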
Now that you have an object of type DocumentImpl, you can create an instance of the NodeIterator implementation class, NodeIteratorImpl. The DocumentImpl object has a createNodeIterator() method with the following signature:
NodeIterator createNodeIterator(Node root, int whatToShow, NodeFilter filter, boolean entityReferenceExpansion);
The root parameter is the node at which you want to begin traversing. The whatToShow parameter is a bit mask of the SHOW_ constants from the NodeFilter interface; it selects which types of nodes are included in the logical view of the XML document. The filter parameter is the NodeFilter applied to the nodes that whatToShow lets through, or null for no filter. The last parameter, a boolean, determines whether the children of entity reference nodes are visible to the iterator.
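To make the whatToShow bit mask concrete, here is a minimal sketch using the standard org.w3c.dom.Node and org.w3c.dom.traversal.NodeFilter constants shipped with current JDKs (an assumption: the Xerces build used in this article kept NodeFilter in a different package, but the values are the ones listed in the IDL above). Each SHOW_ constant is the bit 1 << (nodeType - 1) for the matching Node type code, so masks are combined with bitwise OR. WhatToShowDemo, maskFor, and isShown are illustrative names.

```java
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeFilter;

public class WhatToShowDemo {
    // Bit for a node type code: SHOW_ELEMENT (0x1) for ELEMENT_NODE (1),
    // SHOW_TEXT (0x4) for TEXT_NODE (3), SHOW_COMMENT (0x80) for COMMENT_NODE (8), ...
    static int maskFor(short nodeType) {
        return 1 << (nodeType - 1);
    }

    // True if nodes of this type are visible under the given whatToShow mask.
    static boolean isShown(int whatToShow, short nodeType) {
        return (whatToShow & maskFor(nodeType)) != 0;
    }

    public static void main(String[] args) {
        // Masks combine with bitwise OR: elements plus text nodes.
        int elementsAndText = NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT;
        System.out.println(isShown(elementsAndText, Node.TEXT_NODE));              // true
        System.out.println(isShown(elementsAndText, Node.COMMENT_NODE));           // false
        System.out.println(isShown(NodeFilter.SHOW_ALL, Node.NOTATION_NODE));      // true
        System.out.println(maskFor(Node.COMMENT_NODE) == NodeFilter.SHOW_COMMENT); // true
    }
}
```

In practice you only pass the OR-ed mask, for example NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT, as the whatToShow argument; the helper methods exist only to make the bit layout visible.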
You can cast the return value of createNodeIterator() to NodeIteratorImpl, as the following code shows.
NodeIteratorImpl iterator =
(NodeIteratorImpl) document.createNodeIterator(root,
NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);
The NodeIteratorImpl class implements the following NodeIterator methods (shown here in IDL) for traversing the XML:
Node nextNode() raises(DOMException);
Node previousNode() raises(DOMException);
void detach();
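The detach() method releases the iterator. After it has been called, the DOM Level 2 Traversal specification says that nextNode() and previousNode() raise a DOMException with the code INVALID_STATE_ERR. A minimal sketch of that behavior follows, again assuming the JDK's built-in JAXP parser and the standard org.w3c.dom.traversal package rather than the Xerces classes in Listing 5; DetachDemo is an illustrative name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

public class DetachDemo {
    // Returns true if calling nextNode() on a detached iterator raises INVALID_STATE_ERR.
    static boolean failsAfterDetach() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<A><B/></A>")));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        it.nextNode();   // returns A
        it.detach();     // the iterator may no longer be used
        try {
            it.nextNode();
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.INVALID_STATE_ERR;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAfterDetach()); // true
    }
}
```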
Listing 5 shows a Java program that horizontally traverses an XML document.
Listing 5: NodeIteratorClient.java - A Java application that horizontally traverses an XML document, printing the element nodes.
import org.w3c.dom.Node;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xerces.dom.traversal.NodeIteratorImpl;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.domx.traversal.NodeFilter;

public class NodeIteratorClient
{
    public static void main(String args[])
    {
        if ((args == null) || (args.length < 1))
        {
            System.out.println("Usage: java NodeIteratorClient <XML_filename>");
            System.exit(1);
        }
        try
        {
            //create an object of the Document implementation class
            DOMParser parser = new DOMParser();
            parser.parse(args[0]);
            DocumentImpl document = (DocumentImpl)parser.getDocument();
            //get the root element of the XML document
            Node root = document.getDocumentElement();
            //instantiate a filter
            AllElements allelements = new AllElements();
            //create an object of the NodeIterator implementation class
            NodeIteratorImpl iterator =
                (NodeIteratorImpl)document.createNodeIterator(root,
                    NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);
            //print all elements of the XML document in document order
            printElements(iterator);
        }
        catch (Exception e)
        {
            System.out.println("error: " + e);
            e.printStackTrace();
            System.exit(1);
        }
    }

    //prints the name of each node in the flat view, then detaches
    //the iterator to release its resources
    public static void printElements(NodeIteratorImpl iter)
    {
        Node n;
        while ((n = iter.nextNode()) != null)
        {
            System.out.println(n.getNodeName());
        }
        iter.detach();
    }
}

//filters nodes in the XML document; accepts Element nodes and skips all others
class AllElements implements NodeFilter
{
    public short acceptNode (Node n)
    {
        if (n.getNodeType() == Node.ELEMENT_NODE)
            return FILTER_ACCEPT;
        return FILTER_SKIP;
    }
}
Running this program on the bank.xml file prints the element names of the horizontal view of bank.xml, one per line and in the same order.
The TreeWalker interface
You've now seen how
to write filters and traverse XML documents horizontally. An XML
document can also be represented as a tree structure, and the
TreeWalker interface declares methods that allow you to
traverse this tree structure. Take that simple XML document
again:
<A>
  <B>text1</B>
  <C>
    <D>child of C</D>
    <E>another child of C</E>
  </C>
  <F>moreText</F>
</A>
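The tree representation of this simple XML document (element nodes only) has A at the root, with B, C, and F as its children and D and E as children of C:
A
  B
  C
    D
    E
  F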
In the tree structure you have the concept of parent nodes and child nodes. Therefore, the TreeWalker interface provides the following methods to traverse the tree structure:
Node parentNode();
Node firstChild();
Node lastChild();
Node previousSibling();
Node nextSibling();
Node previousNode();
Node nextNode();
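Because firstChild(), nextSibling(), and parentNode() each leave the walker's current node unchanged when they return null, they are enough to walk the whole tree without recursion. The following sketch prints an indented outline of the small document that way. It assumes the JDK's built-in JAXP parser and the standard org.w3c.dom.traversal package rather than the Xerces classes used in this article; OutlineDemo, outline, and outlineOf are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class OutlineDemo {
    // Builds an indented outline using only firstChild(), nextSibling() and parentNode().
    static String outline(TreeWalker tw) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        while (true) {
            for (int i = 0; i < depth; i++) sb.append("  ");
            sb.append(tw.getCurrentNode().getNodeName()).append('\n');
            if (tw.firstChild() != null) {   // descend to the first visible child
                depth++;
                continue;
            }
            // No children: climb until this node or an ancestor has a next sibling.
            while (tw.nextSibling() == null) {
                if (tw.parentNode() == null) {
                    return sb.toString();    // back at the walker's root: done
                }
                depth--;
            }
        }
    }

    // Parses the XML text and outlines its elements, starting at the document element.
    static String outlineOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            TreeWalker tw = ((DocumentTraversal) doc).createTreeWalker(
                    doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
            return outline(tw);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outlineOf(
            "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>"));
    }
}
```

Run on the A-F document, this prints A, then B, C, and F indented one level, with D and E indented two levels under C. Listing 6 takes the recursive route instead; both rely on the same walker methods.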
The create method for TreeWalker is very similar to the create method for NodeIterator; the only differences are the name and the return type. The signature for TreeWalker's create method is
TreeWalker createTreeWalker(Node root, int whatToShow, NodeFilter filter, boolean entityReferenceExpansion);
The return value of this method can be cast to TreeWalkerImpl. You can then use the traversal methods to walk the XML document. Listing 6 shows a Java application that uses the TreeWalker implementation class to do just that.
Listing 6: TreeWalkerClient.java - A Java application that traverses the tree representation of the XML document.
import org.w3c.dom.Node;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xerces.dom.traversal.TreeWalkerImpl;
import org.apache.xerces.domx.traversal.NodeFilter;
import org.apache.xerces.dom.DocumentImpl;

public class TreeWalkerClient
{
    public static void main(String args[])
    {
        if ((args == null) || (args.length < 1))
        {
            System.out.println("Usage: java TreeWalkerClient <XML_filename>");
            System.exit(1);
        }
        try
        {
            //create an object of the Document implementation class
            DOMParser parser = new DOMParser();
            parser.parse(args[0]);
            DocumentImpl document = (DocumentImpl)parser.getDocument();
            //get the root element of the XML document
            Node root = document.getDocumentElement();
            //instantiate the filter object
            AllElements allelements = new AllElements();
            //create an object of the TreeWalker implementation class
            TreeWalkerImpl tw =
                (TreeWalkerImpl)document.createTreeWalker(root,
                    NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);
            //print the elements, starting from the walker's root
            printElements(tw);
        }
        catch (Exception e)
        {
            System.out.println("error: " + e);
            e.printStackTrace();
            System.exit(1);
        }
    }

    //recursively prints the current node and then each of its visible
    //children, which yields the elements in document order
    public static void printElements(TreeWalkerImpl tw)
    {
        Node n = tw.getCurrentNode();
        System.out.println(n.getNodeName());
        for (Node child = tw.firstChild(); child != null; child = tw.nextSibling())
        {
            //the recursive call restores the current node to child before
            //returning, so nextSibling() continues from the right place
            printElements(tw);
        }
        tw.setCurrentNode(n);
    }
}

//filters the elements of the XML document (same class as in Listing 5;
//if both listings are compiled in one directory, keep only one copy)
class AllElements implements NodeFilter
{
    public short acceptNode (Node n)
    {
        if (n.getNodeType() == Node.ELEMENT_NODE)
            return FILTER_ACCEPT;
        return FILTER_SKIP;
    }
}
Running this program on the bank.xml file prints the same element names, in the same order, as the NodeIteratorClient application. The only difference between the two programs is how the XML document is traversed.
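The two interfaces do behave differently in one respect: how they treat FILTER_REJECT. The sketch below applies a hypothetical filter that gives element C of the small A-F document a chosen verdict and accepts every other node. It assumes the JDK's built-in JAXP parser and the standard org.w3c.dom.traversal package rather than the Xerces classes in Listings 5 and 6; FilterVerdictDemo and its methods are illustrative names. A TreeWalker prunes a rejected node's entire subtree, while a NodeIterator treats FILTER_REJECT like FILTER_SKIP.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class FilterVerdictDemo {
    static final String XML =
        "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical filter: element C gets the given verdict, everything else is accepted.
    static NodeFilter verdictForC(short verdict) {
        return n -> "C".equals(n.getNodeName()) ? verdict : NodeFilter.FILTER_ACCEPT;
    }

    // Document-order walk with a TreeWalker; the root is the walker's starting node.
    static String walk(short verdict) {
        Document doc = parse();
        TreeWalker tw = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, verdictForC(verdict), true);
        StringBuilder sb = new StringBuilder(tw.getCurrentNode().getNodeName());
        for (Node n = tw.nextNode(); n != null; n = tw.nextNode()) {
            sb.append(' ').append(n.getNodeName());
        }
        return sb.toString();
    }

    // The same filter applied through a NodeIterator.
    static String iterate(short verdict) {
        Document doc = parse();
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, verdictForC(verdict), true);
        StringBuilder sb = new StringBuilder();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.getNodeName());
        }
        it.detach();
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(walk(NodeFilter.FILTER_SKIP));      // A B D E F
        System.out.println(walk(NodeFilter.FILTER_REJECT));    // A B F
        System.out.println(iterate(NodeFilter.FILTER_REJECT)); // A B D E F
    }
}
```

Because of this, a filter that returns FILTER_REJECT is the natural way to prune whole branches when walking a tree, while with a NodeIterator every descendant still has to be judged on its own.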
The Document Object Model (DOM) can be used within Java applications to traverse XML documents. The DOM only specifies the interfaces that allow navigation and manipulation of XML documents; it is left to vendors to supply implementations that do the work. This article focused on the Apache Software Foundation's implementation of the DOM Level 2 Traversal-Range Recommendation.
XML documents can be represented in two ways: as a flat structure or as a tree structure. The NodeIterator interface declares the methods that allow programmers to traverse a flat representation of an XML document. The TreeWalker interface declares methods that allow programmers to traverse a tree representation of an XML document. The NodeFilter interface allows programmers to create filters that select which nodes from the XML document are included in the logical view of the document.
Stephanie Fesler is a BEA Systems expert on implementing various J2EE APIs.
Copyright © 2007 O'Reilly Media, Inc.