Using DOM to Traverse XML
Traversing the Elements of the XML Document
All of the following code uses the Apache Software Foundation's implementation of the DOM Level 2 Traversal-Range Recommendation. It is part of the Xerces project, which provides implementations of the
- DOM Level 1 Core Recommendation,
- DOM Level 2 Core Recommendation,
- DOM Level 2 Events Recommendation, and
- DOM Level 2 Traversal-Range Recommendation.
The Traversal-Range Recommendation defines two interfaces for
traversing XML documents. The NodeIterator interface
declares methods for traversing a flat representation of an XML
document, and the TreeWalker interface declares methods
that let programmers traverse an XML document as a
tree structure. A third interface defined in the Traversal-Range
Recommendation, NodeFilter, determines which nodes
are included in the logical view of the document.
The create methods for NodeIterator and
TreeWalker take flags that let the programmer choose
which node types to include in the logical view of the document. Any node
that is excluded is skipped over as if it did not exist in the
document.
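As a rough illustration of how these flags shape the logical view, the sketch below iterates over a small document three times with different whatToShow values. It is not the article's code: it assumes the JDK's bundled JAXP parser and the standard org.w3c.dom.traversal interfaces rather than the Xerces DOMParser and DocumentImpl classes, and the names WhatToShowDemo, parse, and visible are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class WhatToShowDemo {
    // Assumption: parse with the JDK's built-in parser instead of Xerces' DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Lists every node the iterator exposes for the given whatToShow flags.
    // Text nodes are shown by value, all other nodes by name.
    static String visible(String xml, int whatToShow) {
        Document doc = parse(xml);
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(), whatToShow, null, true);
        StringBuilder out = new StringBuilder();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            if (out.length() > 0) out.append(' ');
            out.append(n.getNodeType() == Node.TEXT_NODE
                    ? n.getNodeValue() : n.getNodeName());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<A><B>text1</B><F>moreText</F></A>";
        System.out.println(visible(xml, NodeFilter.SHOW_ELEMENT)); // A B F
        System.out.println(visible(xml, NodeFilter.SHOW_TEXT));    // text1 moreText
        System.out.println(visible(xml,
                NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT));  // A B text1 F moreText
    }
}
```

Because each flag is a single bit, several node types can be requested at once by combining flags with the bitwise OR operator, as the third call does.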
Writing a Filter
Filters are used to determine what
elements should be incorporated into the logical view of the
document. The following is the interface of NodeFilter.
// Introduced in DOM Level 2:
interface NodeFilter {
// Constants returned by acceptNode
const short FILTER_ACCEPT = 1;
const short FILTER_REJECT = 2;
const short FILTER_SKIP = 3;
// Constants for whatToShow
const unsigned long SHOW_ALL = 0xFFFFFFFF;
const unsigned long SHOW_ELEMENT = 0x00000001;
const unsigned long SHOW_ATTRIBUTE = 0x00000002;
const unsigned long SHOW_TEXT = 0x00000004;
const unsigned long SHOW_CDATA_SECTION = 0x00000008;
const unsigned long SHOW_ENTITY_REFERENCE = 0x00000010;
const unsigned long SHOW_ENTITY = 0x00000020;
const unsigned long SHOW_PROCESSING_INSTRUCTION = 0x00000040;
const unsigned long SHOW_COMMENT = 0x00000080;
const unsigned long SHOW_DOCUMENT = 0x00000100;
const unsigned long SHOW_DOCUMENT_TYPE = 0x00000200;
const unsigned long SHOW_DOCUMENT_FRAGMENT = 0x00000400;
const unsigned long SHOW_NOTATION = 0x00000800;
short acceptNode(in Node n);
};
The NodeFilter interface contains one method,
acceptNode(), that determines if a node should be
accepted or rejected. The acceptNode() method may return
one of three values:
- FILTER_ACCEPT -- The current node is included in the logical view of the document.
- FILTER_SKIP -- The current node is not accepted, but the children of the current node are considered for acceptance.
- FILTER_REJECT -- The current node is not accepted and the children of the current node are not considered for inclusion.
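The difference between skipping and rejecting is easiest to see with a TreeWalker. The following sketch applies each of the three return values to a single element, C, and accepts every other element. It assumes the JDK's bundled parser and the standard org.w3c.dom.traversal interfaces; SkipVersusReject, parse, and walk are hypothetical names chosen for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class SkipVersusReject {
    static final String XML =
            "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>";

    // Assumption: parse with the JDK's built-in parser instead of Xerces' DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Walks the elements in document order. The filter returns verdictForC
    // for element C and FILTER_ACCEPT for every other element.
    static String walk(short verdictForC) {
        Document doc = parse(XML);
        NodeFilter filter = n ->
                "C".equals(n.getNodeName()) ? verdictForC : NodeFilter.FILTER_ACCEPT;
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, filter, true);
        // A TreeWalker starts positioned on its root, so record that node first.
        StringBuilder out = new StringBuilder(walker.getCurrentNode().getNodeName());
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            out.append(' ').append(n.getNodeName());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(walk(NodeFilter.FILTER_ACCEPT)); // A B C D E F
        System.out.println(walk(NodeFilter.FILTER_SKIP));   // A B D E F
        System.out.println(walk(NodeFilter.FILTER_REJECT)); // A B F
    }
}
```

Note that the DOM Level 2 specification has a NodeIterator treat FILTER_REJECT the same as FILTER_SKIP, since a flat view has no subtree to prune; only a TreeWalker distinguishes the two.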
The whatToShow constants defined in NodeFilter are
passed to the create methods of
NodeIterator and TreeWalker, as we will see
shortly. But first, let's write a filter. To do so, you simply
write a class that implements the NodeFilter interface
and its acceptNode() method. Listing 3 is a simple filter
that accepts all element nodes.
Listing 3: A filter that accepts all element nodes and skips others.
class AllElements implements NodeFilter
{
    public short acceptNode (Node n)
    {
        if (n.getNodeType() == Node.ELEMENT_NODE)
            return FILTER_ACCEPT;
        return FILTER_SKIP;
    }
}
The Node interface defines the following fields to
help determine the type of node you're dealing with.
ATTRIBUTE_NODE | The node is an Attr. |
CDATA_SECTION_NODE | The node is a CDATASection. |
COMMENT_NODE | The node is a Comment. |
DOCUMENT_FRAGMENT_NODE | The node is a DocumentFragment. |
DOCUMENT_NODE | The node is a Document. |
DOCUMENT_TYPE_NODE | The node is a DocumentType. |
ELEMENT_NODE | The node is an Element. |
ENTITY_NODE | The node is an Entity. |
ENTITY_REFERENCE_NODE | The node is an EntityReference. |
NOTATION_NODE | The node is a Notation. |
PROCESSING_INSTRUCTION_NODE | The node is a ProcessingInstruction. |
TEXT_NODE | The node is a Text node. |
You can use a switch statement in your filter to determine what nodes to accept, skip, or reject. Listing 4 shows an example of a filter that uses a switch statement.
Listing 4: MyFilter.java -- A filter that uses a switch statement to determine what nodes to accept, skip, or reject.
class MyFilter implements NodeFilter
{
    public short acceptNode(Node n)
    {
        short s = n.getNodeType();
        switch (s) {
            case Node.ATTRIBUTE_NODE:
                return FILTER_REJECT;
            case Node.CDATA_SECTION_NODE:
                return FILTER_SKIP;
            case Node.COMMENT_NODE:
                return FILTER_ACCEPT;
            case Node.ELEMENT_NODE:
                return FILTER_ACCEPT;
            default:
                return FILTER_SKIP;
        }
    }
}
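To see a switch-based filter at work, the sketch below runs a reduced version of such a filter (elements and comments accepted, everything else skipped) over a document that contains a comment. It assumes the JDK's bundled parser and the standard org.w3c.dom.traversal interfaces; SwitchFilterDemo, ELEMENTS_AND_COMMENTS, and visit are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class SwitchFilterDemo {
    // Accepts elements and comments, skips every other node type.
    static final NodeFilter ELEMENTS_AND_COMMENTS = n -> {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
            case Node.COMMENT_NODE:
                return NodeFilter.FILTER_ACCEPT;
            default:
                return NodeFilter.FILTER_SKIP;
        }
    };

    // Assumption: parse with the JDK's built-in parser instead of Xerces' DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // SHOW_ALL lets every node type reach the filter, so the filter alone decides.
    static String visit(String xml) {
        Document doc = parse(xml);
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(), NodeFilter.SHOW_ALL, ELEMENTS_AND_COMMENTS, true);
        StringBuilder out = new StringBuilder();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            if (out.length() > 0) out.append(' ');
            out.append(n.getNodeName());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The text node inside B is skipped; the comment's node name is "#comment".
        System.out.println(visit("<A><!--note--><B>text1</B></A>")); // A #comment B
    }
}
```

The whatToShow flags are checked before the filter is consulted, so a filter only ever sees node types that the flags let through.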
The NodeIterator interface
The NodeIterator interface provides methods to traverse a
flat, or horizontal, representation of an XML document. For example,
take a very simple XML document:
<A>
  <B>text1</B>
  <C>
    <D>child of C</D>
    <E>another child of C</E>
  </C>
  <F>moreText</F>
</A>
A flat representation of this simple XML document is
A B C D E F
When you have a flattened XML representation, you traverse it by asking for the next node and the previous node.
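The following sketch flattens the sample document and then steps forward and backward through it. It assumes the JDK's bundled parser and the standard org.w3c.dom.traversal.NodeIterator interface instead of the Xerces implementation classes; FlatTraversal and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class FlatTraversal {
    static final String XML =
            "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>";

    // Creates an element-only iterator over the sample document.
    // Assumption: the JDK's built-in parser stands in for Xerces' DOMParser.
    static NodeIterator elements() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return ((DocumentTraversal) doc).createNodeIterator(
                    doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String name(Node n) {
        return n == null ? "null" : n.getNodeName();
    }

    // Calls nextNode() until it returns null, giving the flat view.
    static String flatten() {
        NodeIterator it = elements();
        StringBuilder out = new StringBuilder();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            if (out.length() > 0) out.append(' ');
            out.append(n.getNodeName());
        }
        return out.toString();
    }

    // Two steps forward, then three steps back.
    static String forwardThenBack() {
        NodeIterator it = elements();
        return name(it.nextNode()) + " " + name(it.nextNode()) + " "
                + name(it.previousNode()) + " " + name(it.previousNode()) + " "
                + name(it.previousNode());
    }

    public static void main(String[] args) {
        System.out.println(flatten());         // A B C D E F
        System.out.println(forwardThenBack()); // A B B A null
    }
}
```

The iterator's position lies between nodes rather than on one, which is why the first previousNode() call after a nextNode() call returns the same node again.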
What is the horizontal version of the bank.xml document?
bank client clientName homeAddress homePhone account type
accountNumber account type accountNumber client clientName homeAddress
homePhone account type accountNumber employee empID empName workAddress
workPhone salary employee empID workAddress workPhone salary
branchID
To traverse an XML document you need to create an object that
implements the Document interface. The Apache Software
Foundation created an implementation class for the
Document interface, DocumentImpl. In order
to get an object of type DocumentImpl, you need to get a
DOMParser and then parse the XML file you want to
traverse. The DOMParser has a method,
getDocument(), that you can use to get an object of
type DocumentImpl. The following code creates the
DocumentImpl object:
DOMParser parser = new DOMParser();
parser.parse("bank.xml");
DocumentImpl document = (DocumentImpl)parser.getDocument();
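If the Xerces-specific classes are not on your classpath, a comparable Document can usually be obtained through JAXP. The sketch below is an alternative to the code above, not the article's method: it assumes the JDK's bundled parser, and JaxpParseDemo and its methods are hypothetical names. It parses from a string so that it runs without a bank.xml file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.xml.sax.InputSource;

class JaxpParseDemo {
    // Parses an XML string into a DOM Document using the JAXP factory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Name of the document's root element.
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getNodeName();
    }

    // True if the parsed Document can create iterators and tree walkers.
    static boolean supportsTraversal(String xml) {
        return parse(xml) instanceof DocumentTraversal;
    }

    public static void main(String[] args) {
        String xml = "<bank><client/></bank>";
        System.out.println(rootName(xml));
        System.out.println(supportsTraversal(xml));
    }
}
```

Checking instanceof DocumentTraversal before casting is a cautious habit, since a DOM implementation is not required to support the Traversal module.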
Now that you have an object of type DocumentImpl you
can create an instance of the NodeIterator implementation class,
NodeIteratorImpl. The DocumentImpl object
has a createNodeIterator() method that has the following
signature:
NodeIterator createNodeIterator(rootNode, whatToShow, filter, boolean);
The rootNode is the node at which you want to begin
traversing. whatToShow is one or more of the
NodeFilter constants; it selects which node types you want to include
in the logical view of the XML document. filter is the
NodeFilter object applied to those nodes. The last parameter is a
boolean that determines whether entity reference nodes are
expanded.
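To illustrate the rootNode parameter, the sketch below creates one iterator rooted at the document element and another rooted at an inner element, so that only that subtree is traversed. It assumes the JDK's bundled parser and the standard DocumentTraversal interface rather than DocumentImpl; SubtreeRootDemo and elementsUnder are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class SubtreeRootDemo {
    static final String XML =
            "<A><B>text1</B><C><D>child of C</D><E>another child of C</E></C><F>moreText</F></A>";

    // Assumption: parse with the JDK's built-in parser instead of Xerces' DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Iterates over the elements of the subtree rooted at the first
    // element named rootName; the root itself is part of the view.
    static String elementsUnder(String rootName) {
        Document doc = parse(XML);
        Node root = doc.getElementsByTagName(rootName).item(0);
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                root,                     // where traversal begins
                NodeFilter.SHOW_ELEMENT,  // which node types are visible
                null,                     // no additional filter
                true);                    // expand entity references
        StringBuilder out = new StringBuilder();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            if (out.length() > 0) out.append(' ');
            out.append(n.getNodeName());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(elementsUnder("A")); // A B C D E F
        System.out.println(elementsUnder("C")); // C D E
    }
}
```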
You can cast the return from this create method to retrieve a
NodeIteratorImpl object as the following code shows.
NodeIteratorImpl iterator =
(NodeIteratorImpl) document.createNodeIterator(root,
NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);
The NodeIteratorImpl class provides the following
methods to traverse the XML:
Node nextNode() raises(DOMException);
Node previousNode() raises(DOMException);
void detach();
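The role of detach() is less obvious than that of the two navigation methods: it releases the iterator, and the DOM Level 2 specification says that later calls to nextNode() or previousNode() raise an INVALID_STATE_ERR exception. The sketch below checks that behavior. It assumes the JDK's bundled DOM implementation, and other implementations may differ; DetachDemo and nextAfterDetachFails are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class DetachDemo {
    // Assumption: parse with the JDK's built-in parser instead of Xerces' DOMParser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if nextNode() raises INVALID_STATE_ERR once the
    // iterator has been detached.
    static boolean nextAfterDetachFails() {
        Document doc = parse("<A><B/></A>");
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        it.nextNode(); // returns A while the iterator is still attached
        it.detach();   // release the iterator
        try {
            it.nextNode();
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.INVALID_STATE_ERR;
        }
    }

    public static void main(String[] args) {
        System.out.println(nextAfterDetachFails());
    }
}
```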
Listing 5 shows a Java program that horizontally traverses an XML document.
Listing 5: NodeIteratorClient.java - A Java application that horizontally traverses an XML document, returning the element nodes.
import org.w3c.dom.Node;
import org.apache.xerces.parsers.DOMParser;
import org.apache.xerces.dom.traversal.NodeIteratorImpl;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.domx.traversal.NodeFilter;

public class NodeIteratorClient
{
    public static void main(String[] args)
    {
        if ((args == null) || (args.length < 1))
        {
            System.out.println("Usage: java NodeIteratorClient <XML_filename>");
            System.exit(1);
        }
        try
        {
            //create an object of the Document implementation class
            DOMParser parser = new DOMParser();
            parser.parse(args[0]);
            DocumentImpl document = (DocumentImpl)parser.getDocument();

            //get the root element of the XML document
            Node root = document.getDocumentElement();

            //instantiate a filter
            AllElements allelements = new AllElements();

            //create an object of the NodeIterator implementation class
            NodeIteratorImpl iterator =
                (NodeIteratorImpl)document.createNodeIterator(root,
                    NodeFilter.SHOW_ELEMENT, (NodeFilter)allelements, true);

            //print all elements of the XML document in document order
            printElements(iterator);
        }
        catch (Exception e)
        {
            System.out.println("error: " + e);
            e.printStackTrace();
            System.exit(1);
        }
    }

    //prints the name of every node the iterator returns, one per line
    public static void printElements(NodeIteratorImpl iter)
    {
        Node n;
        while ((n = iter.nextNode()) != null)
        {
            System.out.println(n.getNodeName());
        }
    }
}

//filters nodes in the XML document; accepts only the Element nodes
class AllElements implements NodeFilter
{
    public short acceptNode (Node n)
    {
        if (n.getNodeType() == Node.ELEMENT_NODE)
            return FILTER_ACCEPT;
        return FILTER_SKIP;
    }
}
After running this program on the bank.xml file, the
output lists each element name on its own line, in the same order as
the horizontal version of bank.xml shown earlier.