 Published on ONJava.com (http://www.onjava.com/)


Parsing an XML Document with XPath

by Deepak Vohra
01/12/2005

The getter methods in the org.w3c.dom package API are commonly used to parse an XML document. But J2SE 5.0 also provides the javax.xml.xpath package to parse an XML document with the XML Path Language (XPath). The JDOM org.jdom.xpath.XPath class also has methods to select XML document nodes with an XPath expression, which specifies a location path to a single node or to a set of nodes in an XML document.

Selecting nodes with an XPath expression is more direct than using the getter methods, because with an XPath expression an Element node may be selected without iterating over a node list. Node lists retrieved with the getter methods have to be iterated over to reach the element nodes. For example, the second article node in the journal node of the example XML document used in this tutorial (listed in the Overview section below) may be retrieved with the XPath expression:

Element article = (Element)
    xPath.evaluate("/catalog/journal/article[2]",
                   inputSource,
                   XPathConstants.NODE);

In the code snippet, xPath is a javax.xml.xpath.XPath object, and inputSource is an org.xml.sax.InputSource object for the XML document. With the org.w3c.dom package getter methods, the second article node in the journal node is retrieved with the code snippet:

// document is an org.w3c.dom.Document parsed from the example XML document
NodeList nodeList = document.getElementsByTagName("journal");
Element journal = (Element) nodeList.item(0);
NodeList nodeList2 = journal.getElementsByTagName("article");
Element article = (Element) nodeList2.item(1);
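To see both approaches side by side, here is a minimal sketch that runs them against one parsed Document. It assumes an abbreviated copy of catalog.xml embedded as a string (only the Java Technology journal), and the class and method names are hypothetical. Both calls should reach the same article element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: XPath selection versus DOM getter navigation.
final class SecondArticleDemo {
    // Abbreviated stand-in for catalog.xml (assumption, not the full file).
    static final String XML =
          "<catalog><journal title=\"Java Technology\">"
        + "<article date=\"January-2004\"><title>Design service-oriented architecture"
        + " frameworks with J2EE technology</title></article>"
        + "<article date=\"October-2003\"><title>Advance DAO Programming</title></article>"
        + "</journal></catalog>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // prefixed names get namespace URIs
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // One XPath expression selects the second article directly.
    static Element byXPath(Document doc) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return (Element) xPath.evaluate(
                "/catalog/journal/article[2]", doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // The getter methods walk two node lists to reach the same element.
    static Element byGetters(Document doc) {
        NodeList journals = doc.getElementsByTagName("journal");
        Element journal = (Element) journals.item(0);
        NodeList articles = journal.getElementsByTagName("article");
        return (Element) articles.item(1);
    }
}
```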

Also, with an XPath expression an attribute node may be selected directly, whereas with the getter methods the Element node has to be retrieved before its attribute can be read. For example, the value of the level attribute of the article node with the date January-2004 is retrieved with the XPath expression:

String level = 
    xPath.evaluate("/catalog/journal/article[@date='January-2004']/@level",
    inputSource);

By comparison, the org.w3c.dom package makes you retrieve the org.w3c.dom.Element object for the article, and then get its level attribute with:

String level=article.getAttribute("level");
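A minimal sketch of both attribute lookups follows, again assuming an abbreviated, string-embedded copy of catalog.xml and hypothetical names. An XPath string evaluation of an expression that matches nothing returns an empty string, so the getter version is written to return "" in that case as well.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: reads the level attribute two ways.
final class LevelAttributeDemo {
    // Abbreviated stand-in for catalog.xml (titles and authors omitted).
    static final String XML =
          "<catalog><journal title=\"Java Technology\">"
        + "<article level=\"Advanced\" date=\"January-2004\"/>"
        + "<article level=\"Advanced\" date=\"October-2003\"/>"
        + "</journal></catalog>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // XPath addresses the attribute directly with @level.
    // Assumes date contains no apostrophe, since it is spliced into the expression.
    static String byXPath(Document doc, String date) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(
                "/catalog/journal/article[@date='" + date + "']/@level", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Getter methods: find the matching Element first, then read its attribute.
    static String byGetters(Document doc, String date) {
        NodeList articles = doc.getElementsByTagName("article");
        for (int i = 0; i < articles.getLength(); i++) {
            Element article = (Element) articles.item(i);
            if (date.equals(article.getAttribute("date"))) {
                return article.getAttribute("level");
            }
        }
        return ""; // same result XPath gives for an empty node-set
    }
}
```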

Overview

In this tutorial, an example XML document is parsed with J2SE 5.0's XPath API and JDOM's XPath class, and XML document nodes are selected with XPath expressions. Depending on the XPath expression evaluated, the selected nodes are either element nodes or attribute nodes (org.w3c.dom.Element or org.w3c.dom.Attr with J2SE 5.0, org.jdom.Element or org.jdom.Attribute with JDOM). The example XML document, catalog.xml, is listed below:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns:journal="http://www.w3.org/2001/XMLSchema-Instance">
  <journal:journal title="XML" publisher="IBM developerWorks">
    <article journal:level="Intermediate" date="February-2003">
      <title>Design XML Schemas Using UML</title>
      <author>Ayesha Malik</author>
    </article>
  </journal:journal>
  <journal title="Java Technology" publisher="IBM developerWorks">
    <article level="Advanced" date="January-2004">
      <title>Design service-oriented architecture frameworks with J2EE technology</title>
      <author>Naveen Balani</author>
    </article>
    <article level="Advanced" date="October-2003">
      <title>Advance DAO Programming</title>
      <author>Sean Sullivan</author>
    </article>
  </journal>
</catalog>

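The example XML document declares the namespace xmlns:journal="http://www.w3.org/2001/XMLSchema-Instance" for the elements and attributes with the journal prefix (the journal:journal element and the journal:level attribute). Because the URI is bound only to the journal prefix, the unprefixed elements, including the second journal element, are in no namespace.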
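The namespace declaration affects which elements an unprefixed XPath name test matches. The sketch below assumes a namespace-aware DOM parse of an abbreviated, string-embedded copy of catalog.xml and uses hypothetical names. It counts the journal elements two ways: /catalog/journal matches only the unprefixed journal element, while a local-name() test also matches journal:journal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: which journal elements does a name test match?
final class NamespaceMatchDemo {
    // Abbreviated stand-in for catalog.xml with the same namespace declaration.
    static final String XML =
          "<catalog xmlns:journal=\"http://www.w3.org/2001/XMLSchema-Instance\">"
        + "<journal:journal title=\"XML\">"
        + "<article journal:level=\"Intermediate\"/></journal:journal>"
        + "<journal title=\"Java Technology\">"
        + "<article level=\"Advanced\"/></journal>"
        + "</catalog>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-correct matching
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // NUMBER results are returned as java.lang.Double.
    static double number(Document doc, String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return ((Double) xPath.evaluate(expression, doc, XPathConstants.NUMBER))
                .doubleValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static String string(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```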

This article is structured into the following sections:

  1. Preliminary Setup
  2. Parsing with the JDK 5.0 XPath Class
  3. Parsing with the JDOM XPath Class

Preliminary Setup

To use J2SE 5.0's XPath support, install the J2SE 5.0 SDK. The javax.xml.xpath package is part of the J2SE 5.0 class library in <JDK5.0>\jre\lib\rt.jar, where <JDK5.0> is the directory in which JDK 5.0 is installed. The JVM loads rt.jar automatically, so it normally does not need to be added to the CLASSPATH variable.

Xalan-Java is needed only if you use its org.apache.xpath.NodeSet class. In that case, install Xalan-Java, extract xalan-j-current-bin.jar to a directory, and add <Xalan>/bin/xalan.jar to the CLASSPATH, where <Xalan> is the directory in which Xalan-Java is installed.

To parse an XML document with the JDOM XPath class, the JDOM API classes need to be in the CLASSPATH. Install JDOM; extract the jdom-b9.zip file to an installation directory. Add <JDOM>/jdom-b9/build/jdom.jar, <JDOM>/jdom-b9/lib/saxpath.jar, <JDOM>/jdom-b9/lib/jaxen-core.jar, <JDOM>/jdom-b9/lib/jaxen-jdom.jar, and <JDOM>/jdom-b9/lib/xerces.jar to the CLASSPATH variable, where <JDOM> is the directory in which JDOM is installed.

Parsing with the JDK 5.0 XPath Class

The javax.xml.xpath package in J2SE 5.0 has classes and interfaces to parse an XML document with XPath. Some of the classes and interfaces in the javax.xml.xpath package are listed in the following table:

Class/Interface               Description
XPath (interface)             Provides access to the XPath evaluation environment, and the evaluate methods that evaluate XPath expressions against an XML document.
XPathExpression (interface)   Provides the evaluate methods that evaluate a compiled XPath expression against an XML document.
XPathFactory (class)          Creates XPath objects.

In this section, XPath expressions are evaluated against the example XML document with the javax.xml.xpath.XPath interface. First, import the javax.xml.xpath package.

import javax.xml.xpath.*;

The evaluate methods in the XPath and XPathExpression interfaces are used to parse an XML document with XPath expressions. The XPathFactory class is used to create an XPath object. Create an XPathFactory object with the static newInstance method of the XPathFactory class.

XPathFactory  factory=XPathFactory.newInstance();

Create an XPath object from the XPathFactory object with the newXPath method.

XPath xPath=factory.newXPath();

Create and compile an XPath expression with the compile method of the XPath object. As an example, select the title of the article whose date attribute is set to January-2004. An attribute in an XPath expression is specified with an @ symbol. For more examples of XPath expressions, see the XPath specification.

XPathExpression  xPathExpression=
    xPath.compile("/catalog/journal/article[@date='January-2004']/title");

Create an InputSource for the example XML document. An InputSource (org.xml.sax.InputSource) represents a single input source for an XML entity. The evaluate method of the XPathExpression interface accepts either an InputSource or a context node, such as an org.w3c.dom.Document or another org.w3c.dom.Node.

InputSource inputSource =
    new InputSource(new FileInputStream(xmlDocument));

xmlDocument is the java.io.File object of the example XML document.

File xmlDocument = 
    new File("c:/catalog/catalog.xml");

Evaluate the compiled XPath expression against the InputSource of the example XML document.

String title = 
  xPathExpression.evaluate(inputSource);
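Put together, the factory, compile, and evaluate steps can be sketched as one method. This sketch rests on assumptions: a StringReader over an embedded, abbreviated copy of catalog.xml replaces the FileInputStream on c:/catalog/catalog.xml, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class: compile an expression, then evaluate an InputSource.
final class CompiledTitleDemo {
    // Abbreviated stand-in for catalog.xml (assumption, not the full file).
    static final String XML =
          "<catalog><journal title=\"Java Technology\">"
        + "<article date=\"January-2004\"><title>Design service-oriented architecture"
        + " frameworks with J2EE technology</title></article>"
        + "<article date=\"October-2003\"><title>Advance DAO Programming</title></article>"
        + "</journal></catalog>";

    static String titleFor(String xml) {
        try {
            XPathFactory factory = XPathFactory.newInstance();
            XPath xPath = factory.newXPath();
            XPathExpression xPathExpression =
                xPath.compile("/catalog/journal/article[@date='January-2004']/title");
            // A StringReader stands in for the FileInputStream used in the article.
            InputSource inputSource = new InputSource(new StringReader(xml));
            // With no return type given, the result is the string value of the title.
            return xPathExpression.evaluate(inputSource);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```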

The result of the XPath expression evaluation is the title: Design service-oriented architecture frameworks with J2EE technology. An XPath expression may also be evaluated directly with the XPath object, without compiling it first. Because the FileInputStream wrapped by the first InputSource has already been read, create a new InputSource.

inputSource =
    new InputSource(new FileInputStream(xmlDocument));

As an example, evaluate the publisher attribute of the journal element.

String publisher = 
    xPath.evaluate("/catalog/journal/@publisher", inputSource);
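Each InputSource in these snippets wraps a stream that is consumed once it has been read, so a new InputSource is needed for every evaluation. An alternative, sketched below under the same assumptions (embedded, abbreviated XML and hypothetical names), is to parse the document once into an org.w3c.dom.Document and pass it as the context for every evaluate call.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: parse once, evaluate many expressions.
final class ParseOnceDemo {
    // Abbreviated stand-in for catalog.xml (assumption, not the full file).
    static final String XML =
          "<catalog><journal title=\"Java Technology\" publisher=\"IBM developerWorks\">"
        + "<article date=\"January-2004\"><title>Design service-oriented architecture"
        + " frameworks with J2EE technology</title></article>"
        + "</journal></catalog>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The Document can be passed as the context any number of times.
    static String evaluate(XPath xPath, Document doc, String expression) {
        try {
            return xPath.evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```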

The result of the XPath object evaluation is the attribute value: IBM developerWorks. The evaluate method of the XPath interface may also be used to select a node set. For example, select the article element nodes in the XML document. Create the XPath expression that represents a node set.

String expression="/catalog/journal/article";

Select the node set of article element nodes in the example XML document with the evaluate method of the XPath object.

// the previous InputSource has already been read, so create a new one
inputSource =
    new InputSource(new FileInputStream(xmlDocument));
NodeList nodes =
    (NodeList) xPath.evaluate(expression,
        inputSource, XPathConstants.NODESET);

XPathConstants.NODESET specifies that the evaluate method returns the selected nodes as an org.w3c.dom.NodeList, so the result is cast to NodeList directly. The return type may also be set to NODE, STRING, BOOLEAN, or NUMBER. The nodes in the NodeList are accessed with its getLength and item methods.
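Processing the NodeList and the other return types can be sketched as follows. The sketch assumes an embedded, abbreviated copy of catalog.xml and hypothetical names. It evaluates the relative expression title with each article node as the context, and shows NUMBER and BOOLEAN results coming back as java.lang.Double and java.lang.Boolean. Because /catalog/journal does not match the prefixed journal:journal element, only the two Java Technology articles are returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: NODESET, NUMBER, and BOOLEAN results.
final class NodeSetDemo {
    // Abbreviated stand-in for catalog.xml (authors and dates omitted).
    static final String XML =
          "<catalog xmlns:journal=\"http://www.w3.org/2001/XMLSchema-Instance\">"
        + "<journal:journal title=\"XML\"><article journal:level=\"Intermediate\">"
        + "<title>Design XML Schemas Using UML</title></article></journal:journal>"
        + "<journal title=\"Java Technology\">"
        + "<article level=\"Advanced\"><title>Design service-oriented architecture"
        + " frameworks with J2EE technology</title></article>"
        + "<article level=\"Advanced\"><title>Advance DAO Programming</title></article>"
        + "</journal></catalog>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> articleTitles(Document doc) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(
                "/catalog/journal/article", doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Relative expression, evaluated with each article as the context node.
                titles.add(xPath.evaluate("title", nodes.item(i)));
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static double articleCount(Document doc) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return ((Double) xPath.evaluate("count(/catalog/journal/article)",
                doc, XPathConstants.NUMBER)).doubleValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumes title contains no apostrophe, since it is spliced into the expression.
    static boolean hasArticle(Document doc, String title) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return ((Boolean) xPath.evaluate(
                "boolean(/catalog/journal/article[title='" + title + "'])",
                doc, XPathConstants.BOOLEAN)).booleanValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```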

Thus, nodes in an XML document are selected directly with XPath expressions, without navigating the document with the getter methods of the org.w3c.dom API. The example program XPathEvaluator.java parses an XML document with the JDK 5.0 XPath API.

Parsing with the JDOM XPath Class

The JDOM API XPath class supports XPath expressions for selecting nodes from an XML document. Some of the methods of the JDOM XPath class are listed in the following table:

XPath Class Method    Description
selectSingleNode      Selects a single node that matches an XPath expression.
selectNodes           Selects the list of nodes that match an XPath expression.
addNamespace          Registers a namespace prefix so that prefixed names in an XPath expression can be matched.

In this section, nodes are selected from the example XML document catalog.xml with the JDOM XPath class. The nodes selected by the select methods are modified, and the modified document is written out as a new XML document. First, import the JDOM classes.

import org.jdom.xpath.*; 

Create a SAXBuilder.

SAXBuilder saxBuilder = 
    new SAXBuilder("org.apache.xerces.parsers.SAXParser");

Parse the XML document catalog.xml with the SAXBuilder.

org.jdom.Document jdomDocument =
    saxBuilder.build(xmlDocument);

xmlDocument is the java.io.File object for the XML document catalog.xml. The static method selectSingleNode(java.lang.Object context, String XPathExpression) selects a single node specified by an XPath expression. If more than one node matches the XPath expression, the first matching node is selected. For example, select the level attribute of the article element whose date attribute is January-2004, in the journal whose title attribute is Java Technology:

org.jdom.Attribute levelNode =
    (org.jdom.Attribute)(XPath.selectSingleNode(
        jdomDocument,
        "/catalog//journal[@title='Java Technology']" +
        "//article[@date='January-2004']/@level"));

The level attribute, with the value Advanced, is selected. Modify the level attribute:

levelNode.setValue("Intermediate");
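JDOM is a separate library. For readers using only the JDK, the following sketch is a swapped-in counterpart rather than JDOM code: it selects the same level attribute with javax.xml.xpath (XPathConstants.NODE yields an org.w3c.dom.Attr here) and changes it with the DOM setter Attr.setValue. The embedded XML and the class name are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: XPath selects an Attr, the DOM setter changes it.
final class LevelUpdateDemo {
    // Abbreviated stand-in for catalog.xml (titles and authors omitted).
    static final String XML =
          "<catalog><journal title=\"Java Technology\">"
        + "<article level=\"Advanced\" date=\"January-2004\"/>"
        + "<article level=\"Advanced\" date=\"October-2003\"/>"
        + "</journal></catalog>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static void setLevel(Document doc, String newLevel) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            // NODE returns a single node; for this expression it is an Attr.
            Attr level = (Attr) xPath.evaluate(
                "/catalog//journal[@title='Java Technology']"
                    + "//article[@date='January-2004']/@level",
                doc, XPathConstants.NODE);
            level.setValue(newLevel); // modifies the Document in place
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumes date contains no apostrophe, since it is spliced into the expression.
    static String level(Document doc, String date) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(
                "/catalog/journal/article[@date='" + date + "']/@level", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```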

The selectSingleNode method may also be used to select an element node. For example, select the title element of the article dated January-2004:

org.jdom.Element titleNode = 
    (org.jdom.Element) XPath.selectSingleNode( jdomDocument,
    "/catalog//journal//article[@date='January-2004']/title");

The title element, with the value Design service-oriented architecture frameworks with J2EE technology, is selected. Modify its text:

titleNode.setText(
    "Service Oriented Architecture Frameworks");

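A similar JDK-only counterpart for the title element, again a swapped-in sketch with an assumed embedded document and hypothetical names, selects the org.w3c.dom.Element with XPath and replaces its text with the DOM setter setTextContent.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class: XPath selects an Element, the DOM setter changes its text.
final class TitleUpdateDemo {
    // Abbreviated stand-in for catalog.xml (assumption, not the full file).
    static final String XML =
          "<catalog><journal title=\"Java Technology\">"
        + "<article date=\"January-2004\"><title>Design service-oriented architecture"
        + " frameworks with J2EE technology</title></article>"
        + "</journal></catalog>";
    static final String TITLE_PATH =
        "/catalog//journal//article[@date='January-2004']/title";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static void setTitle(Document doc, String text) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            Element title = (Element) xPath.evaluate(TITLE_PATH, doc, XPathConstants.NODE);
            // setTextContent replaces all children of the element with one text node.
            title.setTextContent(text);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static String title(Document doc) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(TITLE_PATH, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```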
The static method selectNodes(java.lang.Object context, String XPathExpression) selects all of the nodes specified by an XPath expression. Select all of the article nodes for the journal with a title set to Java Technology.

java.util.List nodeList =
    XPath.selectNodes(jdomDocument, 
    "/catalog//journal[@title='Java Technology']//article");

Modify the article nodes by adding a section attribute to each one:

java.util.Iterator iter = nodeList.iterator();
while(iter.hasNext()) {
    org.jdom.Element element = 
        (org.jdom.Element) iter.next();
    element.setAttribute("section", "Java Technology");
}
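The same select-and-modify pattern can be sketched with javax.xml.xpath and org.w3c.dom instead of JDOM; this is a swapped-in approach, and the embedded XML and names are assumptions. An XPathConstants.NODESET result is iterated, and Element.setAttribute adds the section attribute. The article in the prefixed journal:journal element is left unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: add an attribute to every selected article.
final class SectionAttributeDemo {
    // Abbreviated stand-in for catalog.xml (titles and authors omitted).
    static final String XML =
          "<catalog xmlns:journal=\"http://www.w3.org/2001/XMLSchema-Instance\">"
        + "<journal:journal title=\"XML\"><article date=\"February-2003\"/></journal:journal>"
        + "<journal title=\"Java Technology\">"
        + "<article date=\"January-2004\"/><article date=\"October-2003\"/>"
        + "</journal></catalog>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the number of article elements that were modified.
    static int addSection(Document doc) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList articles = (NodeList) xPath.evaluate(
                "/catalog//journal[@title='Java Technology']//article",
                doc, XPathConstants.NODESET);
            for (int i = 0; i < articles.getLength(); i++) {
                ((Element) articles.item(i)).setAttribute("section", "Java Technology");
            }
            return articles.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static double count(Document doc, String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return ((Double) xPath.evaluate(expression, doc, XPathConstants.NUMBER))
                .doubleValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```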

The JDOM XPath class also supports the selection of nodes with namespace prefixes. To select such a node, create an XPath object with the static newInstance method and add the namespace to it with addNamespace:

XPath xpath = 
   XPath.newInstance( 
    "/catalog//journal:journal//article/@journal:level");
xpath.addNamespace("journal", 
    "http://www.w3.org/2001/XMLSchema-Instance"
);

This registers the journal prefix with the XPath object. Select the node with the namespace prefix:

levelNode = (org.jdom.Attribute)
    xpath.selectSingleNode(jdomDocument);

The journal:level attribute node, with the value Intermediate, is selected. Modify its value:

levelNode.setValue("Advanced");
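In javax.xml.xpath, the role of JDOM's addNamespace is played by a NamespaceContext set on the XPath object. The sketch below is a minimal, hypothetical implementation that maps only the journal prefix; the embedded XML fragment and the class names are assumptions, and the value is changed with the DOM setter Attr.setValue rather than with JDOM.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: prefixed XPath names resolved through a NamespaceContext.
final class NamespaceContextDemo {
    static final String JOURNAL_NS = "http://www.w3.org/2001/XMLSchema-Instance";

    // Abbreviated stand-in for the prefixed part of catalog.xml.
    static final String XML =
          "<catalog xmlns:journal=\"" + JOURNAL_NS + "\">"
        + "<journal:journal title=\"XML\">"
        + "<article journal:level=\"Intermediate\" date=\"February-2003\"/>"
        + "</journal:journal></catalog>";

    // Maps only the "journal" prefix to JOURNAL_NS.
    static final class JournalContext implements NamespaceContext {
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("null prefix");
            return "journal".equals(prefix) ? JOURNAL_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String namespaceURI) {
            return JOURNAL_NS.equals(namespaceURI) ? "journal" : null;
        }
        public Iterator<String> getPrefixes(String namespaceURI) {
            return JOURNAL_NS.equals(namespaceURI)
                ? Collections.singletonList("journal").iterator()
                : Collections.<String>emptyList().iterator();
        }
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for journal:level to resolve
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Attr selectLevel(Document doc) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new JournalContext());
            return (Attr) xPath.evaluate(
                "/catalog//journal:journal//article/@journal:level",
                doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```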

The Java program JDomParser.java selects nodes from the catalog.xml XML document with the select methods of the JDOM XPath class, modifies the selected nodes, and writes the modified document to an XML document, catalog-modified.xml, with the XMLOutputter class.
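JDomParser.java itself is not listed here. As a JDK-only counterpart to XMLOutputter, the sketch below serializes a modified org.w3c.dom.Document with an identity Transformer from javax.xml.transform. This is a swapped-in approach, and the embedded XML and names are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: write a DOM tree back out as XML text.
final class SerializeDemo {
    // Abbreviated stand-in for catalog.xml (assumption, not the full file).
    static final String XML =
        "<catalog><journal title=\"Java Technology\"><article level=\"Advanced\"/></journal></catalog>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // An identity Transformer copies the DOM tree unchanged into the result.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To write catalog-modified.xml to disk instead, a StreamResult constructed from a java.io.File, such as new StreamResult(new File("catalog-modified.xml")), could be passed to transform.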

Conclusion

In this tutorial, an XML document was parsed with XPath. XPath is used only to select nodes: the XPath APIs discussed in this tutorial cannot set the values of XML document nodes. To set values, use the setter methods of the object model that holds the selected nodes, such as the org.w3c.dom package (for example, Attr.setValue and Element.setAttribute) or JDOM (for example, Attribute.setValue and Element.setText).


Deepak Vohra is a NuBean consultant and a web developer.



Copyright © 2009 O'Reilly Media, Inc.