ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


Schemaless Java-XML Data Binding with VTD-XML

by Jimmy Zhang
09/10/2007

Summary

This article introduces a new Java-XML data binding technique based entirely on VTD-XML and XPath. The new approach differs from traditional Java-XML data binding tools in that it doesn't mandate schema, takes advantage of XML's inherent loose encoding, and avoids needless object creation, resulting in much greater efficiency.

Limitations of Schema-based XML Data Binding

XML data binding APIs are a class of XML processing tools that automatically map XML data into custom, strongly typed objects or data structures, relieving XML developers of the drudgery of DOM or SAX parsing. For traditional, static XML data binding tools (e.g., JAXB, Castor, and XMLBeans) to work, developers must have the document's XML schema (or its equivalent) available. As a first step, most XML data binders compile XML schemas into a set of class files, which the calling applications then include to perform the corresponding "unmarshalling."

However, developers dealing with XML documents don't always have their schemas on hand.  And even when the XML schemas are available, slight changes to them (often due to evolving business requirements) require class files to be generated anew. Also, XML data binding is most effective when processing shallow, regular-shaped XML data. When the underlying structure of XML documents is complex, users still need to manually navigate the typed hierarchical trees, a task which can require significant coding.

Most limitations of XML data binding come from its rigid dependency on XML schema. Unlike many binary data formats, XML is intended primarily as a schemaless data format flexible enough to represent virtually any kind of information. For advanced uses, XML also is extensible: applications may use only the portion of the XML document that they need. Because of XML's extensibility, Web Services and SOA applications are far less likely to break in the face of changes.

The schemaless nature of XML has subtle performance implications in XML data binding. In many cases, only a small subset in an XML document (as opposed to the whole data set) is necessary to drive the application logic. Yet, the traditional approach indiscriminately converts entire data sets into objects, producing unnecessary memory and processing overhead.

Binding XML with VTD-XML and XPath

Motivation

While the concept of XML data binding has essentially remained unchanged since the early days of XML,  the landscape of XML processing has evolved considerably.  The primary purpose of XML data binding APIs is to map XML to objects and the presence of XML schemas merely helps lighten the coding effort of XML processing. In other words, if mapping XML to objects is sufficiently simple, you not only don't need schemas, but have strong incentive to avoid them because of all the issues they introduce.

As you probably have guessed by looking at the title of this section, the combination of VTD-XML and XPath is ideally suited to schemaless data binding.

Why XPath and VTD-XML?

There are three main reasons why XPath lends itself to our new approach. First, when properly written, your data binding code only needs proximate knowledge (e.g., topology, tag names, etc.) of the XML tree structure, which you can determine by looking at the XML data. XML schemas are no longer mandatory. Furthermore, XPath allows your application to bind the relevant data items and filter out everything else, avoiding wasteful object creation. Finally, the XPath-based code is easy to understand, simple to write and debug, and generally quite maintainable.

But XPath still needs a parsed XML tree to work on. Superior to both DOM and SAX in this role, VTD-XML offers a long list of features and benefits relevant to data binding; the ones this article relies on are fast parsing, native XML indexing, incremental update, and non-blocking XPath evaluation.

Process Description

The process for our new schemaless XML data binding roughly consists of the following steps. 

  1. Observe the XML document and write down the XPath expressions corresponding to the data fields of interest.

  2. Define the class file and member variables to which those data fields are mapped.

  3. Refactor the XPath expressions in step 1 to reduce navigation cost.

  4. Write the XPath-based data binding routine that does the object mapping. XPath 1.0 allows XPath to be evaluated to four data types: string, Boolean, double and node set. The string type can be further converted to additional data types.

  5. If the XML processing requires the ability to both read and write, use VTD-XML's XMLModifier to update XML's content. You may need to record more information to take advantage of VTD-XML's incremental update capability.
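
The four result types mentioned in step 4 can be tried out with a small sketch. To keep it runnable without extra dependencies, it uses the JDK's standard javax.xml.xpath API rather than VTD-XML's AutoPilot; the class name XPathTypesSketch, the eval() helper, and the one-record document are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTypesSketch {
    // Hypothetical one-record document, modeled on the catalog format used in this article.
    static final String XML =
        "<CATALOG><CD><TITLE>Empire Burlesque</TITLE>"
        + "<PRICE>10.90</PRICE><YEAR>1985</YEAR></CD></CATALOG>";

    // Evaluates expr against XML and returns the result as the requested XPath 1.0 type.
    static Object eval(String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("cannot evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        String title = (String) eval("/CATALOG/CD/TITLE", XPathConstants.STRING);
        double price = (Double) eval("/CATALOG/CD/PRICE", XPathConstants.NUMBER);
        boolean after1982 = (Boolean) eval("/CATALOG/CD/YEAR > 1982", XPathConstants.BOOLEAN);
        NodeList cds = (NodeList) eval("/CATALOG/CD", XPathConstants.NODESET);
        // A string result can be converted further, here to an int.
        int year = Integer.parseInt((String) eval("/CATALOG/CD/YEAR", XPathConstants.STRING));
        System.out.println(title + ", " + price + ", " + after1982
                + ", " + cds.getLength() + " node(s), " + year);
    }
}
```

In the VTD-XML code later in this article, the same roles are played by evalXPath(), evalXPathToString(), and evalXPathToNumber().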

A Sample Project

Let me show you how to put this new XML binding into action. This project, written in Java, follows the steps outlined above to create simple data binding routines. The first part of this project creates read-only objects that are not modified by application logic. The second part extracts more information that allows the XML document to be updated incrementally. The last part adds VTD+XML indexing to the mix. The XML document I use in this example looks like the following:

<CATALOG>
    <CD>
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>1985</YEAR>
    </CD>
    <CD>
        <TITLE>Still Got the Blues</TITLE>
        <ARTIST>Gary More</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>Virgin Records</COMPANY>
        <PRICE>10.20</PRICE>
        <YEAR>1990</YEAR>
    </CD>
    <CD>
        <TITLE>Hide Your Heart</TITLE>
        <ARTIST>Bonnie Tyler</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>CBS Records</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1988</YEAR>
    </CD>
    <CD>
        <TITLE>Greatest Hits</TITLE>
        <ARTIST>Dolly Parton</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>RCA</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1982</YEAR>
    </CD>
</CATALOG>

Read Only

The application logic is driven by CD record objects whose year falls between 1982 and 1990 (non-inclusive), corresponding to the XPath expression "/CATALOG/CD[YEAR > 1982 and YEAR < 1990]". The class definition (shown below) contains four fields, corresponding to the title, artist, price, and year of a CD.

public class CDRecord {
    String title;
    String artist;
    double price;
    int year;
}

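The mapping between each object member and its corresponding XPath expression is as follows:

    Member    XPath expression
    title     /CATALOG/CD[YEAR > 1982 and YEAR < 1990]/TITLE
    artist    /CATALOG/CD[YEAR > 1982 and YEAR < 1990]/ARTIST
    price     /CATALOG/CD[YEAR > 1982 and YEAR < 1990]/PRICE
    year      /CATALOG/CD[YEAR > 1982 and YEAR < 1990]/YEAR

The XPath expressions can be further refactored (for efficiency reasons) as follows, so that the filter is evaluated once and each field is then read relative to the selected CD element:

    Member    XPath expression
    (record)  /CATALOG/CD[YEAR > 1982 and YEAR < 1990]
    title     TITLE
    artist    ARTIST
    price     PRICE
    year      YEAR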

The following code has two methods. The bind() method accepts the XML file name as input, performs the data binding by plugging in the above XPath expressions, and returns an array list of the bound objects. The main() method invokes the binding routine and prints out the content of the objects.

import com.ximpleware.*;
import java.util.*;

class XMLBinder {
    ArrayList bind(String fileName) throws Exception {
        ArrayList al = new ArrayList();
        VTDGen vg = new VTDGen();
        AutoPilot ap0 = new AutoPilot();
        AutoPilot ap1 = new AutoPilot();
        AutoPilot ap2 = new AutoPilot();
        AutoPilot ap3 = new AutoPilot();
        AutoPilot ap4 = new AutoPilot();
        ap0.selectXPath("/CATALOG/CD[YEAR > 1982 and YEAR <1990]");
        ap1.selectXPath("TITLE");
        ap2.selectXPath("ARTIST");
        ap3.selectXPath("PRICE");
        ap4.selectXPath("YEAR");
        if (vg.parseFile(fileName,false)) {
            VTDNav vn = vg.getNav();
            ap0.bind(vn);
            ap1.bind(vn);
            ap2.bind(vn);
            ap3.bind(vn);
            ap4.bind(vn);
            while(ap0.evalXPath()!=-1){
                CDRecord cdr = new CDRecord();
                cdr.title = ap1.evalXPathToString();
                cdr.artist = ap2.evalXPathToString();
                cdr.price = ap3.evalXPathToNumber();
                cdr.year = (int)ap4.evalXPathToNumber(); // evalXPathToNumber evaluates to a double
                al.add(cdr);
            }
            ap0.resetXPath();
        }
        return al;
    }


    public static void main(String[] args) throws Exception {
        XMLBinder xb = new XMLBinder();
        ArrayList al = xb.bind("d:/javaworld/update/example2/cd.xml");
        Iterator it = al.iterator();
        while(it.hasNext()) {
            CDRecord cdr = (CDRecord) it.next();
            System.out.println("===================");
            System.out.println("TITLE:  ==> "+cdr.title);
            System.out.println("ARTIST: ==> "+cdr.artist);
            System.out.println("PRICE:  ==> "+cdr.price);
            System.out.println("YEAR:   ==> "+cdr.year);
        }
    }
}
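
The refactoring used in bind(), filtering once and then reading each field relative to the selected CD element, can be checked independently of VTD-XML. The following sketch uses the JDK's javax.xml.xpath and DOM APIs instead of AutoPilot so that it runs without extra dependencies; the class name RefactoredXPathSketch, its method names, and the abbreviated catalog are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RefactoredXPathSketch {
    static final String FILTER = "/CATALOG/CD[YEAR > 1982 and YEAR < 1990]";

    // Abbreviated copy of the sample catalog: only TITLE and YEAR are kept.
    static final String XML = "<CATALOG>"
        + "<CD><TITLE>Empire Burlesque</TITLE><YEAR>1985</YEAR></CD>"
        + "<CD><TITLE>Still Got the Blues</TITLE><YEAR>1990</YEAR></CD>"
        + "<CD><TITLE>Hide Your Heart</TITLE><YEAR>1988</YEAR></CD>"
        + "<CD><TITLE>Greatest Hits</TITLE><YEAR>1982</YEAR></CD>"
        + "</CATALOG>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Original form: one full path per field, evaluated from the document root.
    static List<String> titlesByFullPath() {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList titles = (NodeList) xp.evaluate(
                    FILTER + "/TITLE", parse(), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < titles.getLength(); i++) {
                out.add(titles.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Refactored form: apply the filter once, then read TITLE relative to each CD.
    static List<String> titlesByRelativePath() {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList cds = (NodeList) xp.evaluate(FILTER, parse(), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < cds.getLength(); i++) {
                out.add(xp.evaluate("TITLE", cds.item(i)));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesByFullPath());
        System.out.println(titlesByRelativePath());
    }
}
```

Both methods select the same two records (years 1985 and 1988); the records from 1982 and 1990 fall outside the non-inclusive range.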

Read and Write

This part of the project deals with the same XML document, but the application logic lowers the PRICE of each CD by 1. To that end, two changes are necessary: the CD record class now has a new field (named priceIndex) containing the VTD index of the text node of PRICE, and the binding routine records that index while it extracts the price.

public class CDRecord2 {
    String title;
    String artist;
    double price;
    int priceIndex;
    int year;
}

The new bind() now accepts a VTDNav object as the input, but returns the same array list. The main method invokes bind(...), iterates over the objects, and uses XMLModifier to create cd_update.xml.

import com.ximpleware.*;
import java.util.*;
import java.io.*; 

class XMLBinder2 { 
    ArrayList bind(VTDNav vn) throws Exception {
        ArrayList al = new ArrayList();
        AutoPilot ap0 = new AutoPilot();
        AutoPilot ap1 = new AutoPilot();
        AutoPilot ap2 = new AutoPilot();
        AutoPilot ap3 = new AutoPilot();
        AutoPilot ap4 = new AutoPilot();
        ap0.selectXPath("/CATALOG/CD[YEAR > 1982 and YEAR < 1990]");
        ap1.selectXPath("TITLE");  /* /CATALOG/CD[YEAR > 1982 and YEAR < 1990]/TITLE */
        ap2.selectXPath("ARTIST"); /* /CATALOG/CD[YEAR > 1982 and YEAR < 1990]/ARTIST */
        ap3.selectXPath("PRICE");  /* /CATALOG/CD[YEAR > 1982 and YEAR < 1990]/PRICE */
        ap4.selectXPath("YEAR");   /* /CATALOG/CD[YEAR > 1982 and YEAR < 1990]/YEAR */
        ap0.bind(vn);
        ap1.bind(vn);
        ap2.bind(vn);
        ap3.bind(vn);
        ap4.bind(vn);
        while (ap0.evalXPath() != -1) {
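            CDRecord2 cdr = new CDRecord2();
            cdr.title = ap1.evalXPathToString();
            cdr.artist = ap2.evalXPathToString();
            vn.push();                      // save the cursor, currently on CD
            ap3.evalXPath();                // move the cursor to PRICE
            cdr.priceIndex = vn.getText();  // VTD index of PRICE's text node (-1 if none)
            cdr.price = vn.parseDouble(cdr.priceIndex);
            ap3.resetXPath();
            vn.pop();                       // restore the cursor to CD
            cdr.year = (int) ap4.evalXPathToNumber();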
            al.add(cdr);
        }
        ap0.resetXPath();
        return al;
    } 

    public static void main(String[] args) throws Exception {
        XMLBinder2 xb = new XMLBinder2();
        VTDGen vg = new VTDGen();
        XMLModifier xm = new XMLModifier(); 

        if (vg.parseFile("cd.xml",false)) {
            VTDNav vn = vg.getNav();
            ArrayList al2 = xb.bind(vn);
            Iterator it2 = al2.iterator();
            xm.bind(vn);
            while(it2.hasNext()) {
                CDRecord2 cdr = (CDRecord2) it2.next();  // reduce prices by 1
                xm.updateToken(cdr.priceIndex, ""+(cdr.price - 1));
            }
            xm.output(new FileOutputStream("cd_update.xml"));
        }
    }
}

The updated XML, containing new prices, looks like the following:

<CATALOG>
    <CD>
        <TITLE>Empire Burlesque</TITLE>
        <ARTIST>Bob Dylan</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>Columbia</COMPANY>
        <PRICE>9.9</PRICE>
        <YEAR>1985</YEAR>
    </CD>
    <CD>
        <TITLE>Still Got the Blues</TITLE>
        <ARTIST>Gary More</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>Virgin Records</COMPANY>
        <PRICE>10.20</PRICE>
        <YEAR>1990</YEAR>
    </CD>
    <CD>
        <TITLE>Hide Your Heart</TITLE>
        <ARTIST>Bonnie Tyler</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>CBS Records</COMPANY>
        <PRICE>8.9</PRICE>
        <YEAR>1988</YEAR>
    </CD>
    <CD>
        <TITLE>Greatest Hits</TITLE>
        <ARTIST>Dolly Parton</ARTIST>
        <COUNTRY>USA</COUNTRY>
        <COMPANY>RCA</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1982</YEAR>
    </CD>
</CATALOG>
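
Notice that everything except the two PRICE values is carried over unchanged. Conceptually, a token update replaces only the bytes of the recorded token and copies the rest of the document as is. The sketch below illustrates that idea in plain Java; it is not VTD-XML's internal implementation, and the class name TokenSpliceSketch, the replaceSpan() helpers, and the use of a string search as a stand-in for a recorded token index are all assumptions.

```java
import java.nio.charset.StandardCharsets;

class TokenSpliceSketch {
    // Returns a copy of doc in which the 'length' bytes starting at 'offset'
    // are replaced by 'replacement'; all other bytes are copied unchanged.
    static byte[] replaceSpan(byte[] doc, int offset, int length, byte[] replacement) {
        if (offset < 0 || length < 0 || offset + length > doc.length) {
            throw new IllegalArgumentException("span out of range");
        }
        byte[] out = new byte[doc.length - length + replacement.length];
        System.arraycopy(doc, 0, out, 0, offset);
        System.arraycopy(replacement, 0, out, offset, replacement.length);
        System.arraycopy(doc, offset + length, out, offset + replacement.length,
                doc.length - offset - length);
        return out;
    }

    // Convenience wrapper for ASCII text, where character positions equal byte offsets.
    static String replaceSpan(String doc, int offset, int length, String replacement) {
        byte[] result = replaceSpan(doc.getBytes(StandardCharsets.UTF_8), offset, length,
                replacement.getBytes(StandardCharsets.UTF_8));
        return new String(result, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<CD><PRICE>10.90</PRICE><YEAR>1985</YEAR></CD>";
        // Stand-in for a recorded token index: locate the PRICE text by string search.
        int offset = xml.indexOf("<PRICE>") + "<PRICE>".length();
        int length = xml.indexOf("</PRICE>") - offset;
        System.out.println(replaceSpan(xml, offset, length, "9.9"));
    }
}
```

In the article's code, priceIndex plays the role of the recorded position, and XMLModifier.updateToken() performs the replacement.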

Read, Write and Indexing

With the introduction of the native XML indexing feature in VTD-XML 2.0, you don't even have to parse XML. If cd.xml is pre-indexed, just load it up and let the binding routine go to work! The main() method can be quickly rewritten as follows to bypass parsing entirely.

public static void main(String[] args) throws Exception {
    XMLBinder2 xb = new XMLBinder2();
    VTDGen vg = new VTDGen();
    XMLModifier xm = new XMLModifier(); 
    //cd.vxl is the index for cd.xml 
    VTDNav vn = vg.loadIndex("cd.vxl");
    ArrayList al2 = xb.bind(vn);
    Iterator it2 = al2.iterator();
    xm.bind(vn);
    while(it2.hasNext()) {
        CDRecord2 cdr = (CDRecord2) it2.next();
        // reduce prices by 1
        xm.updateToken(cdr.priceIndex, ""+(cdr.price - 1));
    }
    // write the updated document once, after all tokens have been updated
    xm.output(new FileOutputStream("cd_update.xml"));
}

Benefits

As you have seen, by using the XPath expression "/CATALOG/CD[YEAR > 1982 and YEAR < 1990]", the examples above deal only with the relevant records. This is important because your applications may process XML files containing hundreds, if not thousands, of data items, most of which are not needed to drive the application logic. In addition, thanks to XPath, you can also avoid extracting unused fields. Furthermore, if new data fields are added to the document, the example code will still work unmodified.
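
The last point can be demonstrated with a short sketch: the same XPath-based binding is run against a record in its original shape and against one with extra, reordered fields. As before, the sketch uses the JDK's javax.xml.xpath API rather than VTD-XML so that it runs without extra dependencies; the class name ResilientBindingSketch and the GENRE and LABEL elements are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ResilientBindingSketch {
    static final String ORIGINAL = "<CATALOG><CD><TITLE>Hide Your Heart</TITLE>"
        + "<PRICE>9.90</PRICE><YEAR>1988</YEAR></CD></CATALOG>";

    // Same record with two hypothetical new fields and a different child order.
    static final String EXTENDED = "<CATALOG><CD><GENRE>Rock</GENRE><YEAR>1988</YEAR>"
        + "<TITLE>Hide Your Heart</TITLE><PRICE>9.90</PRICE>"
        + "<LABEL>CBS Records</LABEL></CD></CATALOG>";

    // Binds only TITLE and PRICE of the CDs that pass the year filter.
    static List<String> bind(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList cds = (NodeList) xp.evaluate(
                    "/CATALOG/CD[YEAR > 1982 and YEAR < 1990]", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < cds.getLength(); i++) {
                String title = xp.evaluate("TITLE", cds.item(i));
                double price = (Double) xp.evaluate("PRICE", cds.item(i), XPathConstants.NUMBER);
                out.add(title + " @ " + price);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The binding code is identical for both documents.
        System.out.println(bind(ORIGINAL));
        System.out.println(bind(EXTENDED));
    }
}
```

Because the binding addresses fields by name rather than by position or by a generated class, the unknown elements are simply never visited.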

Adopting this new XML binding instantly turbocharges your XML applications. Whether it is parsing, indexing, incremental update, non-blocking XPath, or avoiding needless object creation, VTD-XML not only does many things well, it is the only tool that meets performance requirements for your next SOA project.

With VTD-XML, you also wave goodbye to all the schema related drawbacks. Your XML/SOA applications become far less likely to break and much more resilient to changes.

In other words, it is official: the performance issue of XML is no more. Welcome to the age of SOA!

Conclusion

I hope this article has helped you understand the process and benefits of the new XML data binding. XML schema is a means to an end, not an end in itself.  As you probably have seen, when XML processing is made simple enough, XML schemas are mostly a bad thing for data binding.  Why create all those objects that you never use? Finding a good solution once again requires that we change the problem first. Your SOA success starts with the first step, at the foundation. By combining XPath with VTD-XML, you can now break free of XML schema, and achieve unrivaled efficiency and agility.

Jimmy Zhang is a co-founder of XimpleWare, a provider of high performance XML processing solutions.



Copyright © 2009 O'Reilly Media, Inc.