ONJava.com -- The Independent Source for Enterprise Java

XSLT Processing with Java

Feeding JDOM Output into JAXP

The DOM API is tedious to use, so many Java programmers opt for JDOM instead. The typical usage pattern is to generate XML dynamically using JDOM and then transform it into a web page using XSLT. This presents a problem because JAXP does not provide any implementation of the javax.xml.transform.Source interface that integrates directly with JDOM.

As this is being written, members of the JDOM community are writing a JDOM implementation of javax.xml.transform.Source that will directly integrate with JAXP.

There are at least three available options:
  • Use org.jdom.output.SAXOutputter to pipe SAX 2 events from JDOM to JAXP.
  • Use org.jdom.output.DOMOutputter to convert the JDOM tree to a DOM tree, and then use javax.xml.transform.dom.DOMSource to read the data into JAXP.
  • Use org.jdom.output.XMLOutputter to serialize the JDOM tree to XML text, and then use javax.xml.transform.stream.StreamSource to parse the XML back into JAXP.

JDOM to SAX approach

The SAX approach is generally preferable to other approaches. Its primary advantage is that it does not require an intermediate transformation to convert the JDOM tree into a DOM tree or text. This offers the lowest memory utilization and potentially the fastest performance.

In support of SAX, JDOM offers the org.jdom.output.SAXOutputter class. The following code fragment demonstrates its usage:

TransformerFactory transFact = TransformerFactory.newInstance( );
if (transFact.getFeature(SAXTransformerFactory.FEATURE)) {
  SAXTransformerFactory stf = (SAXTransformerFactory) transFact;
  // the 'stylesheet' parameter is
  // an instance of JAXP's
  // javax.xml.transform.Templates interface
  TransformerHandler transHand = stf.newTransformerHandler(stylesheet);
 
  // result is a Result instance
  transHand.setResult(result);
  SAXOutputter saxOut = new SAXOutputter(transHand);
  // the 'jdomDoc' parameter is an instance
  // of JDOM's org.jdom.Document class. It contains
  // the XML data
  saxOut.output(jdomDoc);
} else {
  System.err.println("SAXTransformerFactory is not supported");
}
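The same pipeline can be tried without JDOM on the classpath by feeding hand-written SAX events to the TransformerHandler. The following is a minimal sketch rather than the book's code: the class name SaxPipelineSketch, the inline stylesheet, and the greeting element are assumptions made for illustration, and the manual event calls stand in for what SAXOutputter would emit.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

class SaxPipelineSketch {
    // Hypothetical stylesheet: wraps the text of <greeting> in a <p> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/greeting'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String text) throws Exception {
        TransformerFactory transFact = TransformerFactory.newInstance();
        if (!transFact.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException(
                "SAXTransformerFactory is not supported");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) transFact;
        Templates stylesheet =
            stf.newTemplates(new StreamSource(new StringReader(XSLT)));
        TransformerHandler transHand = stf.newTransformerHandler(stylesheet);

        // the result must be set before any events are delivered
        StringWriter out = new StringWriter();
        transHand.setResult(new StreamResult(out));

        // hand-written events standing in for SAXOutputter.output(jdomDoc)
        char[] chars = text.toCharArray();
        transHand.startDocument();
        transHand.startElement("", "greeting", "greeting", new AttributesImpl());
        transHand.characters(chars, 0, chars.length);
        transHand.endElement("", "greeting", "greeting");
        transHand.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("Hello"));
    }
}
```

Because the handler consumes events as they arrive, no intermediate DOM tree or XML string is built on the input side.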

JDOM to DOM approach

The DOM approach is generally a little slower and will not work if JDOM uses a different DOM implementation than JAXP. JDOM, like JAXP, can utilize different DOM implementations behind the scenes. If JDOM refers to a different version of DOM than JAXP, you will encounter exceptions when you try to perform the transformation. Since JAXP uses Apache's Crimson parser by default, you can configure JDOM to use Crimson with the org.jdom.adapters.CrimsonDOMAdapter class. The following code shows how to convert a JDOM Document into a DOM Document:

org.jdom.Document jdomDoc = createJDOMDocument( );
// add data to the JDOM Document
...
 
// convert the JDOM Document into a DOM Document
org.jdom.output.DOMOutputter domOut = new org.jdom.output.DOMOutputter(
"org.jdom.adapters.CrimsonDOMAdapter");
org.w3c.dom.Document domDoc = domOut.output(jdomDoc);

The DOMOutputter constructor call is the line most likely to give you problems. When JDOM converts its internal object tree into a DOM object tree, it must use some underlying DOM implementation. In many respects, JDOM is similar to JAXP because it delegates many tasks to underlying implementation classes. The DOMOutputter constructors are overloaded as follows:

// use the default adapter class
public DOMOutputter(  )
 
// use the specified adapter class
public DOMOutputter(String adapterClass)

The first constructor shown here will use JDOM's default DOM implementation, which is not necessarily the same one that JAXP uses. The second constructor allows you to specify the name of an adapter class, which must implement the org.jdom.adapters.DOMAdapter interface. JDOM includes standard adapters for all of the widely used DOM implementations, or you could write your own adapter class.
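Once an org.w3c.dom.Document is available, whichever adapter produced it, handing it to JAXP is a matter of wrapping it in a DOMSource. The sketch below assumes a DOM tree built with JAXP's own DocumentBuilder as a stand-in for the DOMOutputter result; the class name DomSourceSketch, the inline stylesheet, and the greeting element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomSourceSketch {
    // Hypothetical stylesheet: wraps the text of <greeting> in a <p> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/greeting'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String text) throws Exception {
        // stand-in for: org.w3c.dom.Document domDoc = domOut.output(jdomDoc);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document domDoc = dbf.newDocumentBuilder().newDocument();
        Element root = domDoc.createElementNS(null, "greeting");
        root.appendChild(domDoc.createTextNode(text));
        domDoc.appendChild(root);

        Transformer trans = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        // DOMSource lets the processor read the existing tree directly
        trans.transform(new DOMSource(domDoc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("Hello"));
    }
}
```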

JDOM to text approach

In the final approach listed earlier, you can utilize java.io.StringWriter and java.io.StringReader. First create the JDOM data as usual, then use org.jdom.output.XMLOutputter to convert the data into a String of XML:

StringWriter sw = new StringWriter(  );
org.jdom.output.XMLOutputter xmlOut
        = new org.jdom.output.XMLOutputter("", false);
xmlOut.output(jdomDoc, sw);

The parameters for XMLOutputter allow you to specify the amount of indentation for the output along with a boolean flag indicating whether or not linefeeds should be included in the output. In the code example, no spaces or linefeeds are specified in order to minimize the size of the XML that is produced. Now that the StringWriter contains your XML, you can use a StringReader along with javax.xml.transform.stream.StreamSource to read the data into JAXP:

StringReader sr = new StringReader(sw.toString( ));
Source xmlSource = new javax.xml.transform.stream.StreamSource(sr);

The transformation can then proceed just as it did in Example 5-4. The main drawback to this approach is that the XML, once converted to text form, must then be parsed back in by JAXP before the transformation can be applied.
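Put together, the text round trip might look like the following sketch. It is not Example 5-4: the class name TextSourceSketch and the inline stylesheet are assumptions, and the xmlText argument stands in for the string that sw.toString( ) would return.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TextSourceSketch {
    // Hypothetical stylesheet: wraps the text of <greeting> in a <p> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/greeting'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // xmlText stands in for the serialized JDOM document
    static String transform(String xmlText) throws Exception {
        StringReader sr = new StringReader(xmlText);
        Source xmlSource = new StreamSource(sr);
        Source xslSource = new StreamSource(new StringReader(XSLT));

        Transformer trans =
            TransformerFactory.newInstance().newTransformer(xslSource);
        StringWriter out = new StringWriter();
        // the processor has to parse xmlText again before transforming it
        trans.transform(xmlSource, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<greeting>Hello</greeting>"));
    }
}
```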

Stylesheet Compilation

XSLT is a computer-programming language, expressed using XML syntax. This is not for the benefit of the computer, but rather for human interpretation. Before the stylesheet can be processed, it must be converted into some internal machine-readable format. This process should sound familiar, because it is the same process used for every high-level programming language. You, the programmer, work in terms of the high-level language, and an interpreter or compiler converts this language into some machine format that can be executed by the computer.

Interpreters analyze source code and translate it into machine code with each execution. In the case of XSLT, this requires that the stylesheet be read into memory using an XML parser, translated into machine format, and then applied to your XML data. Performance is the obvious problem, particularly when you consider that stylesheets rarely change. Typically, the stylesheets are defined early on in the development process and remain static, while XML data is generated dynamically with each client request.

A better approach is to parse the XSLT stylesheet into memory once, compile it to machine-format, and then preserve that machine representation in memory for repeated use. This is called stylesheet compilation and is no different in concept than the compilation of any programming language.

Templates API

Different XSLT processors implement stylesheet compilation differently, so JAXP includes the javax.xml.transform.Templates interface to provide consistency. This is a relatively simple interface with the following API:

public interface Templates {
    java.util.Properties getOutputProperties(  );
    javax.xml.transform.Transformer newTransformer(  )
            throws TransformerConfigurationException;
}

The getOutputProperties( ) method returns a clone of the properties associated with the <xsl:output> element, such as method="xml", indent="yes", and encoding="UTF-8". You might recall that java.util.Properties (a subclass of java.util.Hashtable) provides key/value mappings from property names to property values. Since a clone, or deep copy, is returned, you can safely modify the Properties instance and apply it to a future transformation without affecting the compiled stylesheet that the instance of Templates represents.

The newTransformer( ) method is more commonly used and allows you to obtain a new instance of a class that implements the Transformer interface. It is this Transformer object that actually allows you to perform XSLT transformations.

Since the implementation of the Templates interface is hidden by JAXP, it must be created by the following method on javax.xml.transform.TransformerFactory:

public Templates newTemplates(Source source)
        throws TransformerConfigurationException

As in earlier examples, the Source may obtain the XSLT stylesheet from one of many locations, including a filename, a system identifier, or even a DOM tree. Regardless of the original location, the XSLT processor is supposed to compile the stylesheet into an optimized internal representation.

Whether the stylesheet is actually compiled is up to the implementation, but a safe bet is that performance will continually improve over the next several years as these tools stabilize and vendors have time to apply optimizations.
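The two Templates methods can be exercised with a short sketch like the one below. The class name TemplatesSketch, its helper methods, and the inline stylesheet are assumptions for illustration; only the JAXP calls themselves come from the API described above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplatesSketch {
    // Hypothetical stylesheet with an explicit <xsl:output> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' encoding='UTF-8'/>"
      + "<xsl:template match='/greeting'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Parse and (implementation permitting) compile the stylesheet once.
    static Templates compile() throws TransformerException {
        TransformerFactory transFact = TransformerFactory.newInstance();
        return transFact.newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    // Apply the compiled stylesheet through a fresh Transformer.
    static String apply(Templates templates, String xml)
            throws TransformerException {
        Transformer trans = templates.newTransformer();
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates templates = compile();

        // the returned Properties is a copy, so changing it does not
        // alter the compiled stylesheet
        Properties props = templates.getOutputProperties();
        System.out.println(props.getProperty(OutputKeys.METHOD));
        props.setProperty(OutputKeys.METHOD, "text");
        System.out.println(
            templates.getOutputProperties().getProperty(OutputKeys.METHOD));

        System.out.println(apply(templates, "<greeting>Hello</greeting>"));
    }
}
```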

Figure 5-6 illustrates the relationship between Templates and Transformer instances.

Diagram.
Figure 5-6. Relationship between Templates and Transformer

Thread safety is an important issue in any Java application, particularly in a web context where many users share the same stylesheet. As Figure 5-6 illustrates, an instance of Templates is thread-safe and represents a single stylesheet. During the transformation process, however, the XSLT processor must maintain state information and output properties specific to the current client. For this reason, a separate Transformer instance must be used for each concurrent transformation.

Transformer is an abstract class in JAXP, and implementations should be lightweight. This is an important goal because you will typically create many copies of Transformer, while the number of Templates is relatively small. Transformer instances are not thread-safe, primarily because they hold state information about the current transformation. Once the transformation is complete, however, these objects can be reused.
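The division of labor can be sketched as follows: one Templates instance is shared by a pool of worker threads, and every task asks it for its own Transformer. The class name SharedTemplatesSketch, the method transformAll, the pool size, and the inline stylesheet are all hypothetical choices for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SharedTemplatesSketch {
    // Hypothetical stylesheet: wraps the text of <greeting> in a <p> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/greeting'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // names are assumed to be plain text without XML markup characters
    static List<String> transformAll(List<String> names) throws Exception {
        // one compiled stylesheet, shared by every task
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : names) {
                futures.add(pool.submit(() -> {
                    // each concurrent transformation gets its own Transformer
                    Transformer trans = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    String xml = "<greeting>" + name + "</greeting>";
                    trans.transform(new StreamSource(new StringReader(xml)),
                                    new StreamResult(out));
                    return out.toString();
                }));
            }
            // collect results in submission order
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAll(Arrays.asList("Ann", "Bob", "Cy")));
    }
}
```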

A Stylesheet Cache

XSLT transformations commonly occur on a shared web server with a large number of concurrent users, so it makes sense to use Templates whenever possible to optimize performance. Since each instance of Templates is thread-safe, it is desirable to maintain a single copy shared by many clients. This reduces the number of times your stylesheets have to be parsed into memory and compiled, as well as the overall memory footprint of your application.

The code shown in Example 5-10 illustrates a custom XSLT stylesheet cache that automates the mundane tasks associated with creating Templates instances and storing them in memory. This cache has the added benefit of checking the lastModified flag on the underlying file, so it will reload itself whenever the XSLT stylesheet is modified. This is highly useful in a web-application development environment because you can make changes to the stylesheet and simply click on Reload on your web browser to see the results of the latest edits.


Example 5-10: StylesheetCache.java

package com.oreilly.javaxslt.util;
 
import java.io.*;
import java.util.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
 
/**
* A utility class that caches XSLT
* stylesheets in memory.
*
*/
public class StylesheetCache {
  // map xslt file names to MapEntry instances
  // (MapEntry is defined below)
  private static Map cache = new HashMap( );
 
  /**
  * Flush all cached stylesheets from
  * memory, emptying the cache.
  */
  public static synchronized void flushAll( ) {
   cache.clear( );
  }
 
  /**
  * Flush a specific cached stylesheet from memory.
  *
  * @param xsltFileName the file name of
  * the stylesheet to remove.
  */
  public static synchronized void flush(String xsltFileName) {
   cache.remove(xsltFileName);
  }
 
  /**
  * Obtain a new Transformer instance for the
  * specified XSLT file name.
  * A new entry will be added to the
  * cache if this is the first request
  * for the specified file name.
  *
  * @param xsltFileName the file name
  * of an XSLT stylesheet.
  * @return a transformation context
  * for the given stylesheet.
  */
  public static synchronized Transformer newTransformer(String xsltFileName)
    throws TransformerConfigurationException {
   File xsltFile = new File(xsltFileName);
 
   // determine when the file was last modified on disk
   long xslLastModified = xsltFile.lastModified( );
   MapEntry entry = (MapEntry) cache.get(xsltFileName);
 
   if (entry != null) {
    // if the file has been modified more recently than the
    // cached stylesheet, remove the entry reference
    if (xslLastModified > entry.lastModified) {
      entry = null;
    }
   }
 
   // create a new entry in the cache if necessary
   if (entry == null) {
    Source xslSource = new StreamSource(xsltFile);
 
    TransformerFactory transFact = TransformerFactory.newInstance( );
    Templates templates = transFact.newTemplates(xslSource);
 
    entry = new MapEntry(xslLastModified, templates);
    cache.put(xsltFileName, entry);
   }
 
   return entry.templates.newTransformer( );
  }
 
  // prevent instantiation of this class
  private StylesheetCache( ) {
  }
 
  /**
  * This class represents a value in the cache Map.
  */
  static class MapEntry {
   long lastModified; // when the file was modified
   Templates templates;
 
   MapEntry(long lastModified, Templates templates) {
    this.lastModified = lastModified;
    this.templates = templates;
   }
  }
}


Because this class is a static utility that is never instantiated, it has a private constructor and uses only static methods. Furthermore, each method is declared as synchronized in an effort to avoid potential threading problems.

The heart of this class is the cache itself, which is implemented using java.util.Map:

private static Map cache = new HashMap(  );

Although HashMap is not thread-safe, every method that reads or writes the map is static and synchronized, so all access to it is serialized on the class's lock. Each entry in the map contains a key/value pair, mapping from an XSLT stylesheet filename to an instance of the MapEntry class. MapEntry is a nested class that keeps track of the compiled stylesheet along with when its file was last modified:

static class MapEntry {
    long lastModified;  // when the file was modified
    Templates templates;
 
    MapEntry(long lastModified, Templates templates) {
        this.lastModified = lastModified;
        this.templates = templates;
    }
}

Removing entries from the cache is accomplished by one of two methods:

public static synchronized void flushAll(  ) {
    cache.clear(  );
}
 
public static synchronized void flush(String xsltFileName) {
    cache.remove(xsltFileName);
}

The first method merely removes everything from the Map, while the second removes a single stylesheet. Whether you use these methods is up to you. The flushAll method, for instance, should probably be called from a servlet's destroy( ) method to ensure proper cleanup. If you have many servlets in a web application, each servlet may wish to flush specific stylesheets it uses via the flush(...) method. If the xsltFileName parameter is not found, the Map implementation silently ignores this request.

The majority of interaction with this class occurs via the newTransformer method, which has the following signature:

public static synchronized Transformer newTransformer(String xsltFileName)
        throws TransformerConfigurationException

The parameter, an XSLT stylesheet filename, was chosen to facilitate the "last modified" feature. We use the java.io.File class to determine when the file was last modified, which allows the cache to automatically reload itself as edits are made to the stylesheets. Had we used a system identifier or InputStream instead of a filename, the auto-reload feature could not have been implemented. Inside the method, the File object is created first and its lastModified timestamp is checked:

File xsltFile = new File(xsltFileName);
 
// determine when the file was last modified on disk
long xslLastModified = xsltFile.lastModified(  );

The compiled stylesheet, represented by an instance of MapEntry, is then retrieved from the Map. If the entry is found, its timestamp is compared against the current file's timestamp, thus allowing auto-reload:

MapEntry entry = (MapEntry) cache.get(xsltFileName);
 
if (entry != null) {
  // if the file has been modified more 
  // recently than the cached stylesheet, 
  // remove the entry reference
  if (xslLastModified > entry.lastModified) {
      entry = null;
  }
}

Next, we create a new entry in the cache if the entry object reference is still null. This is accomplished by wrapping a StreamSource around the File object, instantiating a TransformerFactory instance, and using that factory to create our Templates object. The Templates is then stored in the cache so it can be reused by the next client of the cache:

// create a new entry in the cache if necessary
if (entry == null) {
  Source xslSource = new StreamSource(xsltFile);
 
  TransformerFactory transFact = TransformerFactory.newInstance(  );
  Templates templates = transFact.newTemplates(xslSource);
 
  entry = new MapEntry(xslLastModified, templates);
  cache.put(xsltFileName, entry);
}

Finally, a brand new Transformer is created and returned to the caller:

return entry.templates.newTransformer(  );

Related Reading

Java and XSLT
By Eric M. Burke

Returning a new Transformer is critical because, although the Templates object is thread-safe, the Transformer implementation is not. Each caller gets its own copy of Transformer so multiple clients do not collide with one another.

One potential improvement on this design could be to add a lastAccessed timestamp to each MapEntry object. Another thread could then execute every couple of hours to flush map entries from memory if they have not been accessed for a period of time. In most web applications, this will not be an issue, but if you have a large number of pages and some are seldom accessed, this could be a way to reduce the memory usage of the cache.
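One way this idea might look is sketched below. It is an assumption-laden illustration, not part of Example 5-10: the class name IdleEvictionSketch, the put, get, flushIdle, and startSweeper methods, and the explicit now parameters (which make the logic easy to exercise without waiting) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.xml.transform.Templates;

class IdleEvictionSketch {
    /** Cache value extended with a hypothetical lastAccessed timestamp. */
    static class MapEntry {
        final long lastModified;   // when the file was modified
        final Templates templates;
        long lastAccessed;         // when a caller last asked for this entry

        MapEntry(long lastModified, Templates templates, long now) {
            this.lastModified = lastModified;
            this.templates = templates;
            this.lastAccessed = now;
        }
    }

    private static final Map<String, MapEntry> cache = new HashMap<>();

    static synchronized void put(String xsltFileName, long lastModified,
                                 Templates templates, long now) {
        cache.put(xsltFileName, new MapEntry(lastModified, templates, now));
    }

    /** Returns the cached Templates, or null, and records the access time. */
    static synchronized Templates get(String xsltFileName, long now) {
        MapEntry entry = cache.get(xsltFileName);
        if (entry == null) {
            return null;
        }
        entry.lastAccessed = now;
        return entry.templates;
    }

    /** Removes entries idle for longer than maxIdleMillis; returns the count. */
    static synchronized int flushIdle(long now, long maxIdleMillis) {
        int before = cache.size();
        cache.values().removeIf(e -> now - e.lastAccessed > maxIdleMillis);
        return before - cache.size();
    }

    /** Starts a daemon thread that calls flushIdle at a fixed period. */
    static ScheduledExecutorService startSweeper(long periodMillis,
                                                 long maxIdleMillis) {
        ScheduledExecutorService sweeper =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stylesheet-cache-sweeper");
                t.setDaemon(true);
                return t;
            });
        sweeper.scheduleAtFixedRate(
            () -> flushIdle(System.currentTimeMillis(), maxIdleMillis),
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return sweeper;
    }
}
```

A web application could call startSweeper once at startup, for example with a period of two hours, and shut the returned executor down when the application is unloaded.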

Another potential modification is to allow javax.xml.transform.Source objects to be passed to the newTransformer method instead of a filename. However, the auto-reload feature could then no longer be implemented for every Source type, because not every Source is backed by a file with a modification time.

