If there's anything that terrifies me, it's the lack of vendor-independence in XML parsing code these days. Although you may be positive you'll never move away from your current parser, I've seen software that costs hundreds of thousands of dollars tossed out overnight. I'd bet on uncertainty, which means coding up vendor-independence whenever possible. To achieve this, you should always try to use JAXP, the Java API for XML Processing. However, many times you may be limited to using SAX, the Simple API for XML, since it can handle very large documents, and only newer versions of parsers carry the useful 1.1 version of JAXP. If you are forced to work with just SAX, you will still need to find ways to keep vendor-specific code out of your programs. Here's an example of a bad way to get a SAX parser, which is completely dependent on Apache Xerces:
org.xml.sax.XMLReader reader =
    new org.apache.xerces.parsers.SAXParser();
reader.parse(new org.xml.sax.InputSource("yourXMLURI"));
Not so good, right? Clearly changing a parser implementation, or even the class used, requires changing code, recompiling, retesting...and so on. Here's a much better way to handle this:
org.xml.sax.XMLReader reader =
    org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
All you have to do to make this work is set a simple system property, org.xml.sax.driver, with the name of the parser class to load. For example, using Xerces again, you would set the value of this property to org.apache.xerces.parsers.SAXParser. You could do this on the command line, with a properties file, or in a variety of other ways, all without needing to change working code to effect the change. In this way your code becomes vendor-independent, even when you can't use JAXP.
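To make the property-driven lookup concrete, here's a minimal sketch. The class name VendorNeutralSax and the countElements helper are my own illustrations, not part of SAX. The code never names a parser class; if org.xml.sax.driver is unset, the factory falls back to whatever default the runtime provides. Note that recent JDKs mark XMLReaderFactory as deprecated, though it is still present.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class, used only for illustration.
class VendorNeutralSax {

    /** Parses the given XML text and returns the number of elements seen. */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in newer JDKs
    static int countElements(String xml) {
        final int[] count = {0};
        try {
            // No vendor class appears here. The parser is picked at run time,
            // for example with:
            //   java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser ...
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements("<order><item/><item/></order>"));
    }
}
```

Swapping parsers then means changing the -D flag (or a properties file), with no recompilation.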
Use the interface, not the implementation.
Another common slip-up I see is when developers are working with interface-based models like the W3C's DOM, the Document Object Model. In this model, each interface has an implementation in the vendor's release. For example, the DOM interface org.w3c.dom.Document is implemented by the Xerces class org.apache.xerces.dom.DocumentImpl. However, all too often, I see code like this:
DocumentImpl doc = new org.apache.xerces.dom.DocumentImpl();
The problem with this is that you aren't working with DOM here; you're working with a specific version of DOM, Apache's version. By making this simple change:
Document doc = new org.apache.xerces.dom.DocumentImpl();
your code can begin to interoperate with other DOM implementations. While many developers see this second approach as "less type-specific," it's always a good idea to stay at the interface level when possible in object-oriented programming. Also consider that when working in XML, interoperability is key; allowing the possibility to work with other DOM implementations may save you hours of work one day in the future.
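If JAXP is available, you can go one step further and drop the vendor class from the right-hand side as well. The following is a sketch under that assumption; the InterfaceOnlyDom class and its buildGreeting method are hypothetical names of mine. Everything after the factory call is written purely against org.w3c.dom interfaces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; only DOM interfaces appear in the tree-building code.
class InterfaceOnlyDom {

    /** Builds a one-element document whose root is <greeting> containing the given text. */
    static Document buildGreeting(String text) {
        try {
            // JAXP returns whichever DOM implementation is configured,
            // so no vendor-specific DocumentImpl is named anywhere.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("greeting");
            root.appendChild(doc.createTextNode(text));
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable DOM implementation found", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildGreeting("hello");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```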
Customize JDOM with subclasses.
Another idea I discuss in a lot of detail in my book is working with custom JDOM subclasses. Many folks ask why JDOM isn't interface-based. I'll avoid that long answer here (it's in the book), but suffice it to say that you can subclass JDOM and get the same effect. Many folks don't know that they can build a JDOM Document with their own subclasses. To do so, you'll need to create your own custom JDOM classes that extend the default Element, Attribute, and so forth. Next, write an implementation of the org.jdom.input.JDOMFactory interface. This provides a method for each constructor of each JDOM concrete class. For example, here are the declarations for the element method, which match the Element class's four constructors:
public Element element(String name, Namespace namespace);
public Element element(String name);
public Element element(String name, String uri);
public Element element(String name, String prefix, String uri);
So, let's say you've got your own subclass of Element, called my.package.MyElement. To ensure that a JDOM tree built using SAXBuilder uses your customized element type, instead of the default type, you would have this implementation of these methods:
// "my.package" is a placeholder; package is a reserved word in Java,
// so substitute your real package name.
public Element element(String name, Namespace namespace) {
    return new my.package.MyElement(name, namespace);
}
public Element element(String name) {
    return new my.package.MyElement(name);
}
public Element element(String name, String uri) {
    return new my.package.MyElement(name, uri);
}
public Element element(String name, String prefix, String uri) {
    return new my.package.MyElement(name, prefix, uri);
}
This is simple enough, right? So implement all of these methods, returning your custom types instead of the default JDOM types. Once you've got this done, say in a class called my.package.MyJDOMFactory, you're ready to build your document:
SAXBuilder builder = new SAXBuilder();
builder.setFactory(new my.package.MyJDOMFactory());
Document customDoc = builder.build("someURI.xml");
And voilà! The resulting Document will be built with your customized classes, returned from the MyJDOMFactory class. Pretty cool, huh?
Cache, reuse, and optimize using JAXP and TrAX.
With the new release of JAXP version 1.1, there are about a million and one ways to improve code efficiency. In particular, the javax.xml.transform.Templates interface allows for compiling an XSL stylesheet into a binary object and then using it, reusing it, and reusing it again, all without extra processing or memory overhead.
For more on the new Java XML APIs, read Moving to a Higher Plane, also by Brett McLaughlin.
I won't spend a lot of time on this, because as I write this article I see that my colleague Eric Burke has released a series of Java and XSLT tips and tricks that covers this very topic. I'll refer you to this article, and in particular to the first tip, and simply say that it's advice you would be smart to take. Check it out, and use it. You'll find your code and transformations run faster, and in a lot less memory.
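For a quick feel of the Templates idea, here's a minimal sketch; it is not taken from Eric's article. The CachedStylesheet class, the inline stylesheet, and the person/name document shape are all assumptions made up for illustration. The stylesheet is compiled once, and each call gets a fresh, cheap Transformer from the cached Templates object.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical wrapper that compiles a stylesheet once and reuses it.
class CachedStylesheet {

    // Illustrative stylesheet: outputs the text of /person/name.
    private static final String XSL =
          "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='/person/name'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiled form of the stylesheet, built once and shared by every call.
    private final Templates templates = compile(XSL);

    private static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Stylesheet did not compile", e);
        }
    }

    /** Applies the cached stylesheet to the given XML text. */
    String transform(String xml) {
        try {
            // A Transformer is not meant to be shared, but creating one
            // from Templates skips reparsing the stylesheet.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        CachedStylesheet sheet = new CachedStylesheet();
        System.out.println(sheet.transform("<person><name>Ada</name></person>"));
        System.out.println(sheet.transform("<person><name>Grace</name></person>"));
    }
}
```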
Use the right tool.
Finally, I want to urge you to learn more about each of the available Java and XML APIs. Now I'm not saying that just for the sake of knowing and showing off; rather, knowing your tools means knowing how to use them, but also when to use them. Time and time again I see programs written that would have been much easier, sometimes hundreds or thousands of lines shorter, and infinitely cleaner, had the developer used a better tool for the job. Here are a few "sub" tips on determining which API is right for the various situations:
Large documents: If you are dealing with huge documents, you need SAX. I know there is a lot of hype around in-memory models like DOM and JDOM supporting this, but it's just not reality yet. If your documents are larger than a meg or so, I'd use SAX. It's sequential, fast, and simple.
Web sites: Unless you are running a highly dynamic Web site (I'd say less than 1 percent of all Web sites are in this category), on-the-fly XSL transformations are a waste of good processing power. You'd be much better off generating your site statically, using Cocoon (statically); XMLC; or even just Apache Xalan. You'll decrease complexity, user load time, and most importantly, headaches.
SOAP or RPC?: For most inter- and intra-application communication, SOAP is overkill. Very rarely will you actually need the complex envelope handling, data mapping, and error processing in everyday Java-to-Java applications. Don't get me wrong--SOAP is great for communicating with non-Java components, UDDI registries, and through firewalls. It's just not the magic bullet that some are saying it is, and is a costly protocol in terms of overhead compared to simpler solutions like XML-RPC.
Beware JAXB: The Java Architecture for XML Binding, or JAXB, is quite the rage these days. However, most people are missing the very small number attached to the release--we're still in the pre-0.5 days, which is awfully early. Now let me be clear: one day JAXB is going to be a great API. However, I'm seeing production code going out using JAXB, and that's just a bad idea. Even if you throw out the significant performance problems with JAXB right now, the API itself is still in flux. You'd be much better off sticking with Castor or Zeus. They're a lot more stable, as well as easier to use.
Simple is best: Finally, a generalization. Use the simplest solution for a task, and watch as you have hours to spare while coworkers debug their more complex, more "glamorous" solutions to simple problems. If you don't need data binding or SOAP or WSDL, don't use them; stick with good old SAX, DOM, or JDOM. If you don't need an XSL transformation, just use a boring little servlet. And if RSS takes care of your problem, don't spend hours working out a Web services solution. Trust me here--the easier the solution, the easier the implementation.
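To illustrate the large-document tip from the list above, here's a small sketch of a streaming computation with SAX, obtained through JAXP. The StreamingTotal class, the item element, and its qty attribute are hypothetical. The point is that only a running total lives in memory, no matter how big the input stream is.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: sums an attribute without building a document tree.
class StreamingTotal {

    /** Adds up the "qty" attribute of every <item> element in the stream. */
    static long totalQuantity(InputStream in) {
        final long[] total = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    // Each element is seen once and then forgotten.
                    if ("item".equals(qName) && atts.getValue("qty") != null) {
                        total[0] += Long.parseLong(atts.getValue("qty"));
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not stream document", e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        String xml = "<order><item qty='2'/><item qty='5'/></order>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(totalQuantity(in));
    }
}
```

In real use the InputStream would typically come from a file or socket rather than an in-memory string.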
Obviously, this is just a sampling of what's interesting in the Java and XML world these days. I hope it whets your appetite for more information, and if it does, I think that Java and XML, 2nd Edition can give you what you are looking for. No matter how you choose to gather information, though, never stop pursuing new and better ways to code--it makes life fun. So enjoy these tips and tricks, and I'll see you online.
Brett McLaughlin is one of the leading authorities on Java, XML, enterprise applications, and open source software. He is the Enhydra strategist at Lutris Technologies, where he is responsible for the direction and strategy for the Enhydra application server. He's the founder or cofounder of numerous other open source projects, such as JDOM (currently in JSR at Sun), Apache Turbine (a servlet-based Web applications framework), and Enhydra Zeus (an XML data-binding framework). His role as a contributor on OpenEJB, jBoss, and Apache Cocoon places him in the middle of Java and XML innovation. In addition to his technology contributions, Brett is a prolific writer; he is the author of Java & XML (O'Reilly), the moderator of IBM's Java and XML tools and technologies newsgroup, and flashline.com's biweekly component columnist. He's written dozens of articles for IBM developerWorks, JavaWorld, and oreilly.com.
O'Reilly & Associates has recently released (August 2001) Java & XML, 2nd Edition.
Sample Chapter 12, SOAP, is available free online.
You can also look at the Table of Contents, the Index, and the Full Description of the book.