 Published on ONJava.com (http://www.onjava.com/)


XML Publishing with Cocoon 2, Part 1

by David Cummings and Collin VanDyck
07/02/2003

Overview

One of the most important decisions to make when deciding to build an application is "Upon what foundation should I build?" This is a two-fold question, depending not only upon the language in which the application will be developed, but also the framework, if any, that will provide fundamental services and infrastructure to your system.

Cocoon is essentially an XML-publishing framework. That means that it facilitates the generation, transformation, and serialization (delivery) of XML content. Super! Also, Cocoon is written in Java, giving it many benefits from the cross-platform and code manageability perspectives. The information presented here was gained from our development of ContentXML Content Management Suite, an enterprise content-management system built on top of Cocoon 2.

Let's take a cursory look at Cocoon before we get into the thick of it. Because Cocoon is built around a web context, the flow of control is centered around the familiar request-response web paradigm. To illustrate how Cocoon is involved in this model, a simple example will suffice.

The user initiates a request from a web browser for a particular HTML document, say, hannonhill.com:8080/app/get/pressrelease/384792. Note that this request is most likely generated from a link on your site. The port 8080 is where Cocoon listens by default on our Tomcat/JBoss installation. Cocoon is installed as a web application archive (WAR), mounted at /app on hannonhill.com. Thus, Cocoon receives get/pressrelease/384792 as the URI of the request. This URI is passed off to the Cocoon sitemap, which performs some magic and then returns the result of the processing to the user as HTML.

If you didn't get all of that, no worries. It will all become clear in due time.

The Cocoon Sitemap

The Cocoon Sitemap is the entry point for all Cocoon-related requests. It is an XML text file (that is eventually compiled into a Java class automagically) and is typically named sitemap.xmap. It is logically divided into two sections: components and pipelines.

<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
<map:components>
...
</map:components>
<map:pipelines>
...
</map:pipelines>
</map:sitemap>

Speaking of frameworks, Cocoon itself is built upon the Avalon Framework, which, to quote, "defines relationships between commonly used application components, best-of-practice pattern enforcements, and several lightweight convenience implementations of the generic components."

Avalon makes it very easy to plug in custom components to other applications built upon the Avalon framework. This is exactly what we can do in the first part of the Cocoon sitemap, the components section. Being able to write our own components and plug them in is extremely convenient, as we are able to customize the framework to do our own bidding. Before I start wringing my hands, let's table the components section and move on to the pipelines.

You can think of a Cocoon pipeline as a series of ordered buckets into which certain requests may fall and be processed. A pipeline consists of one or more matchers that basically return true or false, with the true state implying processing by the components that are encapsulated by the matcher. To illustrate, a request is made. This request has a URI. Cocoon loads the sitemap, looks at the pipeline, and starts comparing the URI against each matcher (bucket) from top to bottom. When it reaches a matcher that evaluates to true, flow of execution transfers to the code inside of that matcher XML element.

You have probably guessed by now that the matchers themselves are components that you can plug in and swap out, if you want to match against something other than the request URI. For instance, we have developed matchers that evaluate a user's membership within a particular group, or examine request parameters to test for certain conditions.

The question arises: "How do you implement your own matcher?" Simple. Create a class that implements org.apache.cocoon.matching.Matcher, which mandates that the following method be implemented:

public Map match(String pattern, Map objectModel, Parameters parameters);

Cocoon supplies, at run time, environment-related data (through the objectModel object) and special parameters that you can pass into your custom matcher through the sitemap (through the parameters object, which is basically a hashmap of key/value pairs). Once this class has been deployed in your WAR, invoke it in the sitemap like any other matcher. Here is an example sitemap that employs a custom matcher:

<?xml  version="1.0" encoding="ISO-8859-1"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:components>
        ...
    <map:matchers default="wildcard">
      <map:matcher name="wildcard" 
        src="org.apache.cocoon.matching.WildcardURIMatcher"/>
      <map:matcher name="premium-customer"
        src="com.hannonhill.cocoon.matching.PremiumCustomerMatcher"/>
      <map:matcher name="request" 
        src="org.apache.cocoon.matching.RequestParameterMatcher"/>
    </map:matchers>
        ...
  </map:components>
  <map:pipelines>
    <map:pipeline>
      <map:match pattern="get/pressrelease/*">
        <map:match pattern="goldCustomer" type="premium-customer">
          <map:generate src="docs/pressreleases/{../1}.xml"/>
          <map:transform src="xslt/premiumPressrelease2html.xsl"/>
          <map:serialize type="html"/>
        </map:match>
        <!-- executed if not a gold customer -->
        <map:generate src="docs/pressreleases/{1}.xml"/>
        <map:transform src="xslt/pressrelease2html.xsl"/>
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>

As you can see, we have listed a number of different matchers in the components section of our sitemap. Also, note that this is quite an incomplete sitemap. I have only shown those items that are relevant to the discussion at hand. For each matcher that you list, you must supply a name and a source class file. The name is how you will reference that matcher in the pipeline section of the sitemap, and the src attribute is the fully qualified name of the class that implements that component; in this case, that matcher. So you may say from our example above that the custom matcher implemented by the class com.hannonhill.cocoon.matching.PremiumCustomerMatcher is referenced elsewhere in the sitemap by the name premium-customer.

Before we get into our own custom matcher, note that it is wrapped in a higher-level matcher that does not specifically list a type. This is assumed then to be the default matcher that, from the components section of our sitemap, we assume to be the "wildcard" matcher. This is a matcher supplied in the Cocoon distribution that matches a request URI against a certain string, including wildcards. There are two wildcards, * and **. The former matches any string, excluding the / character, while the latter matches any string whatsoever, including the / character.
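To make the difference between * and ** concrete, here is a minimal sketch in plain Java. It is not Cocoon's wildcard matcher; the class name WildcardSketch and the regular-expression approach are our own assumptions for illustration, as is the choice to let a wildcard match an empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative stand-in for wildcard URI matching; not Cocoon's own code. */
class WildcardSketch {

    /** Returns the wildcard captures in order, or null when the URI does not match. */
    static List<String> match(String pattern, String uri) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            if (pattern.startsWith("**", i)) {
                regex.append("(.*)");        // ** may span / characters
                i += 2;
            } else if (pattern.charAt(i) == '*') {
                regex.append("([^/]*)");     // * stops at the next /
                i += 1;
            } else {
                // every other character must match literally
                regex.append(Pattern.quote(String.valueOf(pattern.charAt(i))));
                i += 1;
            }
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(uri);
        if (!m.matches()) {
            return null;
        }
        List<String> captures = new ArrayList<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            captures.add(m.group(g));
        }
        return captures;
    }

    public static void main(String[] args) {
        String uri = "get/pressrelease/384792";
        System.out.println(match("get/pressrelease/*", uri)); // [384792]
        System.out.println(match("get/*", uri));              // null
        System.out.println(match("get/**", uri));             // [pressrelease/384792]
    }
}
```

The second call fails because a single * refuses to cross the / between pressrelease and 384792, which is exactly the distinction described above.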

A matcher indicates a successful match by returning a Map (possibly empty, but definitely non-null). This Map may contain parameters that are available later on in the sitemap through variable interpolation. In our example, the request URI get/pressrelease/384792 would match our first matcher and the wildcard would match the string 384792, returned to the sitemap through the Map. This string could be referenced later in the sitemap through the {1} string. Thus, <map:generate src="docs/pressreleases/{1}.xml"/> would evaluate to <map:generate src="docs/pressreleases/384792.xml"/>.

You might wonder what {../1} means. Since the Maps returned by the matchers are nested, you can traverse them in much the same way as you would traverse a directory. Thus, the {../1} variable interpolation is not looking in the Map returned by the premium-customer matcher, but instead one level up, in the wildcard matcher! Your custom matchers will usually return a sitemap parameter with a meaningful name (for readability's sake). This particular Cocoon matcher uses numbers to indicate which wildcard match to dereference. Thus, a match pattern of get/*/* would match our URI with {1} containing pressrelease and {2} containing 384792.
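The directory-style lookup can be sketched in a few lines of plain Java. This is only a model of the idea, under the assumption that each nested matcher pushes its Map onto a stack; the class and method names are hypothetical and do not come from Cocoon.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Models nested matcher Maps as a stack; an illustration, not Cocoon's implementation. */
class SitemapVariableSketch {

    /** Resolves a variable such as "1" or "../1"; the innermost Map is at the head of the stack. */
    static String resolve(String variable, Deque<Map<String, String>> stack) {
        Iterator<Map<String, String>> levels = stack.iterator();
        Map<String, String> current = levels.hasNext() ? levels.next() : null;
        String key = variable;
        while (key.startsWith("../")) {
            key = key.substring(3);                           // strip one "../"
            current = levels.hasNext() ? levels.next() : null; // and step one matcher outward
        }
        return current == null ? null : current.get(key);
    }

    public static void main(String[] args) {
        Deque<Map<String, String>> stack = new ArrayDeque<>();
        Map<String, String> wildcard = new HashMap<>();
        wildcard.put("1", "384792");
        stack.push(wildcard);                        // outer wildcard matcher
        stack.push(new HashMap<String, String>());   // inner premium-customer matcher, empty Map
        System.out.println(resolve("../1", stack));  // 384792
        System.out.println(resolve("1", stack));     // null
    }
}
```

Inside the premium-customer block, {1} finds nothing because that matcher returned an empty Map, while {../1} reaches the value captured by the wildcard matcher one level up.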

Our custom matcher would implement the following method:

public Map match(String pattern, Map objectModel, Parameters parameters) {
        if ("goldCustomer".equals(pattern)) {
               if (customerIsGold()) {
                       // a non-null Map (even an empty one) signals a match
                       return new HashMap();
               } else {
                       // null signals that the match failed
                       return null;
               }
        } else if ("silverCustomer".equals(pattern)) {
               ...
        }
        ...
}

From this, you can see that Cocoon is responsible for instantiating our custom matcher and executing the match() method to determine if the code inside of that <map:match> statement is executed. Otherwise, it skips the code inside and goes to the code directly below it.
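The contract is easy to try out without a running Cocoon. The sketch below uses a hypothetical SketchMatcher interface in place of org.apache.cocoon.matching.Matcher (it drops the Parameters argument), and the customerLevel key in the object model is an assumption made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Self-contained model of the matcher contract: a non-null Map means "matched". */
class MatcherContractSketch {

    /** Hypothetical stand-in for the matcher contract; not Cocoon's interface. */
    interface SketchMatcher {
        Map<String, String> match(String pattern, Map<String, Object> objectModel);
    }

    static class PremiumCustomerSketch implements SketchMatcher {
        public Map<String, String> match(String pattern, Map<String, Object> objectModel) {
            Object level = objectModel.get("customerLevel"); // hypothetical key
            if ("goldCustomer".equals(pattern) && "gold".equals(level)) {
                // a meaningfully named parameter, available to the nested sitemap code
                return Collections.singletonMap("level", "gold");
            }
            return null; // no match: the nested block is skipped
        }
    }

    /** Mimics the branch in the example pipeline: nested block on a match, fall through otherwise. */
    static String chooseStylesheet(SketchMatcher matcher, Map<String, Object> objectModel) {
        if (matcher.match("goldCustomer", objectModel) != null) {
            return "xslt/premiumPressrelease2html.xsl";
        }
        return "xslt/pressrelease2html.xsl";
    }

    public static void main(String[] args) {
        Map<String, Object> gold = new HashMap<>();
        gold.put("customerLevel", "gold");
        SketchMatcher matcher = new PremiumCustomerSketch();
        System.out.println(chooseStylesheet(matcher, gold));
        System.out.println(chooseStylesheet(matcher, new HashMap<String, Object>()));
    }
}
```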

How exactly does Cocoon generate information? Glad you asked.

Generators, XSP, and Data Flow in Cocoon

Now that you understand the sitemap and how matchers help us direct sitemap flow within Cocoon, it is appropriate to dive into a discussion of how data moves out of Cocoon and into a web browser. Each request made of Cocoon results in three distinct and closely related events happening:

  1. XML Generation
  2. XML Transformation (Optional)
  3. XML Serialization

Just as the request/response model is guaranteed in a web environment, this Cocoon model is a truth, an anchor to which you may tie yourself. Keep this in mind when learning Cocoon. Once the sitemap has resolved the request sufficiently, these three steps will be executed.

As mentioned previously, Cocoon is an XML-publishing framework. It uses SAX (Simple API for XML) transforms to enable this three-step process. Originally, Cocoon was structured around the DOM (Document Object Model) format, which was slower and used more memory. SAX is significantly faster and enables us to very easily make subtle changes to the XML during the XML publishing process. We'll get more into how SAX affects this process when discussing transformers in depth.
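If you have not worked with SAX before, the following sketch shows what "a stream of SAX events" looks like. It uses only the SAX parser that ships with the JDK, not Cocoon, and the class name SaxEventsSketch is ours.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Records the SAX events fired while parsing a small document. */
class SaxEventsSketch {

    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // a parser may deliver text in several chunks; we log each non-blank one
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String event : events("<press-release><title>ContentXML Launched</title></press-release>")) {
            System.out.println(event);
        }
    }
}
```

Nothing here ever holds the whole document in memory as a tree; each event is handled and forgotten, which is the property that makes the SAX approach lighter than DOM.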

Our previous example was a Cocoon app that served press releases:

...
<map:pipeline>
  <map:match pattern="get/pressrelease/*">
    <map:match pattern="goldCustomer" type="premium-customer">
      <map:generate src="docs/pressreleases/{../1}.xml"/>
      <map:transform src="xslt/premiumPressrelease2html.xsl"/>
      <map:serialize type="html"/>
    </map:match>
    <!-- executed if not a gold customer -->
    <map:generate src="docs/pressreleases/{1}.xml"/>
    <map:transform src="xslt/pressrelease2html.xsl"/>
    <map:serialize type="html"/>
  </map:match>
</map:pipeline>
...

With our example URI, http://hannonhill.com:8080/app/get/pressrelease/384792, and assuming that the user requesting this resource is not a gold customer, the following is executed:

<map:generate src="docs/pressreleases/{1}.xml"/>
<map:transform src="xslt/pressrelease2html.xsl"/>
<map:serialize type="html"/>

The first line actually substitutes the matched string from the wildcard matcher, producing:

<map:generate src="docs/pressreleases/384792.xml"/>
<map:transform src="xslt/pressrelease2html.xsl"/>
<map:serialize type="html"/>

Note that in our pipeline, we did not specify which generator to use. How then does Cocoon generate docs/pressreleases/384792.xml? It uses the default generator defined in our sitemap components section — the file generator. This generator reads a file off of the disk, generating SAX events that are then processed by one or more transformers, if any, and are then serialized. This means that whoever was creating this press release would have to create a file named docs/pressreleases/384792.xml in the webapp/ directory.

<map:components>
...
<map:generators default="file">
   <map:generator name="file" label="content,data"
         src="org.apache.cocoon.generation.FileGenerator" pool-max="32" 
         pool-min="16" pool-grow="4"/>
   <map:generator name="serverpages" label="content,data"
         src="org.apache.cocoon.generation.ServerPagesGenerator"/>
   <map:generator name="status"
         src="org.apache.cocoon.generation.StatusGenerator"/>
 </map:generators>
...
</map:components>

XML is great for separating content from presentation. The XML file created by the press release author might look like this:

<press-release name="ContentXML Launched">
   <title>ContentXML Launched at Internet World</title>
   <author>David Cummings</author>
   <body>...</body>
</press-release>

In our example, we run this XML through an XSL file, which converts it into HTML for viewing in a web browser. The serializer then transforms the SAX events into a byte stream and sends it back as the response to the initial request.
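You can try the transform step on its own with the XSLT processor bundled in the JDK. The stylesheet below is an invented stand-in for xslt/pressrelease2html.xsl (the article does not show the real one), and the class name is hypothetical; treat it as a sketch of the idea rather than what Cocoon runs internally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies a small, made-up press release stylesheet and returns the HTML. */
class PressReleaseTransformSketch {

    // Assumed stylesheet; the real pressrelease2html.xsl is not shown in the article.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/press-release'>"
        + "<html><head><title><xsl:value-of select='title'/></title></head>"
        + "<body><h1><xsl:value-of select='title'/></h1>"
        + "<p>By <xsl:value-of select='author'/></p>"
        + "<div><xsl:value-of select='body'/></div></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toHtml(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        // content in, presentation out
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<press-release name='ContentXML Launched'>"
                + "<title>ContentXML Launched at Internet World</title>"
                + "<author>David Cummings</author>"
                + "<body>...</body>"
                + "</press-release>";
        System.out.println(toHtml(xml));
    }
}
```

The press release author never touches the HTML; swapping in a different stylesheet changes the presentation without editing a single press release.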

Custom Generators (XSP)

While exciting, this is not yet really ContentXML exciting, but we're almost there :) The file generator is actually a Java class (more specifically, an Avalon component) that reads in the contents of a file and generates the SAX events from the file content. Cocoon gives the developer the ability to generate these generators dynamically through the use of a markup language called XSP, or eXtensible Server Pages. XSP is an XML vocabulary: an XSP page is itself a well-formed XML document, with custom tags that the XSP generator (specifically, the serverpages generator) recognizes and uses to generate a custom Java generator.

A custom XSP would do really nicely! Here we go:

<?xml  version="1.0"?>
<xsp:page xmlns:xsp="http://apache.org/xsp">
   <html>
   <body>
      <xsp:logic>
         long time = System.currentTimeMillis();
      </xsp:logic>
      Hi There. The current time in milliseconds is <b><xsp:expr>time</xsp:expr></b>.
   </body>
   </html>
</xsp:page>

As you can see, this is just an XML file with some very specific XSP tags that will get parsed and substituted. First off, every XSP file must have xsp:page as its root element. That element will not get returned into the resulting XML SAX stream. What will be returned is the html element, followed by the body element, followed by more stuff.

You can embed your own Java code in the XML by encapsulating it in <xsp:logic> tags. The XSP generator will generate a Java class with one very long generate() method, which Cocoon will eventually invoke. Thus, the variables you define inside of your <xsp:logic> tags will have visibility throughout the XSP file.

There is one exception, though. <xsp:logic> tags defined outside of the root content element (in this case, html) define class-level Java code. In other words, if you place an <xsp:logic> tag right after the <xsp:page> element, the contained code will be placed outside of the generate method. Typically, you would define any functions there. I recommend against using variables in class-level Java code. The very nature of XSP is to define dynamic XML generators. Having variables stick around could have unwanted side effects.

Also worthy of notice is the <xsp:expr> tag. This tag allows Java expression evaluation outside of the context of the <xsp:logic> tags. In our example above, <xsp:expr>time</xsp:expr> evaluates to the millisecond value of the current time.
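To get a feel for what the XSP generator is doing on your behalf, here is a hand-written approximation of the time example as SAX calls. This is a sketch only: the class Cocoon actually generates will look different, and the names here (GeneratedPageSketch, render) are our own. It uses the JDK's SAXTransformerFactory as a stand-in serializer.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hand-written approximation of the SAX calls behind the time example. */
class GeneratedPageSketch {

    /** Static markup becomes element events; logic and expressions become plain Java. */
    static void generate(ContentHandler out) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        out.startDocument();
        out.startElement("", "html", "html", none);
        out.startElement("", "body", "body", none);
        long time = System.currentTimeMillis();          // from <xsp:logic>
        text(out, "Hi There. The current time in milliseconds is ");
        out.startElement("", "b", "b", none);
        text(out, String.valueOf(time));                 // from <xsp:expr>
        out.endElement("", "b", "b");
        text(out, ".");
        out.endElement("", "body", "body");
        out.endElement("", "html", "html");
        out.endDocument();
    }

    static void text(ContentHandler out, String s) throws SAXException {
        out.characters(s.toCharArray(), 0, s.length());
    }

    /** Serializes the events to a String with an identity TransformerHandler. */
    static String render() throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        serializer.setResult(new StreamResult(writer));
        generate(serializer);
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());
    }
}
```

Notice that the time variable is an ordinary local variable of generate(), which is why code in one <xsp:logic> block can be used by an <xsp:expr> further down the page.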

Another typical use of XSP is to loop over a result set, producing dynamic XML:

<members>
<xsp:logic>
String[] members = getClubMemberNames(); // defined elsewhere.
for (int i = 0; i &lt; members.length; i++) {
   </xsp:logic>
      <member>
         <xsp:attribute name="name"><xsp:expr>members[i]</xsp:expr></xsp:attribute>
      </member>
   <xsp:logic>
}
</xsp:logic>
</members>

which produces something like this:

<members>
  <member name="hannonhill.com"/>
  <member name="contentxml.com"/>
  <member name="superupdate.com"/>
  <member name="zapedit.com"/>
</members>

Remember that the XSP page is XML, so inserting superfluous < and such characters is a big no-no. In fact, the XSP generator will throw an exception telling you so when it tries to compile the XSP XML into a Java Generator. If you need these characters in your Java code, you must use the XML-escaped equivalents. For instance, in our example:

for (int i = 0; i &lt; members.length; i++)

This is definitely one of the tradeoffs of using XSPs, albeit a relatively small one.
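A quick way to convince yourself that the escaped form reaches the Java compiler intact is to run a fragment through any XML parser. The sketch below uses the JDK's DOM parser and a plain <logic> element as a stand-in for <xsp:logic>; it also shows a CDATA section, a standard XML alternative to escaping that the example above does not use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Shows what an XML parser hands back for escaped, CDATA-wrapped, and raw "<" characters. */
class EscapedLogicSketch {

    /** Parses a one-element document and returns its text as the parser sees it. */
    static String logicText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // escaped: the parser turns &lt; back into <
        System.out.println(logicText("<logic>for (int i = 0; i &lt; n; i++)</logic>"));
        // CDATA: the content is passed through untouched
        System.out.println(logicText("<logic><![CDATA[for (int i = 0; i < n; i++)]]></logic>"));
        // raw <: the document is not well-formed, so parsing fails
        try {
            logicText("<logic>for (int i = 0; i < n; i++)</logic>");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```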

Using XSP for Form Generation

From an application development perspective, form creation has always been approached with both brute force and elegance. We like to think that XSP provides a healthy lean towards the latter! For the purpose of this discussion, assume that all of our form generation goes through three steps:

  1. XSP generation of form elements.
  2. Post-processing through an XSL stylesheet.
  3. Serialization to HTML.

Why is step #2 necessary? Why not generate all of your HTML straight from the XSP? Isn't less better? In this case, the advantages you get by abstracting display logic out of the generator and into the transformer are well worth the extra step. Most importantly, all of your display logic is in one place. One change propagates across the whole application, and this has a huge impact on application maintainability.

Assume that a user of your content management solution is trying to log in, hitting the URI /login. In the sitemap, this matches the following:

<map:match pattern="login">
   <map:generate src="docs/login.xsp" type="serverpages"/>
   <map:transform src="common/stylesheets/display2html.xsl"/>
   <map:serialize type="html"/>
</map:match>

In fact, this is how Cocoon applications generate most of their form content! docs/login.xsp could look something like this:

<?xml version="1.0"?>
<xsp:page xmlns:xsp="http://apache.org/xsp">
   <form>
      <title>Log in to ContentXML</title>
      <action>do/submit/login</action>
      <form-item type="text">
         <name>loginName</name>
         <output>Login Name:</output>
      </form-item>
      <form-item type="password">
         <name>password</name>
         <output>Password:</output>
      </form-item>
      <form-item type="submit"/>
   </form>
</xsp:page>

As you can see, there is no HTML anywhere in the XML that is generated from this XSP. When the XML is generated, it is then transformed by common/stylesheets/display2html.xsl, which "understands" the <form> XML element, processing its sub-elements accordingly and applying all applicable formatting to produce a pretty form with a login text field, a password field, a submit button, and anything else you may desire.

Imagine having 150 forms scattered around your application. Changing how display2html.xsl processes form XML will instantly change all of the forms in your application. In fact, we use this model to control the overall look and feel of our application. It takes us almost no time to completely change how the app looks, making it easy to rebrand and update! Aces!
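We have not shown display2html.xsl itself, so the following is only a guess at its shape: a tiny, invented stylesheet that understands <form> and <form-item>, run through the JDK's XSLT processor. The markup it emits and the class name FormStylesheetSketch are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Turns the abstract login form XML into HTML with one shared, made-up stylesheet. */
class FormStylesheetSketch {

    // Invented stand-in for display2html.xsl; every form in the app would pass through it.
    static final String DISPLAY2HTML =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='form'>"
        + "<form method='post' action='{action}'>"
        + "<h2><xsl:value-of select='title'/></h2>"
        + "<xsl:apply-templates select='form-item'/>"
        + "</form>"
        + "</xsl:template>"
        // the more specific pattern wins for submit buttons
        + "<xsl:template match='form-item[@type=\"submit\"]'>"
        + "<input type='submit'/>"
        + "</xsl:template>"
        + "<xsl:template match='form-item'>"
        + "<label><xsl:value-of select='output'/><input type='{@type}' name='{name}'/></label>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toHtml(String formXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(DISPLAY2HTML)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(formXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String login = "<form><title>Log in to ContentXML</title><action>do/submit/login</action>"
                + "<form-item type='text'><name>loginName</name><output>Login Name:</output></form-item>"
                + "<form-item type='password'><name>password</name><output>Password:</output></form-item>"
                + "<form-item type='submit'/></form>";
        System.out.println(toHtml(login));
    }
}
```

Restyling every form then comes down to editing the form-item template once, for example to wrap each field in a table row.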

David Cummings is the CEO of Hannon Hill Corporation which focuses on content management software solutions.

Collin VanDyck is the lead developer of ContentXML and an integral part of the Hannon Hill team.



Copyright © 2009 O'Reilly Media, Inc.