Published on ONJava.com (http://www.onjava.com/)
 See this if you're having trouble printing code examples

Code-Generation Techniques for Java

by Jack Herrington

Working in Java either means writing a little bit of complex code or writing a lot of gruntwork code. J2EE is a prime example; implementing the persistence for a single database table takes five classes and two interfaces using EJBs, and almost all of the classes are clerical work. We have to write them, but we don't have to do it by hand. Code-generation techniques can make building high-quality EJB code a breeze.

Will code generation revolutionize computing and change the way we develop forever? Yes, but it will take a while. Software engineering has always concentrated on increasing our level of abstraction. In the beginning, we hand-wrote machine code; then we created assemblers and macro assemblers. After that, we created Fortran and compiled our code into assembler. Then came structure programming, and after that, object-oriented programming. With each step, we have increased our level of abstraction and, thus, our ability to create higher quality applications with more functionality, more quickly.

What is Code Generation?

What is this panacea for developers called code generation? Code generation is the technique of writing and using programs that build application and system code. To understand code generation, you need to understand what goes in and what comes out. What goes in is the design for the code in a declarative form: "I need two tables named book and author with these fields." What comes out is one or more target files. It could be Java code, deployment descriptors, SQL, documentation, or any type of controlled output.

Figure 1 shows the basic form of today's code generators:

Figure 1. The process of code generation

The components can change slightly between the different models, but the song remains the same. The code generator reads in the design, then uses a set of templates to build output code that implements the design. The separation between code generation logic in the generator and output formatting in the templates is akin to the separation between business logic and user interfaces in web applications.

Code generators are not wizards. Wizards are passive generators. They write code once, and then it's up to you to maintain the code forever. Code generators are active. They continually maintain code over multiple generation cycles. As the designs change, the input to the generator changes, and new code is created to match the design. This is a key advantage — when have you been on a project where the requirements don't change?

Related Reading

Head First EJB
Passing the Sun Certified Business Component Developer Exam
By Kathy Sierra, Bert Bates

What Are the Benefits?

Before we get into specific examples of code generators for Java, let's make sure we have the end goals firmly in mind. One way to approach this is to think about the qualities we want in an optimal generator.

Now that we understand that benefits that we want, and how those are addressed by code generation techniques in general, we should understand what we expect to use code generation for in the Java context.

What We Expect the Generator to Handle

The output files of a generator are called the target files. There are several generation targets within the Java enterprise application stack. Figure 2 shows the stack:

Figure 2. J2EE generation targets

All four of these elements of the stack are potential generation targets, but some are more common than others. From the bottom to the top:

It's obvious that code generation is powerful and can build useful code, but does it have drawbacks?

What to Look Out For

Code generation is not without pitfalls and detractors. One of the most common complaints is that code that was once active is now being hand-modified and thus cannot be re-generated. One trick is never to check the generated source into the code base. This ensures that engineers will always be required to use the generator as part of the compilation process. This keeps the generator alive and keeps engineers from modifying the output code.

Another problem is that engineers who have been around for around since the early 90s liken code generators to Computer-Aided Software Engineering (CASE) tools. The comparison is mistaken because code generators are developed bottom-up by engineers for engineers. CASE tools were developed as a top-down replacement for programming languages and for engineers.

There are more reasons that engineers are skeptical about generation. Some issues are technical and others are cultural. Some times it comes down to simple job preservation. These tend to be situation-specific and boil down to simple issues: trust, teamwork, and education. In order to successfully deploy a generator, the team must trust the tool. They must feel that they have some control over the tool and its implementation. They also need to know how the tool is used both at a basic level (e.g., How do I run it?), and at a specific level (e.g., How do I specify when I need a table with a compound primary key?).

Perhaps the biggest drawback of code generation is that it falls to the implementer of the tool to ensure successful adoption within the team. If you put a copy of the code generator on the server and expect that people will immediately understand its use and the compelling value, then you are sure to fail. Education and empathy are key.

Given an understanding of which Java application components we can generate and what we have to look out for, let's talk about the generators that build them.

Code-Driven Approach: XDoclet

The most popular code generator for Java is XDoclet, and for good reason. It's easy and pragmatic, and it fits a need. XDoclet builds database-persistence beans to match the requirements specified in special JavaDoc comments within the Java entity bean code. We call this the "code driven approach" because it uses source code as the design input source.

Given a single entity bean with some markup, XDoclet will create the session beans, interfaces, and data access object required to complete the functional set. It's a pretty sweet deal for someone looking to get some work done quickly without having to go to the effort required by other code generation solutions. Figure 3 illustrates how XDoclet relates to the application stack:

Figure 3. XDoclet and the application stack

XDoclet has grown beyond just bean generation. It now acts as a generic code-generation platform for solutions that use JavaDoc markup as a source for design information. There are XDoclet modules for all types of outputs and you can easily create your own.

XDoclet's only drawback is its level of abstraction. Because the design is described in JavaDoc tags embedded in the code, code and design are bound tightly together with implementation specifics. Given this binding. it would be difficult to use XDoclet markup to generate complete code in a different language (e.g., C#).

VDoclet is an XDoclet clone that uses Velocity as the template language.

Model-Driven Approach: Custom

The alternative to the code-driven approach is to build code from an abstract model of the design. This model-driven approach comes in two flavors: MDA and custom. We will start with the custom approach and then get into MDA.

Using tools like XSLT, Velocity, and Jostraca, we can build textual output from an input specification. We can use these tools to build code by specifying a model of the code as input, using the template to specify the code.

The advantage over the code-driven approach is that while today these templates build EJBs, they could easily build JDO classes tomorrow, or C# the day after that. Keeping the model abstract makes portability a reality.

One downside to this approach is that each template is completely self-contained. There is no central code generator that is responsible for the interpretation of the design. This means that one template could interpret a date as just a date stamp, while another could interpret it as a date and time stamp. This is akin to the problems experienced with two-tier application servers where the business logic is not properly factored away from the display.

Another downside is that you are building a custom solution that will require team education. However, given the developing and fluid state of code-generation solutions today, even if you go with an existing solution, you will not often find engineers with extensive generation experience.

Model-Driven Approach: MDA

Model-Driven Architecture (MDA) is the Object Management Group (OMG) three-letter-acronym (TLA) initiative for code generation. I'm only slightly kidding; there are several TLA standards within MDA. The central idea is simple: turn a model in UML (Unified Modeling Language) into code (no, that's not a four-letter acronym).

Figure 4 shows the flow of an MDA generator:

Figure 4. The flow of an MDA generator

We start with the Platform Independent Model (PIM), created in a UML editing tool, like Poseidon for UML from Gentleware. (The PIM can be in an exported XML format called XMI.) A Platform-Specific Model (PSM) is then created using a transformation. Templates are applied to the PSM to create the output code.

It's easier to understand the difference between a PIM and a PSM in context. The PIM specifies the application business logic, for example, a table named book with these five fields. The PSM is a model of the implementation on a particular platform. In the EJB world, this is the set of UML models of the entity and session beans required to implement the book table.

The separation between the PIM and the PSM creates a well-factored generator that properly separates the design from the implementation specifics.

Some of the more popular MDA solutions are:

The value of MDA is that it is a set of standards that we can agree on and then work to improve. At the moment, though, MDA has some issues. First, the standards are more conceptual than standard, and are subject to interpretation. Second, UML is not complete enough to create business logic or to create efficient SQL schema, so it must be hinted at the PIM level. At the coding level, the current crop of MDA generators use "safe zones" in the code. These are specially marked sections of the code that are preserved between generation cycles. In this way you can extend the code directly to implement business logic that UML cannot specify.


Code generation is another link in the evolutionary chain of increasing abstraction. With it, you will quickly produce higher quality code, and thus be able to respond to changing requirements with ease. This is the true power of modern code generation.

Code Generation Resources

Outside of the direct links to the various generators, there are some more general online resources for code generation:

as well as some good books on generation:

Jack Herrington is an engineer, author and presenter who lives and works in the Bay Area. His mission is to expose his fellow engineers to new technologies. That covers a broad spectrum, from demonstrating programs that write other programs in the book Code Generation in Action. Providing techniques for building customer centered web sites in PHP Hacks. All the way writing a how-to on audio blogging called Podcasting Hacks.

Return to ONJava.com.

Copyright © 2009 O'Reilly Media, Inc.