Code-Generation Techniques for Java
by Jack Herrington
Working in Java either means writing a little bit of complex code or writing a lot of gruntwork code. J2EE is a prime example; implementing the persistence for a single database table takes five classes and two interfaces using EJBs, and almost all of the classes are clerical work. We have to write them, but we don't have to do it by hand. Code-generation techniques can make building high-quality EJB code a breeze.
Will code generation revolutionize computing and change the way we develop forever? Yes, but it will take a while. Software engineering has always concentrated on increasing our level of abstraction. In the beginning, we hand-wrote machine code; then we created assemblers and macro assemblers. After that, we created Fortran and compiled our code into assembler. Then came structured programming, and after that, object-oriented programming. With each step, we have increased our level of abstraction and, thus, our ability to create higher-quality applications with more functionality, more quickly.
What is Code Generation?
What is this panacea for developers called code generation? Code generation is the technique of writing and using programs that build application and system code. To understand code generation, you need to understand what goes in and what comes out. What goes in is the design for the code in a declarative form: "I need two tables named author with these fields." What comes out is one or more target files. It could be Java code, deployment descriptors, SQL, documentation, or any type of controlled output.
Figure 1 shows the basic form of today's code generators:
Figure 1. The process of code generation
The components can change slightly between the different models, but the song remains the same. The code generator reads in the design, then uses a set of templates to build output code that implements the design. The separation between code generation logic in the generator and output formatting in the templates is akin to the separation between business logic and user interfaces in web applications.
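To make the design-plus-templates flow concrete, here is a minimal sketch in Java. It is not taken from any particular generator: the class name TinyGenerator, the ${...} placeholder syntax, and the in-memory map standing in for the design file are all assumptions made for illustration. The design says which fields a table has, the templates say how the output is formatted, and the generator logic only connects the two.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a toy generator, not the API of any real tool.
class TinyGenerator {
    // Templates hold the output formatting (hypothetical ${...} placeholders).
    static final String CLASS_TEMPLATE = "public class ${class} {\n${fields}}\n";
    static final String FIELD_TEMPLATE = "    private ${type} ${name};\n";

    // The "design" is a table name plus an ordered map of field name -> Java type.
    static String generate(String table, Map<String, String> fields) {
        String className = Character.toUpperCase(table.charAt(0)) + table.substring(1);
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Fill the field template once per field in the design.
            body.append(FIELD_TEMPLATE
                    .replace("${type}", field.getValue())
                    .replace("${name}", field.getKey()));
        }
        // Fill the class template with the class name and the rendered fields.
        return CLASS_TEMPLATE
                .replace("${class}", className)
                .replace("${fields}", body.toString());
    }

    public static void main(String[] args) {
        // In a real generator this design would be read from a file.
        Map<String, String> author = new LinkedHashMap<>();
        author.put("id", "int");
        author.put("name", "String");
        System.out.print(generate("author", author));
    }
}
```

Changing the formatting of every generated class means editing one template; changing what gets generated means editing the design. Neither requires touching the generator logic in generate().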
Code generators are not wizards. Wizards are passive generators. They write code once, and then it's up to you to maintain the code forever. Code generators are active. They continually maintain code over multiple generation cycles. As the designs change, the input to the generator changes, and new code is created to match the design. This is a key advantage — when have you been on a project where the requirements don't change?
What Are the Benefits?
Before we get into specific examples of code generators for Java, let's make sure we have the end goals firmly in mind. One way to approach this is to think about the qualities we want in an optimal generator.
- Quality: We want the output code to be at least as good as what we would have written by hand. Thankfully, the template-based approach of today's generators builds code that is easy to read and debug. Because of the active nature of the generator, bugs found in the output code can be fixed in the template. Code can then be re-generated to fix that bug across the board.
- Consistency: The code should use consistent class, method, and argument names. This is also an area where generators excel because, after all, this is a program writing your code.
- Productivity: It should be faster to generate the code than to write it by hand. This is the first benefit that most people think of when it comes to generation. Strangely, you may not achieve this on the first generation cycle. Thankfully, the real productivity value comes later, as you re-generate the code base to match changing requirements; at this point you will blow the hand-coding process out of the water in terms of productivity.
- Abstraction: We should be able to specify the design in an abstract form, free of implementation details. That way we can re-target the generator at a later date if we want to move to another technology platform.
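The abstraction point can be sketched in a few lines of Java. The example below is an illustration under stated assumptions, not a real tool: the class RetargetableGenerator, the abstract type names "integer" and "text", and the type mappings are all invented here. The same implementation-free design is rendered for two different targets, a Java class and a SQL table definition.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative only: one abstract design, two hypothetical output targets.
class RetargetableGenerator {
    // Assumed mappings from abstract design types to each target's types.
    static final Map<String, String> JAVA_TYPES = Map.of("integer", "int", "text", "String");
    static final Map<String, String> SQL_TYPES = Map.of("integer", "INTEGER", "text", "VARCHAR(255)");

    // Render the design as a Java class with one private field per entry.
    static String toJava(String table, Map<String, String> design) {
        String className = Character.toUpperCase(table.charAt(0)) + table.substring(1);
        StringBuilder out = new StringBuilder("class " + className + " {\n");
        design.forEach((name, type) ->
                out.append("    private ").append(JAVA_TYPES.get(type))
                   .append(' ').append(name).append(";\n"));
        return out.append("}\n").toString();
    }

    // Render the same design as a SQL CREATE TABLE statement.
    static String toSql(String table, Map<String, String> design) {
        StringJoiner columns = new StringJoiner(",\n", "CREATE TABLE " + table + " (\n", "\n);\n");
        design.forEach((name, type) -> columns.add("    " + name + " " + SQL_TYPES.get(type)));
        return columns.toString();
    }

    public static void main(String[] args) {
        // The design mentions no Java or SQL types, only abstract ones.
        Map<String, String> author = new LinkedHashMap<>();
        author.put("id", "integer");
        author.put("name", "text");
        System.out.print(toJava("author", author));
        System.out.print(toSql("author", author));
    }
}
```

Because the design never names a platform-specific type, moving to another platform would mean adding a new mapping and renderer rather than rewriting the design.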
Now that we understand the benefits we want, and how code-generation techniques address them in general, we should look at what we expect to use code generation for in the Java context.
What We Expect the Generator to Handle
The output files of a generator are called the target files. There are several generation targets within the Java enterprise application stack. Figure 2 shows the stack:
Figure 2. J2EE generation targets
All four of these elements of the stack are potential generation targets, but some are more common than others. From the bottom to the top:
- Database: Given Java's object-persistence approaches to database work, there isn't much call for direct generation of SQL for database code or stored procedures. However, if this is your architecture, you can use the custom approaches listed below to generate the required code.
- Persistence: Database persistence code is the most common generation target in the Java environment. All of the generators I refer to in the sections that follow build persistence code. Why? It's generally redundant grunt code. Generated database-persistence code also is an excellent foundation for a solid application, because it is consistent and relatively bug-free.
- Business Logic and User Interfaces: Only MDA and custom generators build production business logic and user interfaces. The critical factor in generating this code is building on top of a stable, predictable platform, ideally a generated persistence layer.
It's obvious that code generation is powerful and can build useful code, but does it have drawbacks?
What to Look Out For
Code generation is not without pitfalls and detractors. One of the most common complaints is that code that was once active is now being hand-modified and thus cannot be re-generated. One trick is never to check the generated source into the code base. This ensures that engineers will always be required to use the generator as part of the compilation process. This keeps the generator alive and keeps engineers from modifying the output code.
Another problem is that engineers who have been around since the early '90s liken code generators to Computer-Aided Software Engineering (CASE) tools. The comparison is mistaken because code generators are developed bottom-up by engineers for engineers. CASE tools were developed as a top-down replacement for programming languages and for engineers.
There are more reasons that engineers are skeptical about generation. Some issues are technical and others are cultural. Sometimes it comes down to simple job preservation. These tend to be situation-specific and boil down to simple issues: trust, teamwork, and education. In order to successfully deploy a generator, the team must trust the tool. They must feel that they have some control over the tool and its implementation. They also need to know how the tool is used both at a basic level (e.g., How do I run it?) and at a specific level (e.g., How do I specify when I need a table with a compound primary key?).
Perhaps the biggest drawback of code generation is that it falls to the implementer of the tool to ensure successful adoption within the team. If you put a copy of the code generator on the server and expect that people will immediately understand its use and the compelling value, then you are sure to fail. Education and empathy are key.
Given an understanding of which Java application components we can generate and what we have to look out for, let's talk about the generators that build them.