Published on ONJava.com (http://www.onjava.com/)



Better, Faster, Lighter Programming in .NET and Java

by Justin Gehtland, coauthor of Better, Faster, Lighter Java
07/14/2004

In our new book, Better, Faster, Lighter Java, Bruce Tate and I lay out five basic principles for combating the "bloat" that has built up over time in modern Java programming. That bloat comes in the form of specifications (the J2EE spec, specifically EJBs), implementations (heavyweight containers), and standards (XML, web services). We chose to focus on Java in the book because Java is at a real crossroads right now: with the impending release of the EJB 3.0 specification, juxtaposed with an explosion of lighter-weight open source frameworks such as Spring, Hibernate, Pico, Kodo, and so forth, the Java world is ripe for an in-depth look at the costs and benefits of the various approaches.

This doesn't mean that our principles aren't equally applicable to .NET. Even though .NET programmers aren't used to thinking in terms of heavy and light containers, enterprise development in .NET is rife with assumptions about how things are done, and they almost always involve one of two services: COM+ (for Enterprise Services) and IIS (for web deployment). These two services are, in truth, nothing more than containers provided by Windows. Developing applications that live in these environments is a complex task that grows more complex by the day, and isn't likely to become any less complex with the advent of Longhorn (with its Indigo messaging stack and integrated services for SOA).

In this article, I'll lay out the five principles from our book, and examine how they apply to programmers working on that other major managed platform, .NET. We'll see a lot of the same problems, but not always the same solutions.

Principle 1. Keep it Simple

The primary thrust of this principle is that complexity leads to disaster. Your application should be built around simple constructs and understandable layers, which combine to perform complex tasks. The code itself, however, should avoid complexity at every stage. This is much easier to say than to do, though, since many programmers are afraid of missing important pieces, or of oversimplifying.

You don't need to be afraid of doing something too simply if you embrace change in your application. The ability to fearlessly embrace change is based on good testing practices. If your code has thorough, automated, repeatable tests, then much of the anxiety goes out of making changes. As changes are introduced, the tests will tell you whether or not you are breaking something important. Automated testing gives you a safety net, allowing you to try simple solutions first, and to change them over time without fear.

That safety net is provided through a combination of tools, namely:

- NUnit, an open source unit-testing framework for .NET
- NAnt, an open source, XML-based build automation tool
- CruiseControl.NET or Draco.NET, automated continuous-integration services

I do a lot of speaking and teaching in front of .NET developers. I am constantly amazed at the response I get when I mention NUnit. It is invariably something like: "I've heard of that. How do you use it?" For whatever reason, unit testing has not penetrated the .NET development mindset. For my money, NUnit is the best thing to happen to Microsoft developers since IntelliSense.

To create unit tests, you add a reference to the nunit.framework assembly. Then, create classes to hold your tests. Tests are nothing more than specially decorated methods on test classes that contain one or more assertions. For example, imagine you have a class that validates certain kinds of inputs into your system. Such a class might look something like this:

public class Validator
{
    public bool validateCustomerId(string custId)
    {
        // ... validate customer ID, return 
        // ... true or false
    }

    public bool validateSSN(string SSN)
    {
        // ... validate Social Security Number
        // ... return true or false
    }
}

You will want to know that you have programmed these methods correctly. To test your success, create tests that demonstrate both the ability to recognize valid inputs and the ability to reject invalid ones. Here is an NUnit test class that accomplishes the goal:

using NUnit.Framework;

[TestFixture]
public class TestValidator
{
    Validator validator;
    private const string GOOD_CUST_ID = "123-456-12345";
    private const string BAD_CUST_ID = "xxx-xxx-xxxxx";
    private const string GOOD_SSN = "111-11-1111";
    private const string BAD_SSN = "11-111-1111";
    // could define as many good and bad consts as needed

    [SetUp]
    public void setUp() 
    {
        validator = new Validator();
    }

    [Test]
    public void testValidCustomerId() 
    {
        Assert.IsTrue(validator.validateCustomerId(GOOD_CUST_ID));
        Assert.IsFalse(validator.validateCustomerId(BAD_CUST_ID));
    }

    [Test]
    public void testValidSSN() 
    {
        Assert.IsTrue(validator.validateSSN(GOOD_SSN));
        Assert.IsFalse(validator.validateSSN(BAD_SSN));
    }
}

To run this test, you would point the NUnit test runner of your choice (the graphical WinForms version or the console version) at the compiled assembly where your tests are stored, and run the suite. NUnit will tell you whenever one of your assertions fails or an exception bubbles out. In the graphical viewer, you get a nice red bar to tell you that things aren't well, and a green bar for the "all's clear." If you use NUnit long enough, you develop a Pavlovian response to green and red; red makes your heart pound, green makes you feel all warm and fuzzy.


Creating good unit tests makes you write simple classes that perform their functions with a minimum of fuss. Unit tests act as the first clients for your code, and you quickly get the hang of writing just enough code to make your tests pass. Most people who write unit tests often enough end up going all the way and writing their unit tests first; this technique is called test-driven development (TDD). It can be a tremendously powerful tool for simplicity, since it forces you to think through the public interfaces of your classes before you write them, giving you greater insight into how they will be used, and letting you cut out needless complexity before you even get started.
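The test-first rhythm might be sketched like this; the Discount class and its 10-percent rule are hypothetical, invented purely for illustration:

```csharp
using NUnit.Framework;

// Hypothetical rule: orders of $100 or more get 10 percent off.
// The test is written first and defines the interface we want.
[TestFixture]
public class TestDiscount
{
    [Test]
    public void testLargeOrdersGetTenPercentOff()
    {
        Assert.AreEqual(90.0, Discount.Apply(100.0));
        Assert.AreEqual(50.0, Discount.Apply(50.0));
    }
}

// The simplest implementation that makes the test pass.
public class Discount
{
    public static double Apply(double total)
    {
        return total >= 100.0 ? total * 0.9 : total;
    }
}
```

You write only enough production code to turn the red bar green, then refactor with the test standing guard.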

Once you have unit tests in place, you will want tools that make sure they get run a lot. On your development machine, that tool is usually NAnt (an open source, XML-based build management solution), though make works just as well if you are comfortable with its syntax and you like writing shell scripts instead of XML. NAnt allows you to automate the entire build/test/deploy cycle for your local code efforts.
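As a sketch (the project name, file layout, and paths here are assumptions, and exact task attributes vary by NAnt version), a build file that compiles the code and runs the NUnit tests in one step might look like:

```xml
<?xml version="1.0"?>
<project name="validator" default="test">

    <!-- compile the sources into a library -->
    <target name="build">
        <csc target="library" output="build/Validator.dll">
            <sources>
                <include name="src/*.cs" />
            </sources>
            <references>
                <include name="lib/nunit.framework.dll" />
            </references>
        </csc>
    </target>

    <!-- run the NUnit tests after every build -->
    <target name="test" depends="build">
        <nunit2>
            <formatter type="Plain" />
            <test assemblyname="build/Validator.dll" />
        </nunit2>
    </target>
</project>
```

With this in place, `nant` on the command line rebuilds and retests the whole project.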

On top of NAnt and NUnit, other tools worth examining are CruiseControl.NET and/or Draco.NET. These automated integration tools live on a central testing box and monitor your source code repository. As individual developers check in code (which has unit test code accompanying it), the service will check out the latest version of the code, build it, and run the unit tests. This will overcome the "but it works on my machine" problem, as developers will receive near-instantaneous feedback about whether their commits "broke the build." Often, code will pass its unit tests in isolation, but when combined with the rest of the application, will suffer mysterious failures. These automated integration services can demonstrate these errors almost immediately, and the shorter the feedback loop between committing bad code and finding out you did so, the more likely it will be that you can fix it with a minimum of effort.

Between these three tools, you can begin to code using the simplest architectures and abstractions possible, with the confidence that you can refactor to solve the problems that present themselves. That's the key to all good development; only solve the problems you actually have, and leave the needless complexity on the cutting room floor.

Principle 2. Do One Thing, and Do It Well

This second principle is all about understanding the problem you are trying to solve. Whether we talk about the business problem that a given application is meant to address, or the technical problem that a given method, class, or package is addressing, the key to doing it right is to clearly identify the actual problem.

The larger problem is not unique to any development platform. Customers come to you asking for software that solves their problems. More often than not, they have spent a lot of time carefully thinking through the requirements for the software. Unfortunately, far too often, those requirements include a lot of assumptions about what they need. The first job of any good development team is to wade through those assumptions and find out what is driving the customer to your door. Simplify the requirements down to the set of statements that clearly defines the real problem, and you can code with the assurance that you are tackling the right issues.

The smaller problem is one of architecture and design. Much of the code that exists today is entirely unmaintainable, and one of the major reasons for that is over-coupling. We write classes that do a million things, we ship assemblies with a hundred classes in them, we write methods with a thousand lines of code. It is almost impossible, given code written this way, to find the exact place where changes need to be introduced or where bugs are manifesting. Instead of dumping all of that functionality into a single location, it makes much more sense to have a clearly defined separation of concerns, achieved by creating simple classes packaged into targeted layers.

One common example of too much coupling is classic ASP. The problem was that an ASP file contained both layout (static HTML) and logic (script). The business code was sprinkled throughout the page. Making simple changes to the presentation structure or business logic meant editing this mish-mash of code, and hoping that you didn't unintentionally affect the other side of the equation.

Luckily for ASP.NET developers, we were given a very clean way to separate those two concerns using code-behind. With code-behind, we can still create HTML template (.aspx) files, but instead of mixing the code into the file itself, we can have the .aspx page inherit from a class, which is written entirely in the .NET language of your choice. Changes to the presentation layer happen in the .aspx file, and changes to the logic happen in the base class. This is a very good example of our principle in action; write code that solves a single problem. The .aspx pages solve the needs of the UI, and the code-behind files solve the needs of populating those pages with data.

For example, you may continue to write your ASP.NET pages like you have always written ASP pages:

<%@ Page Language="C#" %>

<html>
<script runat="server">
    public string getUserName() 
    {
	   // fetch user's full name and return it
    }
</script>
Welcome, <%= getUserName() %>.
</html>

To simplify and decouple this page, we'll create a code-behind file. The code-behind file contains a class that inherits from System.Web.UI.Page:

public class WelcomePage : Page
{
    public string getUserName()
    {
        // fetch user's full name and return it
    }
}

The .aspx file then looks like this:

<%@ Page Language="C#" Inherits="WelcomePage" %>

<html>
Welcome, <%= getUserName() %>.
</html>

To complete the transformation, we can use a WebForm control in the .aspx page to display the value, and populate it in the code-behind file at page load, thus eliminating all code from the presentation file.

public class WelcomePage : Page
{
    protected Label lblUserName;

    public string getUserName()
    {
        // fetch user's full name and return it
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        lblUserName.Text = getUserName();
    }
}

and the .aspx file:

<%@ Page Language="C#" Inherits="WelcomePage" %>
<html>
Welcome, <asp:label id="lblUserName" runat="server"/>.
</html>

You should repeat this same process whenever you are writing methods and classes or assembling packages. Separate out orthogonal concerns into their own private spaces. This makes it much easier to find the code you are looking for, whether that be to change or add to it, or to track down a bug. In the example above, if there is something wrong with how the page is displayed to the user, I know to go to the .aspx file, and it is easy to see the intent of the page. Likewise, if the data is coming out garbled, I look at WelcomePage.cs, which is uncluttered by presentation and layout.

Principle 3. Strive for Transparency

The heart of any enterprise application is a domain model that represents the actual business problem that your customers face. It is this domain model that provides the actual value to your customers; it is what is unique about their business and your software. Everything else is just a tool in service to the business model. Things like transactional support, persistence, messaging, security -- all of those things are generic services that exist in many applications.

In order to keep the business model light, simple, and easy to maintain, you want to keep it as uncluttered with ancillary code as possible. The services provided to the domain model should be transparent, meaning that you shouldn't have to add anything to your domain objects to get the benefit of those services. At the very least, you shouldn't have to add anything to them that you can't safely ignore when you need to, such as at unit-test time.

Take, for example, an invoice processing system. When a user wants to edit an existing invoice, they will generally choose that invoice, modify some fields on a form (web or Windows), and submit the changes. A very non-transparent web application might look like this (showing only the code-behind, not the .aspx file):

public class EditInvoice : Page 
{
    // series of fields with invoice data
    protected TextBox invoiceNumber;
    protected TextBox invoiceAmount;
    protected DropDownList invoiceCurrency;
    // etc.

    public void btnSubmit_OnClick(object sender, EventArgs e)
    {
        SqlConnection conn = new SqlConnection(CONNSTRING);
        SqlDataAdapter da = null;
        DataSet ds = new DataSet();

        try
        {
            // fetch invoice data from database
            SqlCommand comm = new SqlCommand("select * from invoice " +
                "where invoiceNumber = '" + invoiceNumber.Text + "'", conn);
            da = new SqlDataAdapter(comm);
            conn.Open();
            da.Fill(ds);
        } catch (Exception ex) {
            // log the exception 
        } finally {
            conn.Close();
        }

        // perform business logic
        double amount = Convert.ToDouble(invoiceAmount.Text);
        if (amount > 10000) throw new InvoiceAmountTooHighException();

        // edit the data
        DataTable t = ds.Tables[0];
        DataRow r = t.Rows[0];
        // etc.

        // update the database
        try 
        {
            conn.Open();
            da.Update(ds);
        } catch (Exception ex2) {
            // log exception 
        } finally {
            conn.Close();
        }
    }
}

This is the worst kind of code imaginable. Your domain model (the invoice itself and the business rules that apply to it) is embedded in a file that is responsible for displaying the data as well as for persisting it. When things go wrong, or the model needs to change, it will be extremely difficult to find exactly where to go, and changes are likely to have ripple effects all over your code. Not to mention that unit testing something like this is a nightmare: you have to mock up a browser and trace all the way from display to database to see if your business logic (rejecting amounts over 10,000) is working correctly.

A better strategy would be to create an official domain object for your business code, and separate everything else into transparent layers around it. We'll start with a new Invoice class:

public class Invoice
{
    // private data 
    private long invoiceNumber;
    private double invoiceAmount;
    private Currency invoiceCurrency;
    // etc.  

    // public properties
    public double Amount
    {
        get { return invoiceAmount; }
        set
        {
            if (value > 10000)
                throw new InvoiceAmountTooHighException();
            invoiceAmount = value;
        }
    }
    // etc.
}

This domain class doesn't know anything about how it will be displayed to the user or how it will be persisted to the database. It is entirely focused on representing the core business abstraction, the invoice, and the rules that apply to its data. Nothing else clutters the code.

We will separate out the other two portions of our example, the view and the persistence, into their own classes. We'll make a class called PersistenceManager, which we'll come back to in a minute, but for now, assume it has a method called saveInvoice that takes a single instance of Invoice as a parameter.

The display class can then focus on managing the display of information from the invoice:

public class EditInvoice : Page 
{
    private Invoice invoice;
    private PersistenceManager persistenceManager;

    // series of fields with invoice data
    protected TextBox invoiceNumber;
    protected TextBox invoiceAmount;
    protected DropDownList invoiceCurrency;
    // etc.

    public void btnSubmit_OnClick(object sender, EventArgs e)
    {
        try 
        {
            invoice.Amount =
                Convert.ToDouble(invoiceAmount.Text);
        } catch (Exception ex) {
            // notify user of exception
        }

        try 
        {
            persistenceManager.saveInvoice(invoice);
        } catch (Exception ex) {
            // notify user of data exception
        }
    }
}

Now, the presentation layer is totally transparent to your business model. If your business model needs to change, there is a single codebase to look at, and not a lot of extraneous code to get in the way. Changes to the model do not necessarily need to affect the presentation layer, and vice versa.

The persistence layer is the same way. By separating out your persistence logic into another class (PersistenceManager), you abstract away the common, non-business-specific functionality of interacting with a database to a transparent part of the code. To implement PersistenceManager, you have a variety of choices. You can implement the transactional script pattern, which is essentially what we did in the first page above. That is to say, you use the native data interaction objects provided by .NET (DataSet, SqlDataAdapter, etc.) to manually pull values out of the Invoice instance and modify the database, manually controlling any transactional semantics yourself.
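A transactional-script version of saveInvoice might be sketched as follows; the table and column names, the connection string, and the InvoiceId property on Invoice are all assumptions for illustration (note the parameterized SQL, which avoids the string concatenation of the earlier example):

```csharp
using System;
using System.Data.SqlClient;

public class PersistenceManager
{
    // Illustrative connection string; in practice this comes from config.
    private const string CONNSTRING =
        "Server=dataserver;Initial Catalog=transparency;User ID=webuser;Password=webpwd";

    // Transactional script: hand-coded, parameterized SQL, with values
    // pulled manually from the domain object.
    public void saveInvoice(Invoice invoice)
    {
        using (SqlConnection conn = new SqlConnection(CONNSTRING))
        {
            conn.Open();
            SqlTransaction tx = conn.BeginTransaction();
            try
            {
                SqlCommand comm = new SqlCommand(
                    "update invoices set amount = @amount where ID = @id",
                    conn, tx);
                comm.Parameters.Add("@amount", invoice.Amount);
                comm.Parameters.Add("@id", invoice.InvoiceId);
                comm.ExecuteNonQuery();
                tx.Commit();
            }
            catch
            {
                tx.Rollback();
                throw;
            }
        }
    }
}
```

Every new domain class means another round of this plumbing, which is exactly the repetition an O/R mapper eliminates.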

On the other hand, you could use an object/relational (O/R) mapping framework to handle the work for you. O/R mappers are frameworks that, given a mapping file that relates class definitions with database schema, can auto-generate the SQL scripts needed to perform persistence activities. These kinds of frameworks simplify your code greatly by eliminating all of the hard-coded SQL scripts used by the transactional script pattern, and free you to work in terms of your domain model only. Currently, the major O/R mapping framework available to .NET developers is NHibernate, a .NET port of the popular Java Hibernate framework. NHibernate is an open source project found at nhibernate.sourceforge.net. Microsoft has its own O/R mapping technology called ObjectSpaces that is going to be released with Longhorn and/or Whidbey, but unless you are a beta tester for Microsoft, you can only get NHibernate at present.

NHibernate makes the persistence effort totally transparent to your code. You start by providing mapping entries for your domain model. You create one mapping file per persistent class, name it classname.hbm.xml, and put it next to your .cs file. The mapping file for Invoice (invoice.hbm.xml) might look like this:

<?xml version="1.0" encoding="utf-8" ?> 

<hibernate-mapping xmlns="urn:nhibernate-mapping-2.0">

    <class name="com.relevance.transparency.Invoice, com.relevance.transparency" table="invoices">
                
        <id name="invoiceId" column="ID" type="int">
            <generator class="assigned" />
        </id>
                
        <property name="Amount" column="amount" type="double"/>
        <property name="Currency" type="char(3)"/>
        <!-- etc. -->
    </class>
        
</hibernate-mapping>

When mapping properties to columns, be sure to use the name of the public property, not the private field. If the property isn't publicly accessible, NHibernate won't be able to access the values.

Finally, you need to configure NHibernate itself so it can find the datasource used for persisting all these classes. You can add that configuration information to your application's .config file, like so:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <configSections>
        <section name="nhibernate" 
            type="System.Configuration.NameValueSectionHandler, System, Version=1.0.3300.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
    </configSections>
        
    <nhibernate>
        <add 
          key="hibernate.connection.provider"          
          value="NHibernate.Connection.DriverConnectionProvider" 
          />
        <add 
          key="hibernate.dialect"                      
          value="NHibernate.Dialect.MsSql2000Dialect" 
          />
        <add 
          key="hibernate.connection.driver_class"          
          value="NHibernate.Driver.SqlClientDriver" 
          />
        <add 
          key="hibernate.connection.connection_string" 
          value="Server=dataserver;Initial Catalog=transparency;User ID=webuser;Password=webpwd" 
          />
    </nhibernate>
</configuration>

The last step is to implement the saveInvoice method of PersistenceManager. The PersistenceManager class must reference the nhibernate assembly. Then, the class looks like this:

public class PersistenceManager
{
    private Configuration config;
    private ISessionFactory factory;

    public PersistenceManager()
    {
        config = new Configuration();
        config.AddAssembly("com.relevance.transparency");
        factory = config.BuildSessionFactory();
    }

    public void saveInvoice(Invoice invoice)
    {
        ISession session = factory.OpenSession();
        ITransaction tx = session.BeginTransaction();
        session.Save(invoice);
        tx.Commit();
        session.Close();
    }
}

Furthermore, adding a Load method to the PersistenceManager is just as easy.

public Invoice loadInvoice(string invoiceId)
{
    ISession session = factory.OpenSession();
    Invoice result = (Invoice)session.Load(typeof(Invoice),invoiceId);
    session.Close();
    return result;
}
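Taken together, the calling code stays entirely free of SQL (the invoice ID and amount here are made up):

```csharp
// Load an invoice, change it through the domain model, save it back.
// No connections, transactions, or SQL appear in the calling code.
PersistenceManager pm = new PersistenceManager();
Invoice invoice = pm.loadInvoice("42");
invoice.Amount = 9500.00;
pm.saveInvoice(invoice);
```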

By employing an O/R mapper, you can make persistence totally transparent to the business model and can take the extra step of simplifying your code by eliminating hand-coded SQL scripts. The drawback to a scheme like this is that it is next to impossible to tune the SQL, since it is auto-generated. If your application is dependent on highly customized and optimized SQL, then you will want to either stick with the transactional script pattern, or use a modified NHibernate implementation where you pass the queries in manually using HQL, the Hibernate Query Language (which is outside of the scope of this article).

Principle 4. You Are What You Eat

Anyone who has heard me talk about enterprise development has heard me say, at least once, that the tools you use aren't just tools, they're risks. Every time you use somebody's framework, or IDE, or library, or interface, or what-have-you, you are essentially investing in that team/person/company. If that tool suddenly goes away, or takes a sharp right turn when you were hoping for a left, then your project might suffer as a result. Additionally, when you ship applications to your customers, if something goes wrong (even deep down in some third-party library you got from the Internet), your customer will come after you, not the third party. Your application is always your responsibility; if you make use of a buggy framework, those bugs will reflect back on you.

In the Java world, there are a hundred ways to do anything. If you want an MVC framework, there's Struts, WebWork, SpringMVC, and one implementation for every J2EE vendor. For O/R mapping, there's Hibernate, JDO, Kodo, EJBs, etc. Heck, there are even five or six viable IDEs. On the Microsoft side of the world, we have tended to be less diverse. We use Visual Studio for development, ADO.NET for data access, and ASP.NET for our web front end. There aren't nineteen different web form strategies (Tapestry, Velocity, JavaServer Faces, etc.). However, that focus on a single vendor is starting to change.

If you take a look at SourceForge, you will notice that there are almost as many .NET projects as Java projects now. All of the major tools on the Java side are being ported to .NET. This list includes:

- NUnit, a port of JUnit
- NAnt, a port of Ant
- NHibernate, a port of Hibernate
- CruiseControl.NET and Draco.NET, inspired by CruiseControl

And the list goes on. As the world of alternatives begins to open, Microsoft developers will now have to face the decisions that Java programmers face all the time: which implementation is right for this team, and this project? Before, you just used what was available. Now, there are competing strategies at every turn.

Making the right decision about your tools always comes down to a cost/benefit analysis. The cost side of the equation is some amalgam of money, research time, training, performance, and sustainability ("Will this product be around for as long as my project?"). The benefits can take the form of simplified development and maintenance, better performance or scalability, or simplification of the code. Tool choices should never be based on either of the following propositions:

- "Microsoft says this is the right tool for the job."
- "Joe, the hippest programmer I know, says this is what everyone is using."

Microsoft may or may not be right about whether a given tool is the right fit, and Joe may or may not be a good arbiter of what's hip. You should think carefully and make your own decisions about whether a certain tool brings you the most benefit for the costs, and implement accordingly.

Principle 5. Allow for Extension

It is an axiom of distributed programming that no matter how small your user base is when you start, if you write a good enough application, eventually you will have to scale up. Therefore, distributed applications must always be architected for scalability in the face of future need. Likewise, any application written well enough will someday be put to uses its authors never intended or imagined. Therefore, applications should be written in such a way as to allow for extension.

In general, this means writing code that allows for the unknown. This sounds like an impossible proposition, but in reality, the key factor is understanding where future users are likely to want to change the way your application works, and making those areas as loosely coupled to the rest of the application as possible.

Let's examine the notion of loose coupling first. Many developers think of loose coupling as using interfaces. That is, instead of:

Document document = new Document();

having:

IDocument document = new Document();

The code is now bound to an abstract interface called IDocument instead of the concrete Document class. However, that call to new Document() is still tightly coupled. Changing it later is impossible without recompiling, which defeats the whole purpose of loose coupling. Instead, what is needed is a factory method. This is usually a static method on the class itself that can return new instances of a given type.

IDocument document = Document.CreateInstance("some_document_type",
    "some_assembly");

The beauty of factory methods is that they can return any concrete type that implements IDocument, not just the original Document class, and they can look up the actual implementation at runtime using reflection. The implementation of CreateInstance might look like:

public static IDocument CreateInstance(string type, string asm)
{
    Assembly assembly = Assembly.Load(asm);
    return (IDocument)assembly.CreateInstance(type);
}

As long as the calling code has passed in an accessible assembly, and the type they asked for implements the IDocument interface, then it doesn't matter what the concrete type is, or even if it existed when the application was first written.

To allow future users to add or replace types this way, usually you will add a section to your application's .config file that contains references to the appropriate assemblies.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <configSections>
        <section name="documents" 
            type="System.Configuration.NameValueSectionHandler, System, Version=1.0.3300.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
    </configSections>
        
    <documents>
        <add
          key="PersonalLetter"
          value="com.relevance.PersonalPapers,PersonalLetter"
          />
        <add
          key="Memoir"
          value="com.relevance.PersonalPapers,Memoir"
          />
        <add
          key="Memo"
          value="com.relevance.BizDocs,Memo"
          />
    </documents>
</configuration>

When allowing your users to choose a type of document to interact with, you simply grab the list of configured document types from the .config file and display them in some sort of selection interface. Once the user makes a decision, your factory method can load the assembly and type and return the appropriate implementation.
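Under .NET 1.1, the NameValueSectionHandler hands the section back as a name/value collection. A sketch of that selection step might look like this, assuming the value format is "assembly,type" as in the .config example above:

```csharp
using System;
using System.Collections.Specialized;
using System.Configuration;

// Fetch the "documents" section from the application's .config file.
NameValueCollection documents =
    (NameValueCollection)ConfigurationSettings.GetConfig("documents");

// Offer the configured document types to the user.
foreach (string key in documents.Keys)
{
    Console.WriteLine(key);
}

// The value is "assembly,type"; split it and call the factory method.
string[] parts = documents["Memo"].Split(',');
IDocument doc = Document.CreateInstance(parts[1], parts[0]);
```

Adding a new document type then means dropping in a new assembly and a new config entry; no recompilation of the host application is required.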

A special note about reflection: it gets a bum rap. Reflection is slower than direct coupling, but usually measured at about two or three times slower. That means you shouldn't do everything via reflection, but its judicious use for something like loading a dynamically assigned class at runtime (while binding it statically to a known interface) is perfectly reasonable, especially in a modern distributed application, where the two most important criteria for boosting performance are limiting round trips on the network and minimizing hits to the database. Once you have solved those problems, you can think about how a little reflection affects performance. I find that extensibility is usually the more important factor.
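A quick way to measure the cost on your own machine is a micro-benchmark along these lines (the numbers vary widely by runtime, and the Stopwatch class requires .NET 2.0 or later):

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Compare direct construction against reflection-based construction
// of the same type, one million times each.
const int N = 1000000;

Stopwatch direct = Stopwatch.StartNew();
for (int i = 0; i < N; i++)
{
    StringBuilder sb = new StringBuilder();
}
direct.Stop();

Stopwatch reflected = Stopwatch.StartNew();
for (int i = 0; i < N; i++)
{
    object sb = Activator.CreateInstance(typeof(StringBuilder));
}
reflected.Stop();

Console.WriteLine("direct:    {0} ms", direct.ElapsedMilliseconds);
Console.WriteLine("reflected: {0} ms", reflected.ElapsedMilliseconds);
```

Run this against the type you actually plan to load dynamically; if the difference is dwarfed by a single database round trip, reflection is not your bottleneck.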

Conclusion

As you can see, there is plenty of needless complexity piling up all over the development landscape, and one of the principal tasks of any programmer is recognizing the bloat for what it is, and avoiding it where possible. .NET is no more immune to this problem than Java is. For that matter, as my good friend Ted Neward points out in his blog, ".NET is Microsoft's solution to the bloat build-up in COM." If we know anything about the technology industry, it's that history repeats itself. Programmers need to take it upon themselves to limit the bloat, and prune the complexity that is keeping their applications from living the good life. I hope these five principles give you a starting point for examining the choices and assumptions you have made about your projects, and give you some ideas for making your programming life simpler and more fun again.

Justin Gehtland is a programmer, author, mentor and instructor, focusing on real-world software applications.


In June 2004, O'Reilly Media, Inc., released Better, Faster, Lighter Java.



Copyright © 2009 O'Reilly Media, Inc.