ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)
 See this if you're having trouble printing code examples


Lisp and Java

by Dan Milstein
03/24/2004

Why learn a new programming language? Among other excellent reasons (such as good, old-fashioned intellectual curiosity), there's the opportunity to pick up useful techniques, tricks, and idioms that you can apply in your day-to-day programming life. At its best, studying a new language can give you the kind of conceptual shift that illuminates thorny problems in a new light. Even if your mainstream language of choice doesn't provide the special-purpose syntax that you find in a language you're exploring, you can often find a way to implement the underlying technique in a useful manner.

In this article, we're going to steal an idea from one of the most theft-worthy languages out there: Lisp. We're going to pick out one of its most useful features -- the ability to treat functions as data -- and talk about how to apply this feature, in a slightly different form, in Java. In the course of doing so, we'll give a very (very) brief introduction on how to read Lisp code. We'll also develop a small but useful library for JDBC and collections programming that you are welcome to use, abuse, and extend as you see fit. We're going to use the Scheme dialect of Lisp for our discussion, because it expresses the ideas we're interested in in a particularly clear and elegant way.

Lisp Code and How To Read It

(this (is what)
  (lisp code
    (looks)
    (like (more (or less)))))

A casual observer of Lisp code will notice the dazzling collection of parentheses, without a heck of a lot else to visually break up the code. Where in a language with C-descended syntax you get parentheses, curly braces, commas, colons, semicolons, and a whole horde of other syntactical signposts to keep you on your way, in Lisp, you get pretty much nothing but ( and its friend ). All other tokens are separated by white space, and indentation, though extremely important culturally, has no significance as part of the language itself.

For example, in Lisp, if you want to, say, call a function that doubles a number, you write:

(double 7)

Side note: when discussing Lisp, it's common to show the result of evaluating an expression as follows:

(double 7)
=> 14

Which can be read as "(double 7) evaluates to 14."

In Java, there are basically two ways you can call a function: as an instance method or as a class method. For the former, you'd have something like:

aNumberSeven.double()

For the latter:

aClass.double(7)

For both of these cases in Java, you learn to see the word just before the opening parenthesis as "hot" -- it's the verb of the phrase you're reading. One nice thing about Java's instance method-call syntax is that makes it very clear what the subject of that verb is (by matching the English language's subject-verb-object order). In Lisp, the word immediately after the opening parenthesis is what you learn to read as the verb (in most cases), and there is no distinguished subject of that verb. This pattern is used for functions that are infix operators in other languages, such as plus and minus:

(+ 5 2)
=> 7

Or assignment (which has an undefined value, so we won't show it evaluating to anything):

(define a 11)

which is more or less equivalent to:

int a = 11;

Note that Lisp doesn't specify the type of variable a -- in Lisp, a variable can hold any type.

The intense regularity to the syntax can be forbidding. Parenthetical digression: the payoff comes through the ability to represent Lisp programs as Lisp data, which means that Lisp programmers can easily write programs to manipulate other programs. Although that is indeed a trick worth stealing or even spending a lifetime studying, it would take more of a book than an article to explore it fully. For such a book, check out Paul Graham's ANSI Common Lisp or On Lisp (available as a PDF on his web site), which contain many inspiring uses of the mighty Lisp macro. End parenthetical digression.

Moving on through the rudiments of the language: the basic data structure in Lisp is the list, which can be created in a variety of ways, one of the simplest of which is via the list function.

(define lst (list 2 3 5 7 11))

lst => (2 3 5 7 11)

Treating Functions as Data In Lisp

Okay, now that we've got basic Lisp syntax and Lisp data structures under our belts, we're ready to talk about the idea we're going to steal: in Lisp, functions are first-class objects, meaning they have all the "rights and privileges" accorded to other objects in Lisp. You can name them with variables, pass them into functions, return them from functions, and store them in data structures. They are no different from "basic" data. This facility turns out to be enormously powerful.

As a simple example, you can define a function that takes another function as an argument. Such a super function is known as a higher-order function. If this is a new concept, the key thing to understand is that you can, in Lisp, refer to a function without calling it, much as you can, in Java, refer to a variable without immediately using its value. In fact, in the Scheme dialect, the syntax for function and variable reference is identical. To show this, we'll introduce our first big higher-order function, map. The map function takes another function and a list as arguments and creates a new list by applying the function to each element of the original list, like so:

(define lst (list 2 3 5 7 11))

(map double lst)
=> (4 6 10 14 22)

In Java syntax, that'd look something like:

int[] lst = [ 2,3,5,7,11 ];
int double(int x) { return x + x; }
int[] dbls = map(double, lst);

Which looks great, but is, unfortunately, nonsense. In Java, you can't pick up a method and pass it into another method. There is no simple way to refer to the method at runtime -- you can only invoke it. You can't easily pass it to another function, return it from a function, or store it in a data structure. Through complex reflection tricks, these things can be done, but it's not for the faint of heart. In Lisp, you can do all of these things quite simply, and, in Lisp culture, you do them all the time.

Why First-Class Functions Are Useful

One place where first-class functions come in particularly handy is in dealing with collections. It's very common to have a collection of stuff and to want to produce another collection of stuff. You'd like to ignore the nitty-gritty of the looping code and just keep your mind on the collection as a whole. A higher-order function such as map lets you do just that.

An example: if you've done any work with aggregate data in Java, you've gotten used to seeing your ideas disappear under a haze of iterator code. In particular, one type of aggregate data that is tough to handle cleanly is the JDBC ResultSet. In my day-to-day life, I write a lot of Java code that pulls data out of a database and then displays it on a web page. For all of you JDBC fans, the following type of code probably looks all too familiar:

String query = "SELECT first_name, last_name, user_id " +
               "FROM users";
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(query);
ArrayList users = new ArrayList();
while(rs.next()) {
  String fname = rs.getString("first_name");
  String lname = rs.getString("last_name");
  String uid = rs.getString("user_id");
  users.add(new User(fname, lname, uid));
}

To a Lisp programmer, it would be natural to break down the above chunk of code into something that generates a list of rows, and a function that is then mapped over each row. How can we do something similar in Java?

Approximating First-Class Functions in Java

The only parts of our JDBC example that would change if we were collecting a different type of object would be:
  1. The connection, which was obtained outside of the block shown.
  2. The string used in the query.
  3. The body of the while loop.

The connection and the string are easy, since we can pass them in to a method that captures the pattern. The body of the while loop is tougher, because it's well, a chunk of code, which is hard to manipulate in Java. In general, if you want to pass around code in Java, you have to create an instance of a class. Thanks to interfaces and anonymous classes, we can create an anonymous class just for the code in question.

To start our translation of the Lisp map idiom into Java, we define the following interface:

interface RowFunc {
  public Object of(ResultSet rs) throws SQLException;
}

An object that implements RowFunc is sort of like a function in Lisp, in that it can be called (by invoking the of method), passed to another function, returned from a function, and stored in a data structure. Of course, we have to explicitly apply it, but it's a start. We call RowFunc's method of so that you can read an invocation of a RowFunc named f as f "of" whatever it's applied to. I have a mathematical background (and a possibly too-great love of concision). In any event, here's how we use the interface to abstract our "collect all of the objects in the ResultSet" pattern:

static List rsMapQuery(Connection conn,
                       RowFunc f,
                       String query) throws SQLException {
  Statement stmt = conn.createStatement();
  ResultSet rs = stmt.executeQuery(query);
  ArrayList result = new ArrayList();
  while(rs.next()) {
    result.add(f.of(rs));
  }
  return result;  
}

You'll note that we don't first pull all of the results into a List and then map over that. Although that would be a perfectly valid way to work, JDBC ResultSets are common enough (and there's enough interesting ResultSetMetaData in them), that it seemed worthwhile to have a special purpose method to map from a ResultSet to a List. Later in the article, we develop a nice way to go from a ResultSet to a List of HashMaps, which is about as general as you can get when dealing with unknown data.

With these general definitions in place, we can now rewrite our earlier specific query using an anonymous inner class as:

RowFunc makeUser = new RowFunc() {
    public Object of(ResultSet rs) throws SQLException {
      String fname = rs.getString("first_name");
      String lname = rs.getString("last_name");
      String uid = rs.getString("user_id");
      return new User(fname, lname, uid);
    }
  };

List users =
  rsMapQuery(conn,
             makeUser,
             "SELECT first_name, last_name, user_id " +
             "FROM users");

Now, in this one case, we haven't saved much in the way of typing, though it is certainly nice that the JDBC boilerplate has disappeared from our code. However, if we need to create a list of users again somewhere else, we can just pass in makeUser again:

List newUsers =
  rsMapQuery(conn,
             makeUser,
             "SELECT first_name, last_name, user_id " +
             "FROM users WHERE type = 'new' ");

If "create a User from a row of a ResultSet" turns out to be a useful concept, having abstracted it into a function that can be easily manipulated will prove its value over time. Speaking of easy manipulation, wouldn't it be nice if we stored the makeUser function with the User class? As mentioned above, we can store these function-like objects in data structures:

class User {

  public static RowFunc make = new RowFunc() {
    public Object of(ResultSet rs) throws SQLException {
      return new User(rs);
    }
  };

  public User(ResultSet rs) throws SQLException {
    this.fname = rs.getString("first_name");
    this.lname = rs.getString("last_name");
    this.uid = rs.getString("user_id");
  }
}

Note that before, User had a constructor that took three params, whereas now we're just directly passing in the ResultSet. This eliminates the possibility of confusing the order of the args. Now our call looks like this:

List users =
  rsMapQuery(conn,
             User.make,
             "SELECT first_name, last_name, user_id " +
             "FROM users");

Let's keep improving things. Every class we're going to fit into this paradigm is going to have a nearly identical make static method. That seems a bit wasteful. What if we could write a function that returned a function to do it for us? We can do that, using a class constructor and a bit of reflection:

public class RFMaker implements RowFunc {
  private Constructor _cons;

  public RFMaker(Class c) {
    try {
      _cons =
        c.getConstructor(new Class[] { ResultSet.class });
    }
    catch(NoSuchMethodException e) {
      throw new IllegalArgumentException(
        "RFMaker must be called with a class " +
        "which has a ResultSet constructor");
    }
  }

  public Object of(ResultSet rs) throws SQLException {
    try {
      return _cons.newInstance(new Object[] { rs });
    }
    catch(InstantiationException ie) {
      throw new RuntimeException(
        "RFMaker failed due to: " + ie);
    }
    catch(IllegalAccessException iae) {
      throw new RuntimeException(
        "RFMaker failed due to: " + iae);
    }
    catch(InvocationTargetException ite) {
      try {
        SQLException e =
         (SQLException) ite.getTargetException();     
        throw e;
      }
      catch(ClassCastException cce) {
        throw new RuntimeException(
          "RFMaker failed due to: " + ite);
      }
    }
  }
}

The complexity of using reflection here is a clear demonstration of why it's too tricky to use in day-to-day programming. As part of an infrastructure (which is how we can think of RFMaker), it's worth the effort, but the above multi-exception monstrosity should never appear in normal code. It's not clear exactly how to best handle the host of checked exceptions. We opt for a fail-fast runtime exception, which seems reasonable.

With that piece of hairy code completed, we can now write:
List users =
  rsMapQuery(conn,
             new RFMaker(User.class),
             "SELECT first_name, last_name, user_id " +
             "FROM users");

User and any other classes that are going to be loaded from the database need only define a constructor that takes a ResultSet as a param. At that point, you might want to put the SQL strings into the classes as well, so that all of the information about how to load, for instance, a User from the database is stored in one place.

Extending the First-Class Function Approach

"Why not use an object/relational bridge?" we hear you say. Yes, yes, yes, you could use one of the innumerable object/relational bridges out there, possibly specifying the table-to-class mapping via a set of XML files (and can we just say, ugh). Yes, it's possible that that would make some of the simple queries above disappear altogether. However, the really nice thing about the first-class function approach is its flexibility: for example, it can be easily extended to handle more complex queries (as we saw above for the newUsers), or even new kinds of loops.

Java Examples in a Nutshell

Related Reading

Java Examples in a Nutshell
By David Flanagan

To make this concrete, let's say you need to use a prepared statement for your SQL query. If you're still building Users, you can write a new mapping function, and use the same RFMaker constructor. This is where the power of having abstracted the User creation code away from the looping code really starts to shine. Et voila:

static List rsMapPrepared(Connection conn,
                          RowFunc f,
                          String query,
                          String[] vals)
  throws SQLException {

  PreparedStatement pStmt = conn.prepareStatement(query);
  for(int i = 0 ; i < vals.length ; i++) {
    pStmt.setString(i+1, vals[i]);  // JDBC is 1-indexed.
  }
  ResultSet rs = pStmt.executeQuery();
  ArrayList result = new ArrayList();
  while(rs.next()) {
    result.add(f.of(rs));
  }
  return result;  
}

List users =
  rsMapPrepared(conn,
                new RFMaker(User.class),
                "SELECT first_name, last_name, user_id " +
                "FROM users WHERE last_name = ?",
                new String[] { "Riemann" });

Or, you could decide that you'd like to create HashMaps from each row in your ResultSet, rather than building instances of a specific class. This can be a very nice approach, if all you're going to do is display the resulting list via a template system such as Velocity or Freemarker. Taking advantage of some ResultSetMetaData, we can write the following RowFunc:

static RowFunc makeHash = new RowFunc() {
    public Object of(ResultSet rs) throws SQLException {
      HashMap hm = new HashMap();
      ResultSetMetaData rsmd = rs.getMetaData();

      int columnCount = rsmd.getColumnCount();
      for(int i = 0 ; i < columnCount ; i++) {
        // Note that JDBC columns are 1-indexed
        String column = rsmd.getColumnName(i+1);
        hm.put(column, rs.getString(column));
      }
      return hm;
    }
  };

And now we can write any SQL queries we like and obtain a List of HashMaps from the result:

List usersAndDepts =
  rsMapQuery(conn,
             makeHash,
             "SELECT first_name, last_name, user_id, " +
             "department_name FROM users, departments" +
             "WHERE users.dept_id = departments.dept_id");

Note: The above code suffers from some inefficiency because it rereads all of the column names for each row. If this proved to be a problem, it would be fairly easy to create a variant of makeHash that was initialized with an array of column names once per ResultSet.

We've just scratched the surface of what you can do with first-class functions. As we confront problems in our programming practices, we should always be striving to write code that states what it does as clearly and simply as possible. The first-class function approach, by giving you new tools to capture patterns in your code, can lead to clear and elegant solutions to a wide variety of problems. The new abstractions that you can build will often let you step back and see your overall program in a new light. And that's something always worth learning.

Resources

Lisp has a long and complex history, with enough brilliant innovations, bitter rivalries, and failed startups to keep a team of historians in tenure. For some details on where the bodies are buried, check out Richard Gabriel and Guy Steele's excellent paper, "The Evolution of Lisp" (PDF).

Dan Milstein is an independent programmer and consultant in the Boston area.


Return to ONJava.com.

Copyright © 2009 O'Reilly Media, Inc.