ONJava.com -- The Independent Source for Enterprise Java

Memoization in Java Using Dynamic Proxy Classes

Creating a General-Purpose Caching Wrapper Using Dynamic Proxy Classes

The only remaining disadvantage of using caching wrappers is their lack of reusability. Every time I wish to add caching to a class, I need to write a specialized caching wrapper for the target interface, which is slow and error-prone.



Version 1.3 of the Java™ 2 SDK, Standard Edition, introduced support for dynamic proxy classes: special classes that can choose which interfaces they implement when created at runtime. (Contrast this to normal Java classes, whose implemented interfaces are known -- and fixed -- at compile time.) By using this feature, it is possible to write a generic caching wrapper class, called Memoizer, that determines at runtime which interface it implements. There is no longer any need for the CachingBinaryDigitsCalculator wrapper; calls such as this one:

BinaryDigitsCalculator calculator =
  new CachingBinaryDigitsCalculator(
    new PiBinaryDigitsCalculator()
  );

can be rewritten to use Memoizer as follows:

BinaryDigitsCalculator calculator =
  (BinaryDigitsCalculator) Memoizer.memoize(
    new PiBinaryDigitsCalculator()
  );

The listing for Memoizer appears below:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Memoizer implements InvocationHandler {
  public static Object memoize(Object object) {
      return Proxy.newProxyInstance(
        object.getClass().getClassLoader(),
        object.getClass().getInterfaces(),
        new Memoizer(object)
      );
  }

  private Object object;
  private Map caches = new HashMap();

  private Memoizer(Object object) {
      this.object = object;
  }

  public Object invoke(Object proxy, Method method,
  Object[] args) throws Throwable {

      if (method.getReturnType().equals(Void.TYPE)) {
          // Don't cache void methods
          return invoke(method, args);
      } else {
          Map cache    = getCache(method);
          // args is null for zero-argument methods, and
          // Arrays.asList(null) would throw NullPointerException
          List key     = (args == null)
              ? Collections.EMPTY_LIST
              : Arrays.asList(args);
          Object value = cache.get(key);

          if (value == null && !cache.containsKey(key)) {
              value = invoke(method, args);
              cache.put(key, value);
          }
          return value;
      }
  }

  private Object invoke(Method method, Object[] args)
  throws Throwable {
      try {
          return method.invoke(object, args);
      } catch (InvocationTargetException e) {
          throw e.getTargetException();
      }
  }

  private synchronized Map getCache(Method m) {
      Map cache = (Map) caches.get(m);
      if (cache == null) {
          cache = Collections.synchronizedMap(
            new HashMap()
          );
          caches.put(m, cache);
      }
      return cache;
  }
}

A call to the static method memoize with an object creates a new proxy instance -- that is, an instance of java.lang.reflect.Proxy -- that implements the same set of interfaces as the passed-in object. The set of interfaces is determined by object.getClass().getInterfaces(). Each proxy instance has an invocation handler (java.lang.reflect.InvocationHandler) that handles all method invocations that are made on the instance. In this case, Memoizer is the invocation handler.
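The proxy machinery can be seen in isolation with a minimal sketch. The Greeter interface below is a hypothetical stand-in for BinaryDigitsCalculator; the handler simply delegates to the wrapped object, as Memoizer does on a cache miss:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    // Hypothetical stand-in for an interface such as BinaryDigitsCalculator.
    public interface Greeter {
        String greet(String name);
    }

    static String callThroughProxy() {
        final Greeter target = new Greeter() {
            public String greet(String name) { return "Hello, " + name; }
        };
        // newProxyInstance generates a class at runtime that implements the
        // target's interfaces; every method call is routed to the handler.
        Greeter proxy = (Greeter) Proxy.newProxyInstance(
            target.getClass().getClassLoader(),
            target.getClass().getInterfaces(),
            new InvocationHandler() {
                public Object invoke(Object p, Method m, Object[] a)
                        throws Throwable {
                    // Delegate to the wrapped object, as Memoizer does
                    // when there is no cached value.
                    return m.invoke(target, a);
                }
            }
        );
        return proxy.greet("world");
    }

    public static void main(String[] args) {
        System.out.println(callThroughProxy()); // prints "Hello, world"
    }
}
```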

When a method is invoked on the proxy instance (for example, calculateBinaryDigit), the invoke method is called on the Memoizer instance owned by the proxy instance. The invoke method is passed information that allows it to determine which method was called on the proxy instance, along with the array of arguments. In this example, the java.lang.reflect.Method object given to the Memoizer instance represents calculateBinaryDigit, and the array of Object arguments encapsulates the int parameter n (wrapped as an Integer). At this point, Memoizer can step in to do its caching.

The key used for the cache is essentially the method invocation: a combination of the Method object and the array of Object arguments passed to the invoke method. (Note that this is more than the method's signature, since it includes the actual argument values.) For simplicity of implementation, Memoizer has a map of caches, one for each method, and combines the array of Object arguments into a java.util.List object for the key to the relevant cache. It is convenient to use an implementation of java.util.List for the key, since it makes strong guarantees about the equals method, thus ensuring that two keys match only if all of the corresponding pairs of arguments are equal.
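The guarantee in question is the List equals contract: two lists are equal exactly when their corresponding elements are equal, and equal lists have equal hash codes, so they behave correctly as HashMap keys. For example:

```java
import java.util.Arrays;
import java.util.List;

public class ListKeyDemo {
    public static void main(String[] args) {
        // Two argument arrays with equal elements produce equal keys.
        List a = Arrays.asList(new Object[] { Integer.valueOf(5) });
        List b = Arrays.asList(new Object[] { Integer.valueOf(5) });
        List c = Arrays.asList(new Object[] { Integer.valueOf(6) });

        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(c));                  // false
    }
}
```

A lookup with key b therefore finds a value that was stored under key a, which is exactly what a cache of method results needs.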

Once the key has been created, the cache is checked for a previously computed value. If one is found, it is simply returned. Otherwise, the method has not been called with these arguments before, so it is invoked reflectively by calling the invoke method on java.lang.Method. In this example, the invocation will call calculateBinaryDigit on the concrete class PiBinaryDigitsCalculator. The return value is stored in Memoizer's cache before it is returned. The invocation handler has now finished its work.
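The reason the private invoke method catches InvocationTargetException is that reflection wraps any exception thrown by the target method; rethrowing the unwrapped exception means callers of the memoized object see exactly what the unwrapped object would have thrown. A small sketch of this behavior (the Thrower class here is hypothetical):

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class ReflectiveInvokeDemo {
    // Hypothetical target whose method throws, to show the wrapping.
    public static class Thrower {
        public Object fail() { throw new IllegalStateException("boom"); }
    }

    public static void main(String[] args) throws Exception {
        Method m = Thrower.class.getMethod("fail", new Class[0]);
        try {
            m.invoke(new Thrower(), new Object[0]);
        } catch (InvocationTargetException e) {
            // Reflection wraps the target's exception; getTargetException
            // recovers the original, which Memoizer rethrows.
            Throwable original = e.getTargetException();
            System.out.println(original.getMessage()); // prints "boom"
        }
    }
}
```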

When To Use Memoizer

As a general rule, Memoizer can be used whenever a more traditional cache -- such as those developed above -- is suitable. More specifically, each method on the interface to be memoized should meet all of the following criteria:

  • The return values of the method should not change from call to call.
  • The method should not have side effects.
  • The method should not take mutable arguments.

Clearly, it is futile to memoize methods that can change every time you invoke them. Equally, it is important not to memoize methods that have intentional side effects (methods that update state in some way), since the side effects will not be repeated for subsequent calls. The single exception to this advice is void methods. The implementation of Memoizer will always invoke a void method, since the only reason to invoke it is for its side effects.

It can be dangerous to memoize methods with mutable arguments for the same reason that it can be dangerous to store mutable classes in hash tables. As the doc comment for java.util.Map states, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map." If you call a memoized method with an argument once, mutate the argument object, then call the method a second time, Memoizer might see this as a cache hit and not recompute the value. In short, the wrong value could be returned.
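The hazard can be demonstrated directly with a plain HashMap and a mutable List key:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {
    public static void main(String[] args) {
        Map cache = new HashMap();

        List key = new ArrayList();
        key.add("argument");
        cache.put(key, "cached result");

        // Mutating the key after insertion changes its hashCode, so the
        // entry is filed under a now-stale hash bucket.
        key.add("mutated");

        // A lookup with a fresh list equal to the *original* arguments
        // finds the right bucket but fails the equals check: the cached
        // value has become unreachable.
        List sameArgs = new ArrayList();
        sameArgs.add("argument");
        System.out.println(cache.get(sameArgs)); // prints "null"
        System.out.println(cache.size());        // prints "1": entry leaked
    }
}
```

This is the same failure mode Memoizer is exposed to when a caller mutates an argument object between calls.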

Performance

The whole point of caching is to improve the speed of your application. However, reflection is notoriously slow. (The gap is closing, though: Bloch reports that in the Sun implementation of J2SE 1.4, reflective calls are typically only twice as slow as normal calls, compared to a forty-fold difference in J2SE 1.3.) Since Memoizer makes heavy use of reflection, it might seem a poor candidate for achieving performance gains. This conclusion is not necessarily valid: if the cost of calling the method to be memoized is high compared to the cost of a reflective method call (that is, the call to the invoke method on the InvocationHandler interface that Memoizer implements), then memoizing the method can still result in a speedup.

Running the PiBinaryDigitsCalculator example on a recent J2SE 1.4 JVM shows that the cost of a memoized method call is comparable to the cost of a normal call only for very small values of n (around 10). The overhead of the memoization is quickly swamped by the cost of performing the calculation for larger values of n. Therefore, clients making frequent use of PiBinaryDigitsCalculator could certainly benefit from a memoized version.

Unfortunately, the only way to tell if your code can benefit from memoization is to profile it. Nevertheless, it is very easy to apply a memoizing wrapper to existing code, and, importantly, it is just as easy to remove a wrapper. This suggests the following simple optimization strategy:

  1. Choose a class as a candidate for memoization.
  2. Profile your class (and the system as a whole).
  3. If the performance is acceptable, go to step 6; otherwise add a memoizing wrapper to the class.
  4. Profile your memoized class (and the system as a whole).
  5. If there is little or no significant performance gain, remove the memoizing wrapper.
  6. Repeat as necessary.

Ideally, you should analyze the impact of adding memoization to a class on the system as a whole, since only then can you be sure that it is worth adding. Some methods, even though they are computationally expensive, may not benefit from being memoized, simply because they are never called with the same arguments more than once. To gauge this, I am making available a more fully featured version of Memoizer that implements an interface called CacheStatistics that allows you to retrieve the number of cache hits and cache misses. You can use this interface during performance testing to measure whether memoization is "pulling its weight."
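The article does not list the CacheStatistics interface, so the method names below are illustrative assumptions; a minimal sketch of the idea is a cache that counts lookups as it serves them:

```java
import java.util.HashMap;
import java.util.Map;

public class StatsDemo {
    // Assumed shape of a hit/miss statistics interface; the names
    // getCacheHits/getCacheMisses are hypothetical, not from the article.
    public interface CacheStatistics {
        int getCacheHits();
        int getCacheMisses();
    }

    public static class CountingCache implements CacheStatistics {
        private final Map map = new HashMap();
        private int hits;
        private int misses;

        public Object get(Object key) {
            if (map.containsKey(key)) {
                hits++;               // previously computed: a hit
                return map.get(key);
            }
            misses++;                 // not yet computed: a miss
            return null;
        }

        public void put(Object key, Object value) { map.put(key, value); }

        public int getCacheHits()   { return hits; }
        public int getCacheMisses() { return misses; }
    }

    public static void main(String[] args) {
        CountingCache cache = new CountingCache();
        cache.get("n=1");             // miss
        cache.put("n=1", "result");
        cache.get("n=1");             // hit
        System.out.println(cache.getCacheHits() + " hit, "
            + cache.getCacheMisses() + " miss"); // prints "1 hit, 1 miss"
    }
}
```

During performance testing, a hit count near zero is a strong signal that the method is never called twice with the same arguments and the wrapper should be removed.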

Extending Memoizer

It is straightforward to modify the Memoizer class to support different cache implementation strategies. A common type of cache is the Least Recently Used (LRU) cache, which holds a bounded number of entries, keeping within its maximum size by discarding the entry that was least recently accessed. A class might choose an LRU cache to stop memory growing without bound in long-running applications. To select the type of cache used, you simply pass an extra argument to the memoize method that specifies the CacheFactory implementation to use. Here the LRU cache can hold a maximum of 1000 entries:

BinaryDigitsCalculator calculator =
  (BinaryDigitsCalculator) Memoizer.memoize(
      new PiBinaryDigitsCalculator(),
      new LruCacheFactory(1000)
  );
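LruCacheFactory belongs to the author's extended version and is not listed in this article. One common way to build the underlying bounded cache with the standard library (available since J2SE 1.4) is a LinkedHashMap in access order, overriding removeEldestEntry to evict once the bound is exceeded:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheDemo {
    // With accessOrder=true, iteration order is least-recently-accessed
    // first, so the "eldest" entry is the LRU candidate for eviction.
    public static Map createLruCache(final int maxEntries) {
        return new LinkedHashMap(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry eldest) {
                // Returning true tells LinkedHashMap to discard the
                // eldest entry after each insertion that exceeds the bound.
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map cache = createLruCache(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");           // touch "a", so "b" is now eldest
        cache.put("c", "3");      // exceeds the bound: evicts "b"
        System.out.println(cache.keySet()); // prints "[a, c]"
    }
}
```

A CacheFactory implementation would simply return such a map (suitably synchronized) from its factory method.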

Even in its simplest form, the Memoizer class is another useful tool that should be in every Java programmer's arsenal. But remember -- profile your code before and after using it.


Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including at ApacheCon 2008 on Hadoop.

