, author of Java RMI

In the first article in this series, I explained why it is sometimes necessary to expire data, and discussed the standard approaches for doing so. This article focuses on a particular algorithm for data expiration that I have taken to calling a "hashbelt." (The name is a combination of "hashmap" and "conveyor belt.") Hashbelts are a generic data structure that can be easily adapted to a wide variety of problems involving time-sensitive data. They are easy to use, simple to extend, and quite efficient.

This article has two major sections. The first, "Deriving the Hashbelt Algorithm," gives an overview of the hashbelt algorithm and explores its relationship to the expiration algorithms from the first article, without paying much attention to code-level details. The second section, "The Hashbelt Framework," is a detailed discussion of one particular implementation of hashbelts (included in the source code for this article). That is, the first section explains the algorithm and the second attempts to explain my implementation.

By the end of this article, you should understand the hashbelt algorithm and have a reasonable understanding of when it is appropriate to use it.

Things You Should Know Before Reading This Article

Obviously, I am expecting that you have read the first article in this series. In addition, this article is not for people who are just starting to program, or just starting to program in Java. If you don't know how to use the collection libraries, haven't written a distributed program before, or don't have a good grasp of what the synchronized keyword does, you'll probably have trouble reading this article. To learn more about the basics of distributed programming and writing multi-threaded code, I recommend my book on RMI. Besides covering RMI, it covers many of the basic design and coding decisions involved in building distributed applications.

In addition, the code examples for this article use generics quite heavily. Generics are a new feature in Java -- they are not in JDK 1.4, but are available from Sun on an experimental basis. The most complete source of information about generics is the JSR-014 specification. A shorter and friendlier introduction to generics can be found in the third article in my series on command objects.

About the Source Code

Download the example files from this article here.

The source code for both the hashbelt framework and the examples used in this article can be downloaded from here. Please feel free to use and extend the source code for this article. But -- please -- give credit where credit is due, and do not remove the licensing information and my name from the source code. Moreover, please send me e-mail if you come up with any interesting ideas for using or extending the framework; I'm planning on maintaining this code and keeping it up to date for the foreseeable future.

Deriving the Hashbelt Algorithm

In the first article in this series, I discussed a wide variety of expiration mechanisms and then concluded with a list of requirements for a time-based expiration algorithm. None of the requirements are particularly elaborate or subtle, but taken as a whole, they do rule out many different potential mechanisms.

Here's a slightly more elaborate version of those requirements:

The best solution we examined in the first article was the one adopted by Tomcat. Tomcat uses a single instance of Hashtable to store session objects and expires stale session objects using a background thread. Tomcat's background thread executes an infinite loop around the following five steps:

  1. The thread wakes up.
  2. The thread locks the cache for a brief period of time, during which it builds an instance of Vector containing the session objects (it does not copy or clone the session objects; it merely inserts them all into a new data structure).
  3. The thread unlocks the cache, thereby allowing the main code (the servlet engine) to access the cache once more.
  4. The thread iterates over the vector it built, asking each session object whether it is stale. Every stale element in the copy is removed from the main data structure.
  5. The thread goes to sleep for a brief period of time.

In this section, we're going to explain the hashbelt algorithm by describing how hashbelt differs from what Tomcat does. By the end of this section, I'll have completely described the hashbelt approach to data expiration and explained why each change is necessary or useful.

The first change we're going to make is simple: instead of session objects that contain an expiration time, we're going to allow any type of object to be stored. The reason is simple: a caching system for time-sensitive information shouldn't require the objects it stores to understand timestamps and know when they are out of date. Tomcat gets away with using session objects (which store their own timestamps) because it's a single application. We need to be more general, because we're building a framework that can be reused to expire all sorts of data. Later on, when we discuss the code, we're going to define an interface, named ExpirationSystem, that encapsulates exactly what our data expiration algorithms are allowed to assume about the data being stored; for now, it suffices to say that programmers must be able to store any type of object.

Since the objects stored in an instance of ExpirationSystem no longer contain any information about when they expire, we need to store the expiration times for the objects somewhere within our framework. The obvious way to do this is to have an instance of Hashtable, which maps objects to some sort of timestamp (for example, to instances of Date). Instead of doing this, however, we're going to break apart the global cache and store objects in separate containers, based on how long they have to live. More precisely, we're going to do the following:

  • Define a container interface called HashbeltContainer. The idea behind the interface is that we'll have different implementations of our basic container, optimized for different uses.

  • Use many instances of HashbeltContainer, each one of which will only contain objects of approximately the same age. Each object stored in the global cache will be stored in exactly one container.

Once nice consequence of doing things this way is that we'll be able to tell how old an object is, or how close it is to being expired, simply by looking at what container it's in.

Pictorially, we've just performed the following transformation:

Diagram.
Figure 1. The cache is a set of containers based on how much time objects have before they expire. This structure is not reflected in the public interface

Of course, using multiple containers instead of a single instance of Hashtable means that the hashbelt's lookup algorithm is going to be slightly different from Tomcat's. In particular, it's going to iterate over all of the instances of HashbeltContainer. Attempting to fetch an object from the expiration system will still look the same to client code; it will involve a call to a get method. But inside the expiration system, instead of being a single lookup, the implementation of get will iterate over all of the containers, starting with the most recent and stopping when the object has been found. This algorithm is illustrated in figure 2.

Diagram - click for full-size view.
Figure 2. Implementing a query (click for full size view).

Now that we've changed how objects are stored, we also need to change the way objects are expired. Hashbelt still uses a background thread for object expiration. However, the actual expiration algorithm is quite different from Tomcat's. In a hashbelt system, the background thread performs the following tasks:

  1. The thread wakes up.
  2. The thread creates a new container.
  3. The thread locks the main data structure and then "rotates" the set of containers. The newly-created container goes into the first position, and the last container is removed from the data structure entirely. This "rotation" is the "conveyor belt" I mentioned earlier.
  4. The thread unlocks the main data structure and then iterates over the removed container, performing some sort of expiration operation for each object it finds.
  5. The thread goes to sleep for a brief period of time.

This sequence of operations is illustrated in figure 3.

Diagram.
Figure 3. How containers are expired.

At this point, we've covered how a hashbelt stores objects (in multiple containers, sorted by expiration time) and we've covered the basic "rotate and expire" algorithm. However, in the hashbelt algorithm as defined so far, there are some very compelling questions that we don't know the answer for. Namely:

  • What do we do with an object once it's been retrieved?
  • What do we do with an object when it's been removed from the data structure?
  • How do we expire an arbitrary object anyway?

Unfortunately, there are no universal answers to these questions. For example, consider session keys in a servlet engine. Session keys answer these questions in the following way:

  • What do we do with a session key once it's been retrieved? We remove it from the container we found it in, and stick it in the container with the most recent timestamp (this effectively postpones the expiration of the session).
  • What do we do with an object when it's been removed from the data structure? We actively expire it.
  • How do we expire a session? We remove any resources associated to the session that may have been allocated. In addition, we probably record the end of the session in some persistent storage mechanism.

On the other hand, consider weather reports from Half Moon Bay. The answers are completely different:

  • What do we do with a weather report once it's been retrieved? We leave it in the container in which we found it. Under no circumstances should we postpone the expiration. The whole point is that once weather information gets old, it ought to replaced regardless of whether someone has accessed it recently.

  • If you're not familiar with strategy objects, or it's been a while since you've seen the strategy pattern in action, I highly recommend Design Patterns: Elements of Reusable Object-Oriented Software (Addison Wesley, 1995). It's a great book.

  • What do we do with an object when it's been removed from the data structure? We don't do anything. Garbage collection handles getting rid of the instance.

  • How do we expire a weather report? We don't. If a user requests a weather report that isn't in the ExpirationSystem, we'll fetch it at the time it's requested. That is, the appropriate re-fetch is when the information is requested, not when the old information has expired.

Because these sets of answers are so different, our implementation of the Hashbelt data structure will have to be flexible and easily customized. We'll achieve the flexibility by using strategy objects to handle application-specific behavior.

The Hashbelt Framework

Now that we've discussed the basic hashbelt algorithm, it's time to discuss the library that implements hashbelts. The hashbelt framework is contained in the packages grosso.expiration and grosso.collections (and their subpackages). The code involved in implementing hashbelts, while flexible, isn't that complicated. Without the examples, it consists of three interfaces, 13 classes, and just under 900 lines of code. More importantly, the core of the implementation consists of just three classes and two interfaces. In order, these are:

  • AbstractHashbeltExpirationSystem. This is an abstract base class. It implements the ExpirationSystem interface and handles most of the basic bookkeeping for implementations of the hashbelt algorithm.

  • StandardHashbeltExpirationSystem. This is one of two concrete subclasses of AbstractHashbeltExpirationSystem. Instances of this class automatically expire objects at the appointed time, regardless of whether or not the object has been used in the interim.

  • UpdatingHashbeltExpirationSystem. This is one of two concrete subclasses of AbstractHashbeltExpirationSystem. Instances of this class automatically extend the lifetime of objects that have been recently accessed.

  • HashbeltContainer. This is an interface that defines the basic functionality required of each container in the hashbelt. There are three implementations: FastIteratingHashbeltContainer, StandardHashbeltContainer, and HashlistBasedHashbeltContainer.

  • ExpirationHandler. This is an interface that defines a single method, handleExpiredContainer. When an instance of AbstractHashbeltExpirationSystem is created, it is given an instance of ExpirationHandler to use at the appropriate time.

In order to use hashbelts in your program, you will need to do the following:

  1. Choose a subclass of AbstractHashbeltExpirationSystem (either StandardHashbeltExpirationSystem or UpdatingHashbeltExpirationSystem).
  2. Choose a type of container to use. All of the implementations of HashbeltContainer have the same semantics; choosing the appropriate container is simply an optimization and will depend on how you expect the system to be used.
  3. Write an expiration handler. Since an expiration system can store any type of object, and is a library being provided without any knowledge of your code, it makes sense that you have to provide the actual code that performs the expiration.

This is all summarized in the following UML diagram:

Diagram -- click for full-size view.
Figure 4. The hashbelt object hierarchy (Click for full-size view).

We'll now proceed to talk about the various parts of the framework in more detail.

The ExpirationSystem Interface

The first part of the code we're going to discuss is the ExpirationSystem interface. This interface plays a very simple role in the code: it isolates the clients from the implementation of the code that expires data. A program that uses a hashbelt-based expiration system naturally divides into three parts:

  • Initialization code. This code, which is usually static and often very simple, creates the expiration system and puts everything in motion. As such, this code encapsulates domain-specific algorithmic decisions (which type of container to use, which type of hashbelt to use, and how to expire objects) and must be written by the application programmer.
  • The implementation of the expiration system. This is the code in the framework.
  • Code that accesses the expiration system. This is usually the majority of the application-specific code. While the initialization code only occurs once, is a fixed cost, and is located in only a few objects, the code that accesses the expiration system is often spread throughout the program and is written while the programmer is thinking of something else.

This last bullet is the reason for using a simple interface to the expiration system. Doing so simplifies the majority of the code application programmers must write. The interface is basically a simplified version of java.util.Map (with a halt method thrown in). This reflects the goal of the interface -- from the point of view of the application code, an expiration system should look like a container class, and none of the methods called on an expiration system should result in an object being expired. Here's the code for the ExpirationSystem interface:

	package grosso.expiration;

	import java.util.*;

	public interface ExpirationSystem<K,V>{
		public void put(K key, V expirableObject);
		public void remove(K key);
		public void clear();	
		public V get(K key);
		public void halt();
	}

Note: We could have added methods like expire or expireAll (to force the expiration of a single object or all of the objects in the system, respectively), but there are two reasons for not doing so. The first is that it's rarely necessary; with the possible exception of server shutdowns, you should never want to expire all of the objects in an expiration system. And the second reason is that doing so seems contrary to the spirit of the library -- the application programmer (and her code) should treat the expiration system like just another container data structure. If you find that you need an expire or expireAll method, I'd love to hear about it. More generally, I'm planning on maintaining this code (and keeping it up to date) for the foreseeable future; suggestions for changes or enhancements are welcome.

Writing most of the application to this interface also makes it easier to replace hashbelts with a different data structure, should it be necessary for specific applications.

I've also defined an abstract base class, AbstractExpirationSystem, and an associated helper class, ExpirationSystemExpirationThread. Neither AbstractExpirationSystem nor ExpirationSystemExpirationThread do much; ExpirationSystemExpirationThread extends Thread in order to define a background thread that occasionally wakes up to expire objects and AbstractExpirationSystem is a base class for expiration systems that starts the expiration thread going. Basically, these classes formalize code we've seen more than once in the first article, and place it in an abstract base class.

Here's the code for AbstractExpirationSystem and ExpirationSystemExpirationThread:

  package grosso.expiration;
  import java.util.*;

  public abstract class AbstractExpirationSystem <K,V>implements ExpirationSystem<K,V> {

    protected abstract void expireObjects();
    private ExpirationSystemExpirationThread _expirationThread;

    public AbstractExpirationSystem() {
      _expirationThread = new ExpirationSystemExpirationThread(this);
      _expirationThread.start();
    }

    public AbstractExpirationSystem(long timeToSleep) {
      _expirationThread = new ExpirationSystemExpirationThread(timeToSleep, this);
      _expirationThread.start();
    }

    public synchronized void halt() {
      clear();
      _expirationThread.halt();
    }
  }

  package grosso.expiration;
  import java.util.*;

  public class ExpirationSystemExpirationThread extends Thread {
    public static final long FIFTEEN_SECONDS = 15 * 1000;
    public static final long REAPER_THREAD_SLEEP_TIME = FIFTEEN_SECONDS;
  
    private boolean _shouldKeepRunning = true;
    private long _timeToSleep;
    private AbstractExpirationSystem _owner;
  
    public ExpirationSystemExpirationThread(AbstractExpirationSystem owner) {
      this(REAPER_THREAD_SLEEP_TIME, owner);
    }

    public ExpirationSystemExpirationThread(long timeToSleep, AbstractExpirationSystem owner) {
      _owner = owner;
      _timeToSleep = timeToSleep;
      setPriority(Thread.MIN_PRIORITY);
    }

    public synchronized void halt() {
      _shouldKeepRunning = false;
    }

    public void run() {
      while(shouldKeepRunning()) {
        try {
          Thread.sleep(REAPER_THREAD_SLEEP_TIME);
        }
        catch (InterruptedException ignored){}
        _owner.expireObjects();
      }
    }

    private synchronized boolean shouldKeepRunning() {
      return _shouldKeepRunning;
    }
  }

There is one point to note here -- AbstractExpirationSystem defines a new abstract method called expireObjects. This is a protected abstract method that is called by the background thread. It is a convenient method; subclasses of AbstractExpirationSystem implement it in order to expire objects when enough time has elapsed. Application programmers don't need to implement or call expireObject and it is therefore not in the ExpirationSystem interface.

AbstractHashbeltExpirationSystem and its Subclasses

The next set of classes we need to discuss are AbstractHashbeltExpirationSystem, StandardHashbeltExpirationSystem, and UpdatingHashbeltExpirationSystem. These are the core of the framework; they maintain the "conveyor belt" part of the system and delegate the application-specific behavior to strategy classes.

We'll start by looking at the code for AbstractHashbeltExpirationSystem:

  package grosso.expiration.hashbelt;
  import grosso.expiration.*;
  import grosso.expiration.hashbelt.handlers.*;

  import java.util.*;

  /*
    AbstractHashbeltExpirationSystem.

    Abstract superclass that handles most of the conveyor belt
    details. The subclasses (Standard and Updating) implement
    add and get functionality to specify whether "reiserting" is the
    right behavior.
  */

  public abstract class AbstractHashbeltExpirationSystem<K,V> extends AbstractExpirationSystem<K,V>{
    protected static final int DEFAULT_NUMBER_OF_CONTAINERS = 5;
    protected static final long ONE_MINUTE = 60 * 1000;
    protected static final long DEFAULT_ROTATION_TIME = ONE_MINUTE;

    protected int _numberOfContainers;
    protected HashbeltContainer<K,V> [] _containers;
    protected ExpirationHandler<K,V> _expirationHandler;

    protected abstract HashbeltContainer<K,V> getNewContainer();
    public abstract void put(K key, V expirableObject);
    public abstract V get(K key);

  
    public AbstractHashbeltExpirationSystem(){
      this (DEFAULT_NUMBER_OF_CONTAINERS, DEFAULT_ROTATION_TIME, new NullExpirationHandler<K,V>());
    }

    public AbstractHashbeltExpirationSystem (int numberOfContainers, long rotationTime) {
      this (numberOfContainers, rotationTime, new NullExpirationHandler<K,V>());
    }

    public AbstractHashbeltExpirationSystem(ExpirationHandler<K,V> expirationHandler){
      this (DEFAULT_NUMBER_OF_CONTAINERS, DEFAULT_ROTATION_TIME, expirationHandler);
    }

    public AbstractHashbeltExpirationSystem (int numberOfContainers, long rotationTime, ExpirationHandler<K,V> expirationHandler ) {
      super(rotationTime);
      _numberOfContainers  = numberOfContainers;
      _expirationHandler  = expirationHandler;
      _containers  = new HashbeltContainer<K,V>[_numberOfContainers];
      for (int i = 0; i < _numberOfContainers; i ++) {
        _containers[i] = getNewContainer();
      }
    }

    public void remove(K key) {
      HashbeltContainer<K,V> container = findContainer(key);
      if (null != container) {
        container.remove(key);
      }
    }

    public synchronized void clear() {
      for (int index = 0 ; index < _numberOfContainers; index++ ) {
        _containers[index].clear();
      }
    }

    protected void expireObjects() {
      HashbeltContainer<K,V> newContainer = getNewContainer();
      HashbeltContainer<K,V> expiredContainer = rotateContainers(newContainer);
      _expirationHandler.handleExpiredContainer(expiredContainer);
    }

    protected synchronized HashbeltContainer<K,V> findContainer (K key) {
      for (int index = 0; index < _numberOfContainers; index ++) {
        if (null != _containers[index].get(key)) {
          return _containers[index];
        }
      }
      return null;
    }

    protected synchronized V findObjectForKey (K key) {
      V returnValue;
      for (int index = 0; index < _numberOfContainers; index ++) {
        returnValue = _containers[index].get(key);
        if (null != returnValue) {
          return returnValue;
        }
      }
      return null;
    }


    private synchronized HashbeltContainer<K,V> rotateContainers (HashbeltContainer<K,V> newContainer) {
    /*
      This doesn't do a single, in-place, array copy because we can't guarantee
      how the JDK implements the array copy (e.g. whether it''ll do the inplace
      copy correctly). It's much safer to use a second array.
    */

      HashbeltContainer<K,V> returnValue = _containers[_numberOfContainers - 1];
      HashbeltContainer<K,V>[] newContainers = new HashbeltContainer<K,V>[_numberOfContainers];
      System.arraycopy(_containers, 0, newContainers, 1, _numberOfContainers -1);
      _containers = newContainers;
      _containers[0] = newContainer;
      return returnValue;
    }
  }

All of the functionality in an instance of AbstractHashbeltExpirationSystem centers around one of two operations: finding the right container or rotating the containers. For example, the implementation of remove consists of two calls. The first call, to findContainer, finds the container an object is in. The second call removes the object from the container. Rotating the containers is only slightly more difficult. When the background thread (inherited from AbstractExpirationSystem) calls expireObject, the hashbelt creates a new container, uses the new container to perform the rotation algorithm, and then uses an instance of ExpirationHandler to expire all of the objects in the container (if necessary; the default expiration handler, NullExpirationHandler does nothing).

AbstractHashbeltExpirationSystem defines three abstract methods: get, put and getNewContainer. getNewContainer is new, and won't be implemented in the framework at all (it's left up to the application programmer to decide which implementation of HashbeltContainer to use). get and put are left abstract in AbstractHashbeltExpirationSystem because they have two very different implementations. One simply leaves the object where it was found in the expiration system (this is the behavior implemented in StandardHashbeltExpirationSystem) and one removes the object from the expiration system and inserts it back into the front, into _containers[0]. (This is the behavior implemented in UpdatingHashbeltExpirationSystem.)

Here's the code for StandardHashbeltExpirationSystem and UpdatingHashbeltExpirationSystem:

  package grosso.expiration.hashbelt;
  import grosso.expiration.*;

  import java.util.*;

  /*
    StandardHashbeltExpirationSystem.

    The standard implementation of get and put does not insert
    objects at the front of the conveyor belt.
  */

  public abstract class StandardHashbeltExpirationSystem <K,V>extends AbstractHashbeltExpirationSystem<K,V>{
    public StandardHashbeltExpirationSystem(){
      super ();
    }

    public StandardHashbeltExpirationSystem (int numberOfContainers, long rotationTime) {
      super(numberOfContainers, rotationTime);
    }

    public StandardHashbeltExpirationSystem(ExpirationHandler<K,V> expirationHandler){
      super (expirationHandler);
    }

    public StandardHashbeltExpirationSystem (int numberOfContainers, long rotationTime, ExpirationHandler<K,V> expirationHandler) {
      super(numberOfContainers, rotationTime, expirationHandler);
    }

    public void put(K key, V expirableObject ) {
      HashbeltContainer<K,V> container = findContainer(key);
      if (null == container) {
        _containers[0].put(key, expirableObject);
      }
    }

    public V get(K key) {
      return findObjectForKey(key);
    }
  }


  package grosso.expiration.hashbelt;
  import grosso.expiration.*;

  /*
    UpdatingHashbeltExpirationSystem.

    A subclass of AbstractHashbeltExpirationSystem that moves requested
    elements back into the first container when a get or put occurs.
    Suitable for information that is expired when it is not used for a
    while (cached data, session keys, that sort of thing).
  */

  public abstract class UpdatingHashbeltExpirationSystem<K,V>extends AbstractHashbeltExpirationSystem<K,V> {
    public UpdatingHashbeltExpirationSystem() {
      super();
    }

    public UpdatingHashbeltExpirationSystem (int numberOfContainers, long rotationTime) {
      super(numberOfContainers, rotationTime);
    }

    public UpdatingHashbeltExpirationSystem(ExpirationHandler<K,V> handler) {
      super(handler);
    }

    public UpdatingHashbeltExpirationSystem (int numberOfContainers, long rotationTime, ExpirationHandler<K,V> expirationHandler ) {
      super(numberOfContainers, rotationTime, expirationHandler);
    }

    public void put(K key, V expirableObject) {
      HashbeltContainer<K,V> container = findContainer(key);
      if (null != container) {
        container.remove(key);
      }
      _containers[0].put(key, expirableObject);
    }

    public V get(K key) {
      HashbeltContainer<K,V> container = findContainer(key);
      if (null == container) {
        return null;
      }
      V returnValue = container.get(key);
      container.remove(key);
      _containers[0].put(key, returnValue);
      return returnValue;
    }
  }

The ExpirationHandler Interface

After the background thread rotates the containers, it still has a reference to the just-expired container (the container that has just been removed from the _containers array). Before going to sleep, the background thread needs to expire all of the objects in the just-expired container. It does this by using a strategy object that implements the ExpirationHandler interface. The idea is that the code that expires objects is almost always application-specific; the task is best left to code written by the application programmers.

The ExpirationHandler interface is very short: it contains a single method. Here's the entire interface:

  package grosso.expiration.hashbelt;
  import grosso.expiration.*;

  import java.util.*;

  public interface ExpirationHandler<K,V> {
    public void handleExpiredContainer(HashbeltContainer<K,V> expiredContainer);
  }

In spite of the simplicity of this interface, there are three implementations of ExpirationHandler in the downloadable source code: NullExpirationHandler, SimpleExpirationHandler, and ReinsertingExpirationHandler. NullExpirationHandler is by far the simplest implementation: it does absolutely nothing. Strangely enough, this is often the correct strategy, because the expiration system doesn't have any references to either the container or the objects inside the container. As long as objects outside the expiration system aren't holding any references to the expired objects, using NullExpirationHandler is a convenient way to let the garbage collector perform cleanup.

SimpleExpirationHandler is an abstract class that loops through all of the objects in the container and calls the timeExpired method for each one. Application programmers can subclass this object and override this method to perform expiration. Here's the code for SimpleExpirationHandler:

  package grosso.expiration.hashbelt.handlers;
  import grosso.expiration.hashbelt.*;
  import grosso.expiration.*;

  import java.util.*;

  /*
    SimpleExpirationHandler.

    Does the obvious. Calls timeExpired for each object in the container. At the end,
    the objects in the container have been expired and there's no record of
    them in the AbstractHashbeltExpirationSystem.
  */

  public abstract class SimpleExpirationHandler<K,V>implements ExpirationHandler<K,V> {
    public void handleExpiredContainer(HashbeltContainer<K,V> expiredContainer) {
      Iterator<V> i = expiredContainer.getValues();
      while (i.hasNext()) {
        V nextValue = i.next();
        timeExpired(nextValue);
      }
    }

    protected abstract void timeExpired(V object);
  }

The final expiration handler, ReinsertingExpirationHandler, is an abstract class that loops through all of the objects in the container, calls the timeExpired method for each one, and then reinserts the expired objects back into the front of the hashbelt.

This might seem a little strange at first glance: this entire series has been about expiring objects, and it's not clear why an already expired object would be reinserted into the system. There are cases where this behavior can be useful. For example, consider the much-loved "stock ticker" example. Stock tickers require that prices be pushed to clients periodically. The hashbelt framework can be used to perform this task quite efficiently for large numbers of clients. You simply define a class that encapsulates how to push information to a person and create a subclass of ReinsertingExpirationHandler, which pushes the information to the client. Because ReinsertingExpirationHandler puts the instance back into the hashbelt, the client will have information pushed to them on a regular basis.

Suppose, for example, you use five containers with a rotation time of 10 seconds each. Then you have the following:

  • From the server's point of view. Every 10 seconds, a bucket is rotated. One container is expired, which means that approximately 20% of the clients are going to receive updated information (and then get reinserted into the hashbelt).
  • From the client's point of view. Updated information is pushed out every 50 seconds or so.

The nice part about this is that the only code the application programmer has to write is the code that's application-specific (the actual remote method calls, and the code that obtains the stock market data).

Here's the code for ReinsertingExpirationHandler:

  package grosso.expiration.hashbelt.handlers;
  import grosso.expiration.hashbelt.*;
  import grosso.expiration.*;

  import java.util.*;
  /*
    ReinsertingExpirationHandler.

    Calls timeExpired for the object and then reinserts it in the front of the
    expiration system.

    Useful for alerts and announcements. E.g. suppose you're supposed to
    send someone an update every 15 minutes. Use this one and an object
    that sends the message inside its "expire" method.
  */

  public abstract class ReinsertingExpirationHandler<K,V>implements ExpirationHandler<K,V> {
    private AbstractHashbeltExpirationSystem<K,V> _owner;
    public ReinsertingExpirationHandler(AbstractHashbeltExpirationSystem<K,V> owner) {
      _owner = owner;
    }

    public void handleExpiredContainer(HashbeltContainer<K,V> expiredContainer) {
      Iterator<K> i = expiredContainer.getKeys();
      while (i.hasNext()) {
        K key = i.next();
        V nextValue = expiredContainer.get(key);
        timeExpired(nextValue);
        if (null!=_owner) {
          _owner.put(key, nextValue);
        }
      }
    }

    protected abstract void timeExpired(V object) ;
  }

There is one further interesting behavioral difference between SimpleExpirationHandler and ReinsertingExpirationHandler. SimpleExpirationHandler iterates using the values stored in the container while ReinsertingExpirationHandler iterates using the keys. All of the implementations of HashbeltContainer discussed below are more efficient using values rather than using keys.

Warning: Don't get confused by the superficial similarity between UpdatingHashbeltExpirationSystem and ReinsertingExpirationHandler. UpdatingHashbeltExpirationSystem extends the lifetime (e.g., the time until expiration) of objects that have been recently accessed. ReinsertingExpirationHandler reinserts objects into the system after they've been expired. These are very different behaviors.

The HashbeltContainer Interface

The final piece of the puzzle is the HashbeltContainer interface. Just like the ExpirationSystem interface, the HashbeltContainer interface is designed to look a lot like java.util.Map. It uses keyed access, has get and put operations, and can be iterated over, either using keys or values.

However, implementations of HashbeltContainer don't need to be quite as general as implementations of java.util.Map. We know more about how HashbeltContainer will be used. We know that get, remove, and put are mostly used while the container is active (e.g., before the entire container is removed from the _containers array). And we know that getValues and getKeys are only used, if they are used at all, by the expiration handler during expiration. This information helps application programmers choose the right implementation of HashbeltContainer.

Here's the HashbeltContainer interface:

  package grosso.expiration.hashbelt;
  import grosso.expiration.*;

  import java.util.*;

  /*
    HashbeltContainer
  
    Holds some of the objects for a Hashbelt. Conceptually, this is
    "one bucket on the conveyor belt."
  
    Note: implementations of this interface need to be highly synchronized-- the
    implementations of the hashbelt rely on this object to be threadsafe (at
    least in the get/remove/put block. getIterator is only called once the
    container is dead and is called from within a single thread).
  */

  public interface HashbeltContainer<K,V>{
    public V get(K key);
    public V remove(K key);
    public void put(K key, V value);
    public void clear();
    public Iterator<V> getValues();
    public Iterator<K> getKeys();
    public int size();
  }

The simplest version of HashbeltContainer, StandardHashbeltContainer, is implemented as a wrapper around an instance of Hashmap. Here's the code for StandardHashbeltContainer:

  package grosso.expiration.hashbelt.containers;
  import grosso.expiration.hashbelt.*;
  import grosso.expiration.*;

  import java.util.*;

  /*
      

    A basic implementation of the HashbeltContainer interface. It
    uses a Hashmap to store key/value pairs. This is optimized for
    lots of gets with a fair number of removes. It there aren't going
    to be many removes, it's better to use FastIteratingHashbeltContainer
    (which also has a vector of instances to speed up iteration).
  */

  public class StandardHashbeltContainer<K,V>implements HashbeltContainer<K,V>{
    private HashMap<K,V> _keysToExpirableObjects = new HashMap<K,V>();

    public void clear() {
      _keysToExpirableObjects.clear();
    }

    public synchronized V get (K key) {
      return _keysToExpirableObjects.get(key);
    }

    public synchronized V remove (K key) {
      return _keysToExpirableObjects.remove (key);
    }

    public synchronized void put(K key, V value) {
      _keysToExpirableObjects.put (key, value);
    }

    public Iterator<V> getValues() {
      ArrayList<V> values = new ArrayList<V>( _keysToExpirableObjects.values());
      return values.iterator();
    }
  
    public Iterator<K> getKeys() {
      ArrayList<K> keys = new ArrayList<K>( _keysToExpirableObjects.keySet());
      return keys.iterator();
    }

    public int size(){
      return _keysToExpirableObjects.size();
    }
  }

In many cases, StandardHashbeltContainer is exactly the right data structure to use. For example, recall our discussion of weather reports from Half Moon Bay. We said:

  • What do we do with a weather report once it's been retrieved? We leave it in the container in which we found it. Under no circumstances should we postpone the expiration. The whole point is that once weather information gets old, it ought to replaced, regardless of whether someone has accessed it recently.
  • What do we do with an object when it's been removed from the data structure? We don't do anything. Garbage collection handles getting rid of the instance.
  • How do we expire a weather report? We don't. If a user requests a weather report that isn't in the ExpirationSystem, we'll fetch it at the time it's requested. That is, the appropriate re-fetch is when the information is requested, not when the old information expired.

This is, essentially, the statement that we will insert objects into the container and then fetch them using keys. But we will never remove them from the containers, and we will never need to iterate over the container during expiration (since expiration really just requires the objects to be garbage-collected). In which case, we ought to be using a container that is optimized for insertions and retrievals, but that doesn't handle iteration very well at all. The best container for this purpose is StandardHashbeltContainer.

Consider, on the other hand, our answers for session keys:

  • What do we do with a session key once it's been retrieved? We remove it from the container in which we found it, and stick it in the container with the most recent timestamp (this effectively postpones the expiration of the session).
  • What do we do with an object when it's been removed from the data structure? We actively expire it.
  • How do we expire a session? We remove any resources associated to the session that may have been allocated. In addition, we probably record the end of the session in some persistent storage mechanism.

These answers indicate that we'll be doing a lot more removes (as we move sessions keys between containers) and that we'll also be doing some iterating during expiration (since we actually do need to expire each object). One possible solution is to use a combined hashmap and linked list, as in HashlistBasedHashbeltContainer. This class uses hashing for fast insertion and removal, and linked lists for fast iteration.

  package grosso.expiration.hashbelt.containers;
  import grosso.expiration.hashbelt.*;
  import grosso.expiration.*;
  import grosso.collections.*;

  import java.util.*;

  /*
    HashlistBasedHashbeltContainer

    A slight variation on the standard hashbelt container. This maintains
    three data structures:

      a hashmap from keys to values
      a hashmap from keys to nodes
      a linked list (of nodes)

    The idea is that this lets us have fast updates and removals, but also
    fairly fast iterations at the end.

    We can't use the java.util linked list class for this because,
    unfortunately, they don't expose the node class (and the get
    methods are insanely painful).

  */

  public class HashlistBasedHashbeltContainer<K,V>implements HashbeltContainer<K,V>{
    private  HashMap<K,V> _keysToExpirableObjects = new HashMap<K,V>();
    private  HashMap<K,Node<V>> _keysToNodes = new HashMap<K,Node<V>>();
    private DoublyLinkedList<V> _expirableObjects = new DoublyLinkedList<V>();

    public void clear() {
      _keysToExpirableObjects.clear();
    }

    public synchronized V get (K key) {
      return _keysToExpirableObjects.get(key);
    }

    public synchronized V remove (K key) {
      V returnValue = _keysToExpirableObjects.remove (key);
      if (null == returnValue) {
        return null;
      }
      Node<V> node = _keysToNodes.get(key);
      node.remove();
      return returnValue;
    }

    public synchronized void put(K key, V value) {
      _keysToExpirableObjects.put (key, value);
      Node<V> node = _expirableObjects.add(value);
      _keysToNodes.put(key, node);
    }

    public Iterator<V> getValues() {
      return _expirableObjects.iterator();
    }

    public Iterator<K> getKeys() {
      ArrayList<K> keys = new ArrayList<K>( _keysToExpirableObjects.keySet());
      return keys.iterator();
    }

    public int size(){
      return _keysToExpirableObjects.size();
    }
  }

Note: I first ran across the idea of using a hashlist in an IBM white paper on Java performance. While the paper I read appears to have vanished from the face of the earth, a similar paper is available here.

I've also included a third container implementation in the source code for this article. FastIteratingHashbeltContainer is essentially the same as HashlistBasedHashbeltContainer. The only difference is that it uses an instance of java.util.Vector instead of a linked list. This yields faster iterations, and makes insertions a little faster than with HashlistBasedHashbeltContainer. On the other hand, because removing an object from the middle of a vector involves performing an array copy on the remaining references, removing an object is now much more expensive than with HashlistBasedHashbeltContainer.

You should also note that all three containers are explicitly in-memory structures. If you need your containers to use a persistent storage medium (such as a file or an external Javaspace), you will have to write your own container.

Summary

After reading the first article in this series, you should have a good understanding of data expiration, of when data expiration is necessary, and what problems can arise when you attempt to expire data in a scalable and robust way. This article covered the basics of the hashbelt algorithm and explained a particular implementation in detail; you should now feel comfortable with the source code for this article and feel comfortable with its basic design.

In the final article in this series, we'll take the examples from the first article and the hashbelt implementation from this one, and show how to combine them. In particular, we'll discuss how to implement session keys using hashbelts. We'll also reimplement RemoteStubCache from my earlier series on command objects in distributed computing.

Related Articles by William Grosso


Return to ONJava.com.