ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.

advertisement

AddThis Social Bookmark Button

Introducing Automatic Data Expiration
Pages: 1, 2, 3, 4

Dealing with Data Inconsistency

Now that we've gotten rid of all the extra threads, the biggest problem with our data expiration code is that it suffers from data inconsistency. If an object has a reference to an instance of ExpirableObject and expire is called, what happens ? How do we prevent additional method calls on the "dead" object? And how do we get rid of the references to the "dead" object?



One approach is what I like to call the high discipline approach. It goes something like this:

  • Instances of ExpirableObject should know when they're invalid. They should set a Boolean flag inside the expired method and check it whenever a method is called.
  • Every publicly-accessible method should be synchronized, to prevent an object that's in the middle of being expired from being used.
  • Every method in an instance of ExpirableObject should be declared as throwing an ExpiredObject. Objects that call methods on an expired object need to catch the exception and behave accordingly.

The reason I call this way of doing things high-discipline is because it doesn't actually change the code for ExpirableObject (the framework code is unchanged). Instead, it requires that every subclass of ExpirableObject be written more carefully, with an eye towards expiration. Often, this isn't particularly hard code to write, but it can be hard code to remember to write. It's simply a bad idea to rely on people using a framework to do all the framework-related bookkeeping correctly.

Rather than pursue the high-discipline approach, programmers often prefer the following three-part solution:

  • Instances of ExpirableObject are contained in a centralized container and obtained by using a key (they are usually stored in a hashtable using the key).
  • Objects that need to use an instance of ExpirableObject acquire a reference to an object by calling the container. After using the instance of ExpirableObject, they null the reference.
  • The expire method is now simple and can actually be implemented in the abstract superclass (it just removes the instance from the centralized container).

This new approach has several nice benefits. It's easy to understand, it makes sense, and it's easy to extend. This last property is important -- new subclasses of ExpirableObject are easy to write and don't involve any complicated logic (outside of the logic inherent to the class). If you're writing a framework, you need to make sure that it's easier for programmers to use the framework than to solve the problem themselves.

This way of doing things also fits well with the usage model for many different types of expirable data. Session keys naturally lend themselves to a data model where a singleton is used to look up information. For example, in a Web server, an HTTP request is sent from a browser and the servlet runner uses a session cookie to look up the session information from a central repository. Similarly, cached data often wants to be globally accessible (the whole point of using RemoteStubCache was to provide a centralized repository for stubs, and thus enable objects that didn't know about each other to share resources).

Here's the code for a version of ExpirableObject that follows this approach. The only significant change in Expirable is the addition of a new abstract method, getKey, which will be used to index the instances.

package grosso.thirdexample;

/*
  ExpirableObject.

  Abstract superclass for objects which will expire.
  During construction, instances of this class register
  with the ExpirationHandler, which owns a background
  thread which expires objects.

  In contrast to the other implementations, in this
  example, expire is just used to expire local data
  (removing the object from the global cache is handled
  by the expiration thread)

*/
public abstract class ExpirableObject{
  public static final long FIFTEEN_MINUTES = 15 * 60 * 1000;
  public static final long DEFAULT_LIFETIME = FIFTEEN_MINUTES;

  private long _expirationTime;

  protected abstract void expire();
  protected abstract Object getKey();

  public ExpirableObject() {
    this(DEFAULT_LIFETIME);
  }

  public ExpirableObject(long timeToLive) {
    _expirationTime = System.currentTimeMillis() + timeToLive;
    GlobalCache.add(this);
  }

  public void setExpiration(long timeToLive) {
    _expirationTime = System.currentTimeMillis() + timeToLive;
  }

  public final boolean shouldExpire() {
    return System.currentTimeMillis() > _expirationTime;
  }  
}

GlobalExpirationHandler has changed more. It's been renamed, to GlobalCache, as befits its slightly expanded role in the system. There's also a new method named get for clients to access cached objects. And, most significantly, Expirer has been broken out into a new subclass of Thread called CacheExpirationThread.

Here's the code for GlobalCache and Expirer.

package grosso.thirdexample;

import java.util.*;

/*
  GlobalCache (formerly GlobalExpirationHandler).

  Static methods that handle expiring objects.
  This class owns a background thread that wakes
  up every so often and expires objects.
*/


public class GlobalCache{

  private static Hashtable<Object, ExpirableObject> _cachedObjects =
    new Hashtable<Object, ExpirableObject>();

//  _objectVector is redundant, but lets us iterate quickly.
  private static Vector<ExpirableObject> _objectVector =
    new Vector<ExpirableObject>();

  private static CacheExpirationThread _cacheExpirationThread;

  private GlobalCache() {}  // No instances allowed.

  static {
    _cacheExpirationThread = new CacheExpirationThread();
    _cacheExpirationThread.start();
  }
  
  public synchronized static void add(ExpirableObject expirableObject) {
    Object key = expirableObject.getKey();
    if (_cachedObjects.containsKey(key)) {
      return;
    }
    _objectVector.add(expirableObject);  
    _cachedObjects.put(key, expirableObject);  
  }

  public synchronized static ExpirableObject get(Object key) {
    return _cachedObjects.get(key);
  }

  public synchronized static void halt() {
    _objectVector.clear();
    _cachedObjects.clear();
    _cacheExpirationThread.halt();
  }

  protected synchronized static void expireObjects() {
    ListIterator<ExpirableObject> i = _objectVector.listIterator();    
    while (i.hasNext()) {
      ExpirableObject nextExpirableObject = i.next();
      if (nextExpirableObject .shouldExpire()) {
        expireObject(nextExpirableObject );
        i.remove();
        _cachedObjects.remove(nextExpirableObject);
      }
    }
  }

  private static void expireObject(ExpirableObject expirableObject ) {
    try {
      expirableObject.expire();    
    }
    catch (Throwable t) {
      t.printStackTrace();
    }
  }
}

package grosso.thirdexample;

import java.util.*;

/*
  CacheExpirationThread

  It wakes up, it cleans up the cache, it goes back to sleep.
*/


public class CacheExpirationThread extends Thread{
  public static final long FIFTEEN_SECONDS = 15 * 1000;
  public static final long REAPER_THREAD_SLEEP_TIME = FIFTEEN_SECONDS ;
  
  private boolean _shouldKeepRunning = true;
  private long _timeToSleep;
  public CacheExpirationThread() {
    this(REAPER_THREAD_SLEEP_TIME);
  }

  public CacheExpirationThread(long timeToSleep) {
    super("CacheExpirationThread");
    setPriority(Thread.MIN_PRIORITY);
    _timeToSleep = timeToSleep;
  }

  public synchronized void halt() {
    _shouldKeepRunning = false;
  }

  public void run() {
    while(shouldKeepRunning()) {
      try {
        Thread.sleep(REAPER_THREAD_SLEEP_TIME);
      }
      catch (InterruptedException ignored){}
      GlobalCache.expireObjects();
    }      
  }
    
  private synchronized boolean shouldKeepRunning() {
    return _shouldKeepRunning;    
  }
}

In this version, instances of ExpirableObject automatically register with the GlobalCache object. They will be globally accessible to any caller with the correct key, and they will be automatically expired. This new version still suffers from a data inconsistency problem -- an instance can continue to be used even after it has expired. But at this point, it's a slight window. So we're doing pretty well.

A Synchronization Bottleneck

Unfortunately, we've introduced another problem: GlobalCache has a nasty synchronization bottleneck. All of the public methods on GlobalCache are synchronized. If there are a lot of instances being checked for expiration, and most of them are valid, the logic that involves iterating through all instances to find the invalid ones might take a lot of time. And during this iteration, we have prevented all of the other threads from accessing the contents of GlobalCache.

We can't simply remove the synchronization. It's preventing a lot of threading errors. For example, if we don't synchronize the access methods with the expiration thread, the Vector might throw unexpected instances of ConcurrentModificationException. And since ConcurrentModificationException is an unchecked exception, the compiler won't warn us of this possibility.

There are two solutions to this problem. The first solution is to replace the instance of Vector with a sorted data structure. Here, for example, is a slightly different version of GlobalCache, which uses an instance of TreeSet instead.

package grosso.fourthexample;

import java.util.*;

/*
  GlobalCache.


  Slightly modified version of old GlobalCache. It uses a
  Treeset instead of a Vector, to avoid iterating
  through everything. And it eliminates the synchronization
  bottleneck too !
*/


public class GlobalCache {
  private static Hashtable<Object, ExpirableObject> _cachedObjects =
    new Hashtable<Object, ExpirableObject>();
  private static SortedSet<ExpirableObject> _timeStampedStore = new TreeSet<ExpirableObject>();

  private static CacheExpirationThread _cacheExpirationThread;

  private GlobalCache() {}  // No instances allowed.

  static {
    _cacheExpirationThread = new CacheExpirationThread();
    _cacheExpirationThread.start();
  }
  
  public synchronized static void add(ExpirableObject expirableObject) {
    Object key = expirableObject.getKey();
    if (_cachedObjects.containsKey(key)) {
      return;
    }
    _timeStampedStore.add(expirableObject);  
    _cachedObjects.put(key, expirableObject);  
  }

  public static synchronized void remove(ExpirableObject expirableObject) {
    _timeStampedStore.remove(expirableObject);
    _cachedObjects.remove(expirableObject.getKey());
  }

  public synchronized static ExpirableObject get(Object key) {
    return _cachedObjects.get(key);
  }

  public synchronized static void halt() {
    _cachedObjects.clear();
    _timeStampedStore.clear();
    _cacheExpirationThread.halt();
  }

  protected static void expireObjects() {
    while (removeOldestObject()) { }
  }

  private static boolean removeOldestObject() {
    ExpirableObject oldestObject = getOldestObject();;
    if (oldestObject.shouldExpire()) {
      remove(oldestObject);
      expire(oldestObject);
      return true;  
    }
    return false;
  }

  private static synchronized ExpirableObject getOldestObject() {
    return _timeStampedStore.last();
  }

  private static void expire(ExpirableObject expirableObject ) {
    try {
      expirableObject.expire();  
    }
    catch (Throwable t) {
      t.printStackTrace();
    }
  }
}

In order to make this approach work, we needed to change ExpirableObject slightly. The first thing that had to change was that ExpirableObject needs to implement the Comparable interface. In order to do this, the following code was added:

  /*
    Note that ExpirableObjects which expire later this object are
    "less than" this object.
  */
  
  public int compareTo(Object object) {
    if (! (object instanceof ExpirableObject)) {
      return compareToBasedOnHashcodes(object);
    }
    ExpirableObject otherExpirableObject = (ExpirableObject ) object;
    int returnValue = compare(_expirationTime, otherExpirableObject._expirationTime);
    if (0 != returnValue) {
      return returnValue;
    }
    return compareToBasedOnHashcodes(object);
  }

  private int compareToBasedOnHashcodes(Object object) {
    return compare(hashCode(), object.hashCode());  
  }

  private int compare(long firstValue, long secondValue) {
    if (firstValue < secondValue) {
      return 1;
    }
    if (secondValue < firstValue) {
      return -1;
    }
    return 0;
  }

This version of GlobalCache works because TreeSet stores objects in a sorted order. When an instance is added to the cache, it is added to the "correct" location in the tree, based on the time the instance will expire (instances which will expire sooner are greater than instances which expire later). This all means that our background thread now follows a very simple algorithm:

  1. Look at the last instance in the tree. If the last instance needs to be expired, expire it and perform this step again.
  2. Fall asleep.

This does acquire the lock for the data store quite often, but for very short periods of time each time. And since the expiration thread is a low-priority thread, we've effectively eliminated the synchronization bottleneck.

Unfortunately, as written, this approach is not a general solution for data expiration. The problem is that the sorting used by the treeset assumes that the comparison operation is constant (that is, once the comparison operation returns "instance1 is less than instance 2," it will always return "instance1 is less than instance 2"). If we ever call setExpiration on an object that's already contained in the cache, we might violate this assumption. So we also need to change the implementation of setExpiration to remove and then reinsert the object, as follows:

  public void setExpiration(long timeToLive) {
    GlobalCache.remove(this);
    _expirationTime = System.currentTimeMillis() + timeToLive;
    GlobalCache.add(this);
  }

This is very simple and easy to do. But it can be expensive. Suppose, for example, you're writing a Web server and dealing with 1,000 simultaneous sessions. Every time a request comes in, you need to call setExpiration to reset the expiration time for the session. This means that, on every request, you're doing approximately 20 comparisons between instances of ExpirableObject (since TreeSet is a red-black tree, the number of comparisons needed to remove or insert an object is approximately the same as for a b-tree). If Web browsers are requesting pages at the rate of one every thirty seconds, you're paying 40,000 object comparisons per minute to maintain the TreeSet

On the other hand, if you're not updating the expiration time, as in our weather example (or, more generally, for short-lived information), then you can simply remove the setExpiration method from ExpirableObject and this is a good solution.

Pages: 1, 2, 3, 4

Next Pagearrow