ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.


AddThis Social Bookmark Button

Clustering and Load Balancing in Tomcat 5, Part 1
Pages: 1, 2

Fault Tolerance

Fault tolerance is the system's ability to allow a computation to fail over to another available server if a server in the cluster goes down, as transparently to the end user as possible. An ideal fail over scenario is that the cluster service should detect when a server instance is no longer available to take any requests, and stop sending requests to that instance. It should also periodically check to see if a cluster member is available again and, if so, automatically add it to the pool of active cluster nodes.

Fault Tolerance in Tomcat

Tomcat 5 does not provide a built-in fail over mechanism to detect when a cluster member crashes. Hopefully, a future version of Tomcat will provide the fail over feature that can be used to find the availability of a specific cluster member to make sure it's ready to service incoming web requests.

There are two levels of fail over capabilities typically provided by clustering solutions:

  • Request-level fail over: If one of the servers in the cluster goes down, all subsequent requests should be redirected to the remaining servers in the cluster. This involves using a heartbeat mechanism to keep track of the server status and to avoid sending requests to the servers that are not responding. In our cluster setup, a Tomcat instance acting as a load balancer takes care of request level fail over by forwarding web requests to another node in the cluster.

  • Session-level fail over: A web client can have a session that is maintained by the HTTP server. In session-level fail over, if one of the servers in the cluster goes down, another server in the cluster should be able to carry on with the sessions that were being handled by the first server, with minimal loss of continuity. This involves replicating the session data across the cluster. A Tomcat cluster with session replication capability takes care of session-level fail over.

Session State Persistence

Fail over and load balancing require the session state to be replicated at different servers in a cluster. Session state replication allows a client to seamlessly get session information from another server in the cluster when the original server, on which the client established a session, fails. The state can be system state and/or application state (application state contains the objects and data stored in an HTTP session). The main goal of session replication is not to lose any session details if a cluster member crashes or is stopped for application updates or system maintenance.

As far as session persistence is concerned, clustering can be a simple scenario in which a cluster member doesn't have any knowledge of session state in the other cluster members. In this scenario, the user session lives entirely on one server, selected by the load balancer. This is called a sticky session (also known as session affinity), since the session data stays in the cluster member that received the web request.

On the other hand, the cluster can be implemented in such a way that each cluster member is completely aware of session state in other cluster members, with the session state periodically propagated to all (or preferably, one or two) backup cluster members. This type of session is known as a replicated session.

There are three ways to implement session persistence:

  • Memory-to-memory replication.
  • File System session persistence, where session information is written to and read from a centralized file system.
  • Database session persistence, where session data is stored in a JDBC data store.

In memory session replication, the individual objects in the HttpSession are serialized to a backup server as they change, whereas in database session persistence, the objects in the session are serialized together when any one of them changes.

The main drawback of database/file system session persistence is limited scalability when storing large or numerous objects in the HttpSession. Every time a user adds an object to the HttpSession, all of the objects in the session are serialized and written to the database or shared file system.

Session Replication in Tomcat

Session replication in the current version of Tomcat server is an all-to-all replication of session state, meaning the session attributes are propagated to all cluster members all the time. This algorithm is efficient when the clusters are small. For large clusters, the next Tomcat release will support primary-secondary session replication, where the session will only be stored at one or maybe two backup servers.

There are three types of session replication mechanisms in Tomcat:

  • Using in-memory replication, with the SimpleTcpCluster (in the org.apache.catalina.cluster.tcp package) that ships with Tomcat 5 (in server/lib/catalina-cluster.jar).
  • Using session persistence, and saving the session to a shared database (org.apache.catalina.session.JDBCStore).
  • Saving the session state to a shared file system (org.apache.catalina.session.FileStore, part of catalina-optional.jar).

Factors to Consider in Implementing a J2EE Cluster

There are many factors to take into account when designing a J2EE cluster. The following is a list of questions to be considered in a large-scale J2EE system design. (This list is taken from "Creating Highly Available and Scalable Applications Using J2EE" in the EJB Essentials Training document.)


  • What kind of clustering should be implemented: vertical scaling or horizontal scaling?
  • In what tier should clustering be implemented: web server or servlet container for servlets, JSP, and HTTP session objects; or application server for EJB, JMS, and JNDI objects or database clustering?

Load Balancing

  • When is a server selected (i.e. affinity): every request, every transaction, or every session?
  • How is a server selected (i.e. load balancing policy): randomly, round-robin, weight-based, least loaded server, or by the application?
  • Where is load balancing accomplished: in one place or many, at the client or at the server?

Fault Tolerance

  • How are server failures detected?
  • When is it right time to fail over and try another server?
  • What about system and application state at the failed node?

Session State Persistence

  • How is state communicated?
  • How often is it communicated?
  • How is object state materialized?
  • Is the state persistence mechanism efficient?
  • Consistency of replicated state?
  • Any network constraints in replicating the session state?

Proposed Cluster Setup

Listed below are the main objectives I wanted to accomplish in the proposed cluster environment:

  • The cluster should be highly scalable.
  • It should be fault-tolerant.
  • It should be dynamically configurable, meaning it should be easy to manage the cluster declaratively (changing a configuration file) rather than programmatically (changing Java code).
  • It should provide automatic cluster member discovery.
  • It should have fail over and load-balancing features for session data with in-memory session state replication.
  • It should have pluggable/configurable load-balancing policies.
  • It should perform group membership notification when a member of the cluster joins or leaves a group.
  • There should be no loss of message transmission through multicast.
  • Clustering should be seamless to the web application and the server. It should provide both client and server transparency. Client transparency means that the client is not aware of clustered services or how the cluster is set up. The cluster is identified and accessed as a single thing, rather than as individual services. Server transparency means that the application code in a server is not aware that it's in a cluster. The application code cannot communicate with the other members of cluster.


In part two of this article, we'll look at how to deploy a cluster (by running multiple Tomcat server instances) to achieve these goals. We will discuss the cluster architecture and configuration details to enable session replication in Tomcat 5.

Srini Penchikala is an information systems subject matter expert at Flagstar Bank.

Return to ONJava.com.