Monitoring Session Replication in J2EE Clusters Monitoring Session Replication in J2EE Clusters

by Fermin Castro

Clustering has become one of the most common and most emphasized features of J2EE application servers, because, along with load balancing, it is the fundamental element on which the scalability and reliability of an application server rely.

Nowadays, J2EE clustering is generally associated with the mechanism used to replicate the information stored in a session across different instances of an application server. Whenever a failure happens in a stateful application, a cluster allows the failover to occur in a way that is transparent to the incoming requests: application servers take care of replicating the session information to the participants in the cluster.

Each application server allows you to customize different aspects of a cluster. Two examples are the replication scope ("How many nodes do I want to share my session across?") and the replication trigger ("When do I want to replicate the information?"). However, once a cluster is up and running, it is usually difficult to analyze its behavior. The reason for this is that application servers use low-level APIs to implement their clusters and do not expose any metrics about what's going on underneath the application's code.

This lack of information becomes particularly relevant when the environment around a deployment changes. There are different alterations that can jeopardize the correct behavior of a J2EE cluster. A very common example: we may have done a great job configuring and tuning our cluster, but find out that everything stops working correctly as soon as other stateful applications are deployed on the same resources.

This article provides a tool for obtaining metrics for the most-common processes involved in an application server cluster: serialization and IP multicast. It also shows how to interpret those metrics and identify potential issues in a cluster. Finally, it briefly presents some hints for solving those problems.

Session Replication at a Glance

Three main processes are involved in synchronizing a session among the nodes in a cluster:

  1. The objects managed in the session are "flattened" to be sent across the network.
  2. The "flattened" objects are sent through the network.
  3. The "flattened" objects are restored to their code-manageable state.

The "flattening" and restoration are done the same way in all application servers through a specific standard API called Object Serialization. Object serialization is the process of saving an object's state to a sequence of bytes. Deserialization is the process of rebuilding those bytes into a live object at some future time. The Java Serialization API provides a standard mechanism for developers to handle object serialization. As required in the Servlet 2.3 specification, the "container" (the application server) must, within an application marked as distributable, be able to handle all objects placed into instances of the HttpSession and accept all objects that implement the serializable interface. This sequence of bytes into which objects are transformed is what the application server instances send to each other.

As for the way objects travel across the network, most J2EE application servers (such as JBoss, Oracle, and Borland) use IP multicast communication. Multicast is based on the concept of a network group. An arbitrary group of receivers expresses an interest in receiving a particular data stream. Nodes that are interested in receiving data flowing to a particular group must use a common multicast address. This enables application servers to extend and diminish their clusters in a decoupled manner (without a requirement to maintain a list of nodes in their cluster): "If you want to have my state, this is my multicast address." Figure 1 shows several servers participating in such a multicasting arrangement.

Figure 1
Figure 1. Session replication process in application servers

Monitoring Session Replication

After the previous explanation, you soon realize that the replication process affects your environment in two ways:

  1. It affects your network.
  2. It increases the overhead caused by the serialization and deserialization of objects.

The severity of the impact on your network depends on:

The serialization and deserialization process is an I/O operation and, as such, involves the consumption of resources in every application server instance where it takes place. Fortunately, the application server takes care of making it as efficient as possible. However, depending on the size of the session objects, this I/O operation needs more or less RAM.

To minimize both issues, most application servers do not replicate the entire session in every request and they send only deltas (changes between each session modification). However, the first replication always involves a complete serialization-deserialization and a complete network trip for the objects in the session. Wouldn't it be nice to have a tool to analyze the possible effects of this overhead?

The Architecture of the Replication Latency Tester

To monitor the overhead created by session replication, I've written a Java application, the Replication Latency Tester, that measures the latency between two multicast nodes. It can provide you with some basic information (whenever multicast is used) for verifying that things are working properly.

Here is how the application works: it sends and receives serialized objects across the network, using IP multicast, and measures the time spent in this operation (the latency between nodes plus the overhead in the serialization/deserialization). It uses two threads, one for sending information (Node.class) and another for receiving (Client.class). The reason why the logic isn't separated in two programs (which may have been more clear) is that by using a single JVM for both processes, we are simulating a more realistic scenario. After all, application servers have to take care of the following in a single JVM:

  1. Generating information for their partners in a cluster (for the sessions they maintain).
  2. Receiving information from those partners (for the sessions other application server instances maintain).

The server thread sends a configurable number of objects of the type NodeInfo (NodeInfo.class). The larger the number of objects being sent by the server, the more meaningful the obtained metrics are. This is not only because, statistically, the average values are more representative, but also because, by adding more iterations, we expand the runtime period to a larger lapse of time and analyze the state of our cluster under different conditions.

The size of each object being sent through the network is randomized by the addition of a random StringBuffer attribute. Here is how the random-size attribute is built:

public StringBuffer createRandomSizeAttribute(){
    Random rnd = new Random(System.currentTimeMillis());
    int randInt = Math.abs(rnd.nextInt()) % 100;
    StringBuffer dev=new StringBuffer("big");
    for(int i=0;i<randInt;i++){
    return dev;

Random-size objects ensure that we cover different session scenarios: from small amounts of information per session (such as in an Internet bookstore where purchases rarely exceed a couple of items) to large amounts (such as in an Internet supermarket, where a whole shopping cart is maintained in memory).

Additionally, each random-size object contains other important attributes:

The creation time stamp is the base upon which the latency metrics are built. The server marks each object with its creation time and then sends it through IP multicast. When the object is rebuilt (on the client thread), this time stamp is used to calculate the delay in the operation. The time stamp plus the delay between each send operation and the next one are subtracted from the current time. Here is the code used to read the objects and to calculate the latency (the deserialization is done in a separate method). This code is run as a separate thread, thus the run signature.

public void run() {
   // Create the multicast elements
   MulticastSocket socket=new MulticastSocket(getPORT());
   InetAddress address=InetAddress.getByName(getMCAST());
   byte[] data = new byte[100000];
   DatagramPacket packet=new DatagramPacket(data, data.length);
   // Join the socket to the multicast adddress group
   Vector received= new Vector();
   // Loop on the socket and receive data
       //Variable definitions
       // Receive info
       // Deserialize object, cast to NodeInfo and calculate
       byte[] indata = packet.getData();
       NodeInfo receivedNodeInfo=(NodeInfo)deserializer(indata);
       statistics= new Statistics((double)totaldelay,ratio);

Figure 2 describes the architecture of the Replication Latency Tester.

Figure 2
Figure 2. Replication Latency Tester architecture

Running the Replication Latency Tester

You can download the complete tester program.

To use the Replication Latency Tester, simply execute java -jar ReplicationLatencyTester.jar in one of your nodes. The server thread and the client thread of the program will start to work with default values for the IP multicast address and IP multicast port. The server side of the program will start sending serialized objects, and the client side will start obtaining metrics based on the latency of the multicast network and the size of the object being sent.

An even more realistic scenario will be simulated if you start the tester on several nodes of your network. This will reproduce an architecture in which you have various application server instances running in different nodes and sending information to each other, using IP multicast. One Replication Latency Tester alone is not very significant; nor does it reflect a real-case scenario. (Replication shouldn't be activated when running a single instance of an application server; that would imply overhead without providing any higher availability.)

For each object received, these are the calculated values:

For each battery of tests:

You can persist these average metrics to a file simply by specifying the corresponding configuration parameter. The information will be stored in a file called lateststatistics.trc. Persisting this information will affect the average figures in a multi-node test, because this operation is done in the same thread as the serialization and deserialization and hence can introduce some time lag.

Both the server thread and the client thread can be configured with different parameters:

These files need to be placed in the directory where the tester is run. If these files are not found, the default values are used. These default values are specified in the class Constants.class.

These are the configurable parameters for the server thread:

Property Meaning Default Values
ipaddress Multicast address to be used to send information
port Port to be used to send information 2608
message Text message to be included in the object that will be serialized Hi there!

Delay applied between each send operation and the next one (msecs)

iterations Number of operations to be performed 100

And these are the configurable parameters for the client thread:

Property Meaning Default Values
ipaddress Multicast address to be used to send information
port Port to be used to send information 2608
verbose Print on standard error the detailed information for each object received false
dumptofile Write to the lateststatistics.trc file the statistics for each series of objects false

Without verbose output activation, the result should look something like the display in Figure 3:

Figure 3
Figure 3. Non-verbose output

With verbose output active, the standard output shows more detailed information as objects are being received, as seen in Figure 4:

Figure 4
Figure 4. Verbose output

Analyzing Replication Latency Tester Results

Several things need to be considered when you are analyzing the results obtained with the tester:

  1. Durations are measured using System.currentTimeMillis() and this approach may include much more in the metrics than just the code's execution time. Time taken by other processes on the system or time spent waiting for I/O can result in inaccurately high timing numbers. However, since this is only a two-thread program, this shouldn't affect our statistics much.

  2. The measurements of the latency between the client and the server assume that the clocks are set to the same values in every node in the test.

  3. In order to make the metrics as accurate as possible, the default behavior of the program is to show as little information as possible on the standard output (console). It is well known how System.out.println("") can affect the performance of a Java application.

  4. The size of the object is calculated by first serializing the object and then obtaining the size of the byte array created. Different approaches for obtaining the size of an object use the memory state by invoking Runtime.getRuntime().freeMemory(). I am not in favor of this approach, because different threads and the behavior of the garbage collector may affect this type of measurement.

The sizes of the objects being serialized in the tester are a good approximation of real session objects (sizes between 3Kb and 5Kb). Any objects that are much larger than that will imply considerable overhead.

As a rule of thumb, any average ratios that exceed 30 to 40 msecs/kb indicate that the nodes and the multicast network might be overloaded. You can extrapolate from these metrics and apply them to your application: If your session objects are 10Kb in size, you can expect that your application server will take more than 300 msecs (30 msecs/Kb*10Kb) to make the object available in other nodes (application servers do many more things simultaneously than what my application does, such as identifying the application, servlet and session for the object, of course). This implies that whenever your servlet/JSP engine is using multicast for replicating objects across application server instances, the objects will not appear on other nodes before 300 msecs have passed. Consequently, if a failover happens and the load balancing takes place in less than 300 msecs (which you can expect from most application servers), your client request will find an older version of the object in the new node and the application will become inconsistent.

As mentioned at the beginning of the article, most application servers do not replicate the entire session in every request and they send only deltas (they replicate changes to the session instead of the entire session). That saves time and resources for the subsequent modifications to become available in other application server instances. However, you always have to consider the worst-case scenario: a crash after the first serialization.

Problem-Solving Hints

You can do various things to solve a performance problem detected with the Replication Latency Tester tool. Each applies to a specific deployment, depending on the application involved and the resources available. Covering each one of them and when it is applicable is beyond the scope of this article. However as brief introduction to future analysis, here are the main possible solutions:

Obviously, all of these will affect your J2EE system (requiring either changes in code/configuration or additional resources) and need to be analyzed in detail before you decide which one applies to your case.


There are many reasons why the environment of a cluster may change through the lifecycle of an application. Thus, it is necessary to regularly monitor the cluster's proper functioning and prevent performance and stability issues. This article showed how to use a simple Java tool to obtain various metrics related to correct cluster behavior. It also discussed why it is important that serialization and IP multicast take place rapidly and covered how to monitor both. The next step: apply the necessary corrections once the problem is detected. But that's a subject that's sufficiently complex to merit its own article.


Fermin Castro is a product manager at Oracle Corporation.

Return to

Copyright © 2017 O'Reilly Media, Inc.