ONJava.com -- The Independent Source for Enterprise Java
Software Infrastructure Bottlenecks in J2EE

by Deepak Goel
01/19/2005

Scalability is one of the most important non-functional requirements of a system, but several bottlenecks within a system can prevent it from scaling. In this article, we analyze the case in which the software infrastructure becomes a bottleneck long before any of the hardware resources (such as CPU, memory, disk space, and network bandwidth) are fully consumed. This is a tricky problem whose solution is explored below.

Here are definitions of a few terms that will be used throughout the article:

  • Throughput: The number of transactions per second supported by the system.
  • Service demand: The utilization of the particular hardware resource per transaction. Service demand equals hardware utilization divided by throughput.
  • Hardware resources/pipe: Hardware resources (or the hardware pipe) are the processors, memory, disk, and network.
  • Software resources/pipe: Software resources (or the software pipe) are resources like web threads, executive threads, bean pools, database connection pools, etc.
  • Think time: The time taken by the user to think between submitting two successive requests to the system.
  • Little's Law: N = X × (R + Z), where N is the number of concurrent users, X the throughput, R the response time, and Z the think time. It is used here to validate the test and ensure that the testing-tool environment is not a bottleneck.
  • Response time: The time taken by the customer to get an answer back once he has submitted a request.
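The definitions above can be checked against the measurements reported later in the article. The following sketch (our illustration, not code from the article) computes the Little's Law validation value and the service demand from the 2-user row of the first results table:

```java
// Sketch: computing the metrics defined above from one load-test
// measurement (the 2-user row of the article's first results table).
public class Metrics {

    // Little's Law: N = X * (R + Z). If the computed value is far below
    // the actual user count, the load generator is itself a bottleneck.
    static double littlesLaw(double throughput, double responseTime, double thinkTime) {
        return throughput * (responseTime + thinkTime);
    }

    // Service demand: hardware utilization per transaction.
    static double serviceDemand(double utilization, double throughput) {
        return utilization / throughput;
    }

    public static void main(String[] args) {
        double responseTime = 0.188;       // seconds
        double thinkTime = 0.0;            // seconds
        double throughput = 10.36;         // transactions per second
        double appUtilization = 0.1335;    // 13.35% app server CPU busy

        System.out.printf("Little's Law check: %.4f (actual users: 2)%n",
                littlesLaw(throughput, responseTime, thinkTime));
        System.out.printf("App. server service demand: %.5f s/txn%n",
                serviceDemand(appUtilization, throughput));
    }
}
```

The computed values (about 1.95 and 0.0129) match the "Little's Law validation" and "App. server service demand" columns of the first table, which is exactly how those columns were derived.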

Theory

Any system running a J2EE application has the following layers, as seen in Figure 1:

  1. Hardware infrastructure resources (CPU, memory, disk, network)
  2. Software infrastructure resources (JVM, web servers, application servers, database servers)
  3. Software application (J2EE application)


Figure 1. Snapshot of a J2EE system

There are two possibilities that lead to bottlenecks: the hardware becomes the bottleneck before the software, or the software becomes the bottleneck before the hardware. In the first case, the hardware resources are inadequate while the software resources are ample, as seen in Figure 2. As the load increases, the hardware becomes a bottleneck even though the software could still scale. The solution to alleviate this bottleneck is normally to scale the hardware up or out.


Figure 2. The hardware pipe becomes a bottleneck

In the second case, hardware resources are plentiful and software resources are limited. As load increases, the hardware still has headroom, yet the software becomes a bottleneck. This situation is seen in Figure 3. The solution to alleviate this bottleneck is normally to cluster the software or to tune it.


Figure 3. Software pipe becomes a bottleneck

How Does an Application Server Work?

It is worth looking at what is inside the application server and how it works. Among the basic responsibilities of an application server are transaction management, data persistence, object pooling, socket handling, and request handling; the flow between these responsibilities is seen in Figure 4. Various components handle these responsibilities, and they must synchronize the threads within the application server in order to protect the integrity of shared data and the operations on it. This synchronization, although necessary for the proper functioning of the application server, acts as a limitation at higher loads, even when there are ample hardware resources.


Figure 4. Internals of an application server
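To make the synchronization limitation concrete, here is a minimal sketch (our illustration, not the server's actual code) of a pooled resource guarded by a single monitor, the pattern behind bean pools and connection pools. Every requester must pass through the same lock, so beyond some load the lock, not the CPU, becomes the limiting resource:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a pooled resource protected by one monitor. With eight CPUs
// but a single lock, only one thread at a time can check a resource in
// or out -- the serialization that caps throughput at higher loads.
public class PooledResource<T> {
    private final Deque<T> idle = new ArrayDeque<>();

    public synchronized T acquire() {
        while (idle.isEmpty()) {
            try {
                wait();  // all requesters block here until a release
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return idle.pop();
    }

    public synchronized void release(T resource) {
        idle.push(resource);
        notify();  // wake one waiting requester
    }
}
```

Under light load the lock is rarely contended; under heavy load, threads queue on it even while most processors sit idle, which is exactly the symptom measured in the experiments below.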

Experiments

To understand bottleneck situations, experiments were done with the Java PetStore application deployed on a popular J2EE application server on the Windows/Intel platform. A few use cases, such as the browsing and shopping cycles of the PetStore application, were used to test for scalability. The entire environment, including the operating system, JVM, application server, and the application, was tuned as optimally as possible, and it was verified that the J2EE application itself had no bottlenecks or synchronization problems. A multiuser load was then generated, and response times, throughput, and resource utilizations were recorded for these tests.
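The measurement loop behind such a test can be sketched as follows. This is a simplified stand-in for what a tool like WebLoad does: N virtual users fire requests with zero think time while throughput and mean response time are derived from the counts (`doRequest()` is a placeholder for the real HTTP call):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a multiuser load generator: each virtual user loops,
// timing its requests, until the test duration elapses.
public class LoadTest {

    static void doRequest() {
        // placeholder for an HTTP request to the application under test
    }

    // Throughput = completed transactions / test duration in seconds.
    static double throughput(long completed, long durationMillis) {
        return completed / (durationMillis / 1000.0);
    }

    public static void main(String[] args) throws Exception {
        int users = 4;
        long testMillis = 2_000;
        AtomicLong completed = new AtomicLong();
        AtomicLong totalNanos = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(users);
        long end = System.currentTimeMillis() + testMillis;
        for (int i = 0; i < users; i++) {
            pool.submit(() -> {
                while (System.currentTimeMillis() < end) {
                    long t0 = System.nanoTime();
                    doRequest();                       // zero think time
                    totalNanos.addAndGet(System.nanoTime() - t0);
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(testMillis + 1_000, TimeUnit.MILLISECONDS);

        System.out.printf("throughput = %.1f tx/s, mean response = %.6f s%n",
                throughput(completed.get(), testMillis),
                totalNanos.get() / 1e9 / Math.max(1, completed.get()));
    }
}
```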

The environment used in this experiment is as follows:

  1. J2EE PetStore application
  2. J2EE application server
  3. Sun JVM 1.3
  4. Windows 2000 Advanced Server
  5. Intel Dell PowerEdge 8450 (eight Intel Xeon 800MHz processors, 4GB RAM)
  6. 100Mbps Cisco dedicated network
  7. The load-testing tool WebLoad

The results for the PetStore application with one application server instance are shown in the following table.

| Users | Test duration | App. server utilization | DB server utilization | Response time (s) | Think time (s) | Throughput (tx/s) | Little's Law validation | App. server service demand | DB server service demand |
|-------|---------------|-------------------------|-----------------------|-------------------|----------------|-------------------|-------------------------|----------------------------|---------------------------|
| 2     | 1000          | 13.35%                  | 9.76%                 | 0.188             | 0              | 10.36             | 1.9477                  | 0.01290                    | 0.00941                   |
| 4     | 1000          | 26.50%                  | 16.29%                | 0.217             | 0              | 17.9              | 3.8843                  | 0.01493                    | 0.00909                   |
| 6     | 1000          | 36.10%                  | 13.72%                | 0.266             | 0              | 21.9              | 5.8284                  | 0.0165                     | 0.00626                   |
| 8     | 1000          | 36.50%                  | 16.89%                | 0.352             | 0              | 22                | 7.744                   | 0.01679                    | 0.00767                   |

Editor's note: values in the "Application Server Service Demand" column were incorrect as originally posted and have been corrected.

What we see in this test is that even though there are ample hardware resources, the application server instance is limited in its ability to scale up. The software resources (the execution threads, bean pool size, database connection pools, and other parameters within the application server) were tuned so that a shortage of these resources would not limit the system from scaling up. Below, we explore some solutions to this problem.
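A first-cut sizing rule for those software resources follows from Little's Law applied inside the server: the number of requests in service at once is X × R, so any pool smaller than that becomes the bottleneck regardless of spare CPU. The sketch below (our rule of thumb, not a recommendation from the article; the +2 headroom is an arbitrary choice) sizes a pool from a target throughput and measured response time:

```java
// Sketch: minimum pool size (threads, beans, or DB connections) needed
// to sustain a target throughput, by Little's Law: requests in service
// at once = throughput * response time.
public class PoolSizing {

    static int minPoolSize(double targetThroughputTxPerSec, double responseTimeSec) {
        // round up, then add a small arbitrary headroom so the pool
        // itself is never the limiting resource
        return (int) Math.ceil(targetThroughputTxPerSec * responseTimeSec) + 2;
    }

    public static void main(String[] args) {
        // 22 tx/s at 0.352 s response time: the article's 8-user row
        System.out.println(minPoolSize(22.0, 0.352));
    }
}
```

For the measured 22 tx/s at 0.352 s, about 8 requests are in service concurrently, so pools were kept comfortably above that size to rule them out as the cause of the saturation.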

Note: The Sun J2EE PetStore was further tuned to improve its performance and scalability.

Solutions

Cluster on the Same Box

When the throughput saturated with one instance of the application server with increasing load, another instance was added to the same box to alleviate the problem. This arrangement is illustrated in Figure 5.


Figure 5. Instance clusters on the same hardware box

The CPU utilization was only around 40 percent on the physical box, leaving ample room for another instance. It was found that after adding one more instance, throughput increased by at least 50 percent, as seen in Table 2.

| Users | App. server utilization | DB server utilization | Response time (s) | Think time (s) | Throughput (tx/s) | Little's Law validation | App. server service demand | DB server service demand |
|-------|-------------------------|-----------------------|-------------------|----------------|-------------------|-------------------------|----------------------------|---------------------------|
| 2     | 13.38%                  | 9.34%                 | 0.185             | 0              | 10.36             | 1.92                    | 0.01                       | 0.01                      |
| 4     | 25.30%                  | 16.43%                | 0.218             | 0              | 17.9              | 3.90                    | 0.01                       | 0.01                      |
| 8     | 35.00%                  | 16.90%                | 0.362             | 0              | 22                | 7.96                    | 0.02                       | 0.01                      |
| 16    | 70.30%                  | 22.85%                | 0.444             | 0              | 31.77             | 14.11                   | 0.02                       | 0.01                      |
| 20    | 71.30%                  | 24.10%                | 0.602             | 0              | 32.36             | 19.48                   | 0.02                       | 0.01                      |

Cluster on Different Boxes

When the throughput saturated with one instance of the application server under increasing load, it was found that the CPU utilization was only around 40 percent on the physical box. Since the eight-CPU box was underutilized, the tests were repeated on boxes of lower capacity: two four-CPU boxes were used instead of one eight-CPU box, as seen in Figure 6.


Figure 6. Instance clusters on different hardware boxes

It was found that utilization of the four-CPU box went up to 80 percent, leaving no room for another instance. To increase the throughput, one additional four-CPU box was added to the system with an application server instance on it. After adding one more box, the throughput nearly doubled. It was ensured in all of the tests that the database machine did not become a bottleneck.

Note: In the above test with two boxes, the load balancing was done in such a way that the load balancing imposed no overhead on the functioning of the application server instances or the results. However, this might not be possible to do in an actual production environment.
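The article does not describe its load-balancing mechanism, but the cheapest scheme consistent with the note above is plain round-robin dispatch, whose only shared state is a counter, so it adds essentially no overhead to the instances being measured. A hypothetical sketch:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (hypothetical) of zero-overhead round-robin dispatch across
// application server instances: requests alternate between boxes, and
// the balancer holds no per-request state beyond one atomic counter.
public class RoundRobin {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobin(List<String> instances) {
        this.instances = instances;
    }

    public String pick() {
        // floorMod keeps the index valid even if the counter wraps negative
        int i = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(i);
    }
}
```

In a real production environment the balancer would also need health checks and, for stateful web applications, session affinity, which is why the note above warns that the overhead-free arrangement may not be reproducible in production.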

Since we earlier observed that the eight-CPU box was not completely utilized, we repeated the test on four-processor systems. The results can be seen in the tables below, which represent boxes 1 and 2, respectively. Each four-CPU box is almost completely utilized, which implies that two four-CPU boxes should be used rather than two eight-CPU boxes, since the former arrangement is more cost-effective.

Box 1:

| Users | App. server utilization | DB server utilization | Response time (s) | Think time (s) | Throughput (tx/s) | Little's Law validation | App. server service demand | DB server service demand |
|-------|-------------------------|-----------------------|-------------------|----------------|-------------------|-------------------------|----------------------------|---------------------------|
| 2     | 13.11%                  | 9.33%                 | 0.187             | 0              | 10.11             | 1.89                    | 0.01                       | 0.01                      |
| 4     | 26.44%                  | 16.10%                | 0.21              | 0              | 17.65             | 3.71                    | 0.01                       | 0.01                      |
| 6     | 36.30%                  | 13.30%                | 0.255             | 0              | 21.8              | 5.56                    | 0.02                       | 0.01                      |
| 8     | 36.40%                  | 16.75%                | 0.348             | 0              | 22.1              | 7.69                    | 0.02                       | 0.01                      |

Box 2:

| Users | App. server utilization | DB server utilization | Response time (s) | Think time (s) | Throughput (tx/s) | Little's Law validation | App. server service demand | DB server service demand |
|-------|-------------------------|-----------------------|-------------------|----------------|-------------------|-------------------------|----------------------------|---------------------------|
| 2     | 13.21%                  | 9.45%                 | 0.187             | 0              | 10.19             | 1.91                    | 0.01                       | 0.01                      |
| 4     | 26.53%                  | 11.40%                | 0.21              | 0              | 17.55             | 3.69                    | 0.02                       | 0.01                      |
| 6     | 36.45%                  | 13.40%                | 0.255             | 0              | 21.1              | 5.38                    | 0.02                       | 0.01                      |
| 8     | 37.40%                  | 16.80%                | 0.348             | 0              | 22.96             | 7.99                    | 0.02                       | 0.01                      |

Conclusion

These experiments have shown that the software infrastructure, like the application server instance, can become a bottleneck, and that some of the solutions to alleviate this problem include clustering on the same or different boxes. This needs to be taken into account for any capacity-planning or sizing initiatives for a J2EE application, since it has a direct bearing on the scalability of the application. This idea is important and I will round it off by presenting it in a dialogue.

Project manager: What you're trying to say is that the application server, which represents the software infrastructure, can become a bottleneck in your system.

Performance architect: That's right.

Project manager: Why don't we see this scenario very often?

Performance architect: Well, sometimes, the hardware resources or the software application built by us becomes a bottleneck first. The software infrastructure (application server) rarely gets to show its full scalability characteristics.

Project manager: That's true. So the way out of this bottleneck is to have clusters in your system: if there are sufficient hardware resources, put instance clusters on the same box, or else set up a cluster across multiple boxes.

Performance architect: Right! And that is what we have explored in this article.

Deepak Goel is presently tinkering on a product in the artificial intelligence space.
