Software Infrastructure Bottlenecks in J2EE
by Deepak Goel, 01/19/2005
Scalability is one of the most important non-functional requirements of a system. But there could be several bottlenecks within a system, which might prevent it from being scalable. In this article, we try to analyze the case in which the software infrastructure becomes a bottleneck, long before any of the hardware resources (such as CPU, memory, disk space, and network speed) are fully consumed. This is a tricky problem whose solution is explored below.
Here are definitions of a few terms that will be used throughout the article:
- Throughput: The number of transactions per second supported by the system.
- Service demand: The utilization of a particular hardware resource per transaction. Service demand equals hardware utilization divided by throughput.
- Hardware resources/pipe: Hardware resources (or the hardware pipe) are the processors, memory, disk, and network.
- Software resources/pipe: Software resources (or the software pipe) are resources like web threads, executive threads, bean pools, database connection pools, etc.
- Think time: The time the user takes to think between submitting two consecutive requests to the system.
- Little's Law: A simple law used to validate a test and ensure that the testing-tool environment is not a bottleneck: the number of users should equal the throughput multiplied by the sum of response time and think time.
- Response time: The time taken by the customer to get an answer back once he has submitted a request.
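The two derived quantities defined above, service demand and the Little's Law user count, can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, the inputs are taken from the two-user row of the one-instance results table later in the article, and response time is assumed to be in seconds.

```java
// Illustrative sketch; class and method names are assumptions, not from the article.
class PerfMetrics {

    /** Service demand: resource utilization (as a fraction, 0..1) divided by throughput. */
    static double serviceDemand(double utilization, double throughput) {
        return utilization / throughput;
    }

    /** Little's Law: number of users implied by the measured throughput and times. */
    static double impliedUsers(double throughput, double responseTime, double thinkTime) {
        return throughput * (responseTime + thinkTime);
    }

    public static void main(String[] args) {
        // Two-user row of the one-instance results table.
        double appServerUtilization = 0.1335; // 13.35%
        double throughput = 10.36;            // transactions per second
        double responseTime = 0.188;          // assumed to be in seconds
        double thinkTime = 0.0;

        System.out.printf("App. server service demand: %.5f%n",
                serviceDemand(appServerUtilization, throughput));
        System.out.printf("Users implied by Little's Law: %.4f%n",
                impliedUsers(throughput, responseTime, thinkTime));
    }
}
```

For this row, the implied user count comes out at roughly 1.95, close to the 2 users configured in the load-testing tool, which is what a valid test should show.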
Theory
Any system running a J2EE application has the following layers, as seen in Figure 1:
- Hardware infrastructure resources (CPU, memory, disk, network)
- Software infrastructure resources (JVM, web servers, application servers, database servers)
- Software application (J2EE application)

Figure 1. Snapshot of a J2EE system
Two possibilities can lead to bottlenecks: the hardware becomes the bottleneck before the software, or the software becomes the bottleneck before the hardware. In the first case, the hardware resources are inadequate and the software resources are ample, as seen in Figure 2. As the load increases, hardware resources become a bottleneck, yet the software scales up. The usual solution to this bottleneck is to scale up or scale out the hardware.

Figure 2. The hardware pipe becomes a bottleneck
In the second case, hardware resources are plentiful and software resources are limited. As load increases, the hardware resources scale up, while the software becomes a bottleneck. This situation is seen in Figure 3. The solution to alleviate this bottleneck normally is to use software clusters or tune the software.

Figure 3. Software pipe becomes a bottleneck
How Does an Application Server Work?
It is worth considering what is inside an application server and how it works. The basic functions of an application server include transaction management, data persistence, object pooling, socket handling, and request handling. The flow between these responsibilities is seen in Figure 4. Various components handle these functions, and they must synchronize the threads within the application server in order to maintain the integrity of the data and of the operations on it. This synchronization, although necessary for the proper functioning of the application server, becomes a limitation at higher loads, even when there are enough hardware resources.
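To make the synchronization point concrete, here is a toy Java model of a shared object pool. It is a sketch under stated assumptions, not the internals of any real application server: `SharedPool`, `runLoad`, and the chosen sizes are hypothetical, and the code uses modern Java syntax rather than the JVM 1.3 of the experiments. Every request thread must pass through the same `synchronized` methods, so the pool's state stays consistent, but only one thread can be inside the pool at any moment no matter how many CPUs are idle.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: not taken from any real application server.
class SharedPool {
    private int available;   // free pooled objects
    private long served;     // requests that obtained an object

    SharedPool(int size) {
        this.available = size;
    }

    // synchronized: only one thread at a time may run these methods on a pool,
    // which keeps the counters consistent but also serializes every caller.
    synchronized boolean acquire() {
        if (available == 0) {
            return false;
        }
        available--;
        served++;
        return true;
    }

    synchronized void release() {
        available++;
    }

    synchronized int available() {
        return available;
    }

    synchronized long served() {
        return served;
    }

    /** Fires requests from several threads at one pool and returns the pool. */
    static SharedPool runLoad(int threads, int requestsPerThread, int poolSize) {
        SharedPool pool = new SharedPool(poolSize);
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            workers.submit(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    if (pool.acquire()) {
                        pool.release();
                    }
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool;
    }

    public static void main(String[] args) {
        SharedPool pool = runLoad(8, 10_000, 4);
        System.out.println("free objects after the run: " + pool.available());
        System.out.println("requests served: " + pool.served());
    }
}
```

The pool always ends the run with all of its objects free, which is the benefit of the locking. The cost is that the eight worker threads take turns inside `acquire` and `release` rather than running them in parallel.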

Figure 4. Internals of an application server
Experiments
To understand bottleneck situations, experiments were done with the Java PetStore application running on a popular J2EE application server on the Windows/Intel platform. A few use cases, such as the browsing and shopping cycles of the PetStore application, were used to test for scalability. The entire environment, including the operating system, JVM, application server, and application, was tuned as far as possible, and the J2EE application was checked to ensure that it had no bottlenecks or synchronization problems of its own. A multiuser load was fired, and the response times, throughput, and resource utilizations were observed for these tests.
The environment used in this experiment is as follows:
- J2EE PetStore application
- J2EE application server
- Sun JVM 1.3
- Windows 2000 Advanced Server
- Intel Dell PowerEdge 8450 (eight Intel Xeon 800MHz processors, 4GB RAM)
- 100Mbps Cisco dedicated network
- The load-testing tool WebLoad
The results for the PetStore application with one application server instance are shown in the following table.
| Users | Test duration | App. server utilization | DB server utilization | Response times | Think time | Throughput | Little's Law validation | App. server service demand | DB server service demand |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 1000 | 13.35% | 9.76% | 0.188 | 0 | 10.36 | 1.9477 | 0.01290 | 0.00941 |
| 4 | 1000 | 26.50% | 16.29% | 0.217 | 0 | 17.9 | 3.8843 | 0.01493 | 0.00909 |
| 6 | 1000 | 36.10% | 13.72% | 0.266 | 0 | 21.9 | 5.8284 | 0.0165 | 0.00626 |
| 8 | 1000 | 36.50% | 16.89% | 0.352 | 0 | 22 | 7.744 | 0.01679 | 0.00767 |
Editor's note: values in the "Application Server Service Demand" column were incorrect as originally posted and have been corrected.
What we see in this test is that even though there are ample hardware resources, the application server instance is limited in its ability to scale up: throughput levels off at about 22 transactions per second while application server utilization stays below 40 percent. The software resources (the execution threads, bean pool size, database pools, and other parameters within the application server) were tuned so that a lack of these resources would not limit the system from scaling up. Below we explore some solutions for alleviating this problem.
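One way to spot this pattern in test results is to look for the first load level at which throughput stops growing while the hardware is still far from fully used. The Java sketch below does this for the one-instance table. The class and method names are hypothetical, and the two thresholds (a minimum throughput gain of 5 percent and a utilization ceiling of 80 percent) are assumptions chosen for illustration, not values from the experiments.

```java
// Illustrative sketch; names and thresholds are assumptions, not from the article.
class SaturationCheck {

    /**
     * Returns the index of the first load level where throughput grew by less
     * than minGain over the previous level while utilization was still below
     * utilizationCeiling, or -1 if there is no such level.
     */
    static int firstSoftwareSaturation(double[] throughput, double[] utilization,
                                       double minGain, double utilizationCeiling) {
        for (int i = 1; i < throughput.length; i++) {
            double gain = (throughput[i] - throughput[i - 1]) / throughput[i - 1];
            if (gain < minGain && utilization[i] < utilizationCeiling) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Columns from the one-instance results table.
        int[] users = {2, 4, 6, 8};
        double[] throughput = {10.36, 17.9, 21.9, 22.0};
        double[] appServerUtilization = {0.1335, 0.2650, 0.3610, 0.3650};

        int index = firstSoftwareSaturation(throughput, appServerUtilization, 0.05, 0.80);
        if (index >= 0) {
            System.out.println("Throughput flattens at " + users[index]
                    + " users with utilization at " + appServerUtilization[index]);
        } else {
            System.out.println("No saturation below the utilization ceiling");
        }
    }
}
```

With these inputs the check flags the 8-user level, where throughput rises by less than 1 percent over the 6-user level although utilization is only 36.5 percent.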
Note: The Sun J2EE PetStore was further tuned to improve its performance and scalability.
Solutions
Cluster on the Same Box
When the throughput saturated with one instance of the application server with increasing load, another instance was added to the same box to alleviate the problem. This arrangement is illustrated in Figure 5.

Figure 5. Instance clusters on the same hardware box
The CPU utilization was only around 40 percent on the current physical box, leaving ample room for another instance. It was found that after adding one more instance, throughput increased by at least 50 percent, as seen in Table 2.
| Users | App. server utilization | DB server utilization | Response times | Think time | Throughput | Little's Law validation | App. server service demand | DB server service demand |
|---|---|---|---|---|---|---|---|---|
| 2 | 13.38% | 9.34% | 0.185 | 0 | 10.36 | 1.92 | 0.01 | 0.01 |
| 4 | 25.30% | 16.43% | 0.218 | 0 | 17.9 | 3.90 | 0.01 | 0.01 |
| 8 | 35.00% | 16.90% | 0.362 | 0 | 22 | 7.96 | 0.02 | 0.01 |
| 16 | 70.30% | 22.85% | 0.444 | 0 | 31.77 | 14.11 | 0.02 | 0.01 |
| 20 | 71.30% | 24.10% | 0.602 | 0 | 32.36 | 19.48 | 0.02 | 0.01 |
Cluster on Different Boxes
When the throughput saturated with one instance of the application server under increasing load, it was found that the CPU utilization was only around 40 percent on the physical box. Since the eight-CPU box was underutilized, the tests were done again with boxes of lower capacity. Two four-CPU boxes were used instead of an eight-CPU box, as seen in Figure 6.

Figure 6. Instance clusters on different hardware boxes
It was found that utilization of the four-CPU box went up to 80 percent, leaving no room for another instance. To increase the throughput, one additional four-CPU box was added to the system with an application server instance on it. After adding one more box, the throughput nearly doubled. It was ensured in all of the tests that the database machine did not become a bottleneck.
Note: In the above test with two boxes, the load balancing was done in such a way that it imposed no overhead on the functioning of the application server instances or on the results. However, this might not be possible in an actual production environment.
Since we earlier observed that the eight-CPU box was not completely utilized, we took a system with four processors and did the test again. The results of the tests can be seen in the tables below, which represent boxes 1 and 2, respectively. The four-CPU box is almost completely utilized, which implies that two four-CPU boxes should be used rather than two eight-CPU boxes, since the former arrangement is more cost-effective.
| Users | App. server utilization | DB server utilization | Response times | Think time | Throughput | Little's Law validation | App. server service demand | DB server service demand |
|---|---|---|---|---|---|---|---|---|
| 2 | 13.11% | 9.33% | 0.187 | 0 | 10.11 | 1.89 | 0.01 | 0.01 |
| 4 | 26.44% | 16.10% | 0.21 | 0 | 17.65 | 3.71 | 0.01 | 0.01 |
| 6 | 36.30% | 13.30% | 0.255 | 0 | 21.8 | 5.56 | 0.02 | 0.01 |
| 8 | 36.40% | 16.75% | 0.348 | 0 | 22.1 | 7.69 | 0.02 | 0.01 |
| Users | App. server utilization | DB server utilization | Response times | Think time | Throughput | Little's Law validation | App. server service demand | DB server service demand |
|---|---|---|---|---|---|---|---|---|
| 2 | 13.21% | 9.45% | 0.187 | 0 | 10.19 | 1.91 | 0.01 | 0.01 |
| 4 | 26.53% | 11.40% | 0.21 | 0 | 17.55 | 3.69 | 0.02 | 0.01 |
| 6 | 36.45% | 13.40% | 0.255 | 0 | 21.1 | 5.38 | 0.02 | 0.01 |
| 8 | 37.40% | 16.80% | 0.348 | 0 | 22.96 | 7.99 | 0.02 | 0.01 |
Conclusion
These experiments have shown that the software infrastructure, like the application server instance, can become a bottleneck, and that some of the solutions to alleviate this problem include clustering on the same or different boxes. This needs to be taken into account for any capacity-planning or sizing initiatives for a J2EE application, since it has a direct bearing on the scalability of the application. This idea is important and I will round it off by presenting it in a dialogue.
Project manager: What you're trying to say is that the application server, which represents the software infrastructure, can become a bottleneck in your system.
Performance architect: That's right.
Project manager: Why don't we see this scenario very often?
Performance architect: Well, sometimes, the hardware resources or the software application built by us becomes a bottleneck first. The software infrastructure (application server) rarely gets to show its full scalability characteristics.
Project manager: That's true. So the way out of this bottleneck is to have clusters in your system. If there are sufficient hardware resources, put instance clusters on the same box; otherwise, set up a cluster on multiple boxes.
Performance architect: Right! And that is what we have explored in this article.
Deepak Goel is presently tinkering on a product in the artificial intelligence space.
Return to the ONJava.com
-
Few thoughts and answers
2005-01-23 01:19:53 DeepakGoel
Below are my thoughts on the discussion of this article and on the queries I received offline by email.
1. Experiments were also done without EJBs. These were some of the experiments:
a. Plain JSPs
b. JSPs - DAO - Database
c. JSPs - Session Bean - DAO - Database
The results were similar to the ones in the article. The application server became a bottleneck long before the hardware resources were fully utilized.
2. Little's Law Validation
Little's Law states that:
Number of users configured in the load-testing tool = Throughput × (Response Time + Think Time)
If the L.H.S. is equal to the R.H.S., then the test is valid. This is a good law for verifying that the load-testing equipment is not a bottleneck.
There can be times when the load-testing tool does not fire the load specified by the user. For example, if one specifies a 1000-user load and for some reason the load-testing tool can only fire 100 users (while still displaying 1000 users), then the L.H.S. in the above equation will not equal the R.H.S., which indicates a bottleneck in the load-testing tool.
L.H.S. = number of users = 1000
R.H.S. = throughput × response time, which will be approximately 100.
Therefore,
the L.H.S. is not equal to the R.H.S.
3. Load Balancing
The trick used to ensure that the load balancing does not add an overhead to the test was as follows:
Two client machines were used to fire the load. In the load-testing script, a DNS name was used (for example, www.infosys.com). Then the hosts file on each of the two machines was changed so that the DNS name 'www.infosys.com' pointed to a different IP address on each machine.
In the first client machine www.infosys.com pointed to 172.25.203.149
In the second client machine www.infosys.com pointed to 172.25.203.148
In the load-testing tool, the load to be fired by these two machines was set to be equal.
This ensured that the load received by the two instances was equal without requiring an actual load-balancing setup. However, this might not be possible in an actual production environment.
4. Service Demand
Service demand = utilization divided by throughput.
Thus, service demand is the utilization of a resource per transaction, i.e.:
Application server service demand = application server utilization divided by throughput
Database server service demand = database server utilization divided by throughput
Normally, the service demand should remain more or less the same with increasing load, which indicates that the environment is scaling well. If the service demand increases, however, the environment is not tuned and is not scaling up, which indicates a bottleneck in the environment.
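A minimal Java sketch of this trend check follows, using the application server utilization and throughput columns from the one-instance table in the article. The class and method names are hypothetical, and the 10 percent tolerance for "more or less the same" is an assumption made for the example.

```java
// Illustrative sketch; names and the tolerance value are assumptions.
class ServiceDemandTrend {

    /** Service demand per load level: utilization (fraction) / throughput. */
    static double[] serviceDemands(double[] utilization, double[] throughput) {
        double[] demands = new double[utilization.length];
        for (int i = 0; i < utilization.length; i++) {
            demands[i] = utilization[i] / throughput[i];
        }
        return demands;
    }

    /** Relative growth of service demand from the lowest to the highest load level. */
    static double relativeGrowth(double[] demands) {
        double first = demands[0];
        double last = demands[demands.length - 1];
        return (last - first) / first;
    }

    /** True if service demand stayed within the given tolerance as load increased. */
    static boolean scalesWell(double[] demands, double tolerance) {
        return relativeGrowth(demands) <= tolerance;
    }

    public static void main(String[] args) {
        // Application server columns for 2, 4, 6, and 8 users.
        double[] utilization = {0.1335, 0.2650, 0.3610, 0.3650};
        double[] throughput = {10.36, 17.9, 21.9, 22.0};

        double[] demands = serviceDemands(utilization, throughput);
        System.out.printf("Service demand growth: %.1f%%%n", 100 * relativeGrowth(demands));
        System.out.println("Scales well: " + scalesWell(demands, 0.10));
    }
}
```

For these inputs the service demand grows by a little under 29 percent between 2 and 8 users, so the check reports that the environment is not scaling well.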
Cheers,
Deepak Goel.
-
Similar results
2005-01-20 12:13:09 whartung
We found similar results in some of our testing. We were loading the server and running tests while giving the server varying numbers of CPUs. We found, simply, that while multiple CPUs gave us more throughput, it didn't scale as well as it could have, and that, for example, running two instances on 2 CPUs each of a 4-CPU machine gave us better utilization than one instance on the same 4-CPU machine.
There were obviously some internal bookkeeping or synchronization issues within the application server that were holding the overall application back (we came to refer to this unseen overhead as "Dark Matter", as it was pretty well impossible to actually measure it, but easy to see the results of its effects).
-
I didn't quite get the data
2005-01-20 11:07:17 roger_rustin
" It was found that after adding one more instance, throughput increased by at least 50 percent, as seen in Table 2." How?
In the first table there are only 2, 4, 6, and 8 users. If I compare the data for these users with Table 2, I get the same values.
-
Hmm.
2005-01-20 14:45:13 cekvenich2
I have done many years of large-scale development and testing, and we find the load in the LAN and the DB.
Maybe you are using EJB?
Else... the app is never slow.
And we do generally have many app servers, but... something is wrong with your test, I think.
Could you post more details?
tia,
.V





Yes, to some point you can fix these application problems by cloning infrastructure components, but these problems usually propagate to a back-end in the form of database locks or transaction locks and this would be better debugged beforehand.