Scaling Enterprise Java on 64-bit Multi-Core X86-Based Servers
Optimize Your Code
In the previous sections, we discussed the general guidelines to build and run high-performance Java EE applications for multiple CPU and large memory servers. However, each application is unique with its own performance requirements and bottlenecks. The only way to make sure that your application is optimized for your hardware is through extensive performance testing. In this section, we cover some of the basic techniques to diagnose performance problems in your application.
It's beyond the scope of this article to cover performance-testing tools and frameworks. In our tests, we used Grinder, an open source performance-testing framework in Java. It can simulate hundreds of thousands of concurrent users across multiple testing computers and gather statistics on a central console. It provides a utility for you to record your test scripts by going through your web application in a browser. The generated script is in Jython, and you can easily modify it to suit your own needs.
As we discussed before, tuning GC operations in the JVM is crucial for performance. The easiest way to see the effects of various GC algorithm parameters is to monitor the time the application spends on GC throughout the load testing. There are two simple ways to do it.
- You can add the -verbose:gc startup flag to the JVM. The JVM then prints every GC operation and its duration to the console. If the server pauses because of a long full GC operation, you can tune the GC parameters accordingly. If the system runs lengthy full GCs very frequently, you probably have a memory leak somewhere in your application.
- If you use the JDK 5.0 JVM, you can also use the JConsole utility to monitor the server's resource usage. The JConsole GUI shows how the various memory regions are utilized and how much time is spent on GC (see Figure 1). To use JConsole, start the JVM with the -Dcom.sun.management.jmxremote flag and then run the jconsole command. JConsole can connect to a JVM running on the local computer, or to any JVM running on the network via RMI. You can use one JConsole instance to monitor multiple servers.
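The GC figures that JConsole displays are also available in-process through the java.lang.management API introduced in JDK 5.0. The following is a minimal sketch, assuming you want to log GC totals from your own test harness; the class name GcTimeSampler and its methods are hypothetical and not part of any tool mentioned in this article.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; the class and method names are illustrative only.
class GcTimeSampler {

    // Total number of collections reported by all collectors so far.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Accumulated GC time divided by JVM uptime; a rough indicator only.
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcTimeMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.println("collections:  " + totalGcCount());
        System.out.println("GC time (ms): " + totalGcTimeMillis());
        System.out.println("GC fraction:  " + gcTimeFraction());
    }
}
```

Sampling these values before and after a load-test run gives a quick comparison between two sets of GC parameters without attaching a GUI.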
To pinpoint the exact location of a memory leak, you can use an application profiler. The JBoss Profiler is an open source profiler for applications inside the JBoss Application Server.
When the application is fully loaded, the CPU should run between 80% and 100% of its capacity. If the CPU usage is substantially lower, you should look for other bottlenecks, such as whether the network or disk I/O is saturated. However, an underutilized CPU could also indicate contention points inside the application. For instance, as we mentioned before, if there is a synchronized block on the critical path of multiple threads (e.g., a code block frequently accessed by most requests), the multiple CPUs would not be fully utilized. To find those contention points, you can do a thread dump when the server is fully loaded:
- On a Windows machine, press Ctrl-Break in the DOS terminal window where the server was started (i.e., the server console) to create a thread dump.
- On a Linux/Unix system, run the kill -QUIT process_id command, where process_id is the ID of the server JVM process, to create a thread dump.
The thread dump prints out detailed information (stack trace with source code line numbers) about all current threads in the server. If all the request-handling threads are waiting at the same point, it would indicate a contention point, and you can go back to the code to fix it.
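The same check can be approximated from inside the JVM with Thread.getAllStackTraces(), which is available from JDK 5.0. The sketch below groups blocked threads by their top stack frame; a large group at one frame suggests a contention point. The class BlockedThreadReport and the demoContention helper are hypothetical and exist only to illustrate the idea.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not part of JBoss AS or the JDK.
class BlockedThreadReport {

    // Counts BLOCKED threads per top stack frame.
    // The state and the stack are sampled at slightly different moments,
    // so treat the result as an approximation.
    static Map<String, Integer> blockedByTopFrame() {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = entry.getValue();
            if (entry.getKey().getState() == Thread.State.BLOCKED && stack.length > 0) {
                String frame = stack[0].toString();
                Integer seen = counts.get(frame);
                counts.put(frame, seen == null ? 1 : seen + 1);
            }
        }
        return counts;
    }

    // Creates an artificial contention point and returns the size of the
    // largest group of threads blocked at the same frame.
    static int demoContention(int workers) {
        final Object lock = new Object();
        Thread[] threads = new Thread[workers];
        int largestGroup = 0;
        try {
            synchronized (lock) {
                for (int i = 0; i < workers; i++) {
                    threads[i] = new Thread(new Runnable() {
                        public void run() {
                            synchronized (lock) {
                                // contended critical section
                            }
                        }
                    });
                    threads[i].start();
                }
                // Wait until every worker is stuck on the monitor we hold.
                for (Thread t : threads) {
                    while (t.getState() != Thread.State.BLOCKED) {
                        Thread.sleep(5);
                    }
                }
                for (int size : blockedByTopFrame().values()) {
                    largestGroup = Math.max(largestGroup, size);
                }
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return largestGroup;
    }

    public static void main(String[] args) {
        System.out.println("largest blocked group: " + demoContention(4));
    }
}
```

In a real server you would call blockedByTopFrame() under load (for example, from a diagnostic servlet) rather than run the artificial demo.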
Sometimes, the contention point is not in the application but in the application server itself. Most Java EE application servers have not completely evolved their code base to take advantage of JDK 5.0 APIs, especially the concurrent utility libraries. In this case, it is crucial to choose an open source application server, such as the JBoss Application Server, where you can make changes to the server code.
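To illustrate the kind of change involved, here is a minimal sketch of replacing a globally synchronized counter map with the JDK 5.0 concurrent utilities. PageHitCounter is a hypothetical class, not code from any application server; it only shows how ConcurrentHashMap and AtomicLong let request threads update shared state without queuing on a single lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; names are illustrative only.
class PageHitCounter {

    private final ConcurrentMap<String, AtomicLong> hits =
            new ConcurrentHashMap<String, AtomicLong>();

    // Records one hit without a synchronized block on the request path.
    void record(String page) {
        AtomicLong counter = hits.get(page);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            counter = hits.putIfAbsent(page, fresh); // returns null if we won the race
            if (counter == null) {
                counter = fresh;
            }
        }
        counter.incrementAndGet();
    }

    long count(String page) {
        AtomicLong counter = hits.get(page);
        return counter == null ? 0 : counter.get();
    }

    // Runs several threads against one counter and returns the final total.
    static long demo(int threads, final int hitsPerThread) {
        final PageHitCounter counter = new PageHitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int n = 0; n < hitsPerThread; n++) {
                        counter.record("/index");
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.count("/index");
    }

    public static void main(String[] args) {
        System.out.println("total hits: " + demo(4, 1000));
    }
}
```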
Collapse the Tiers
Traditionally, Java EE had been designed for the multitiered architecture. This architecture envisions that the web server, servlet container, EJB server, and database server each runs on its own physical computer, and those computers are tied together through remote call protocols on the local network.
But with the new generation of more powerful server hardware, a single computer is powerful enough to run all those components for a medium-sized website. Running everything on the same physical machine is much more efficient than the distributed architecture described above. All communications are now interthread communications that can be handled efficiently by the same operating system or even inside the same JVM in many cases. It eliminates the expensive object serialization requirements and high network latency associated with remote calls. Furthermore, since different components tend to use different kinds of server resources (e.g., the database is heavy on disk usage while Java EE is CPU-intensive), the integrated stack helps us to balance the server usage and reduce overall contention points.
Figure 2. Choose between call-by-reference and call-by-value in the JBoss AS installer
The JBoss Application Server has built-in optimizations to support the single JVM deployment. For instance, by default, JBoss AS makes call-by-reference method calls from the servlet to the EJB objects. Call-by-reference can be up to ten times faster than the standard Java EE call-by-value approach, because call-by-value requires object serialization and is primarily for remote calls across JVMs. Figure 2 shows that you can choose from the two call-isolation methods.
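The difference between the two call styles can be sketched in plain Java. The example below is an illustration only and does not use any JBoss API: copyByValue simulates call-by-value with a serialization round trip, while a direct method call passes the reference. The class CallIsolationDemo and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical demo class; it imitates call isolation, it is not JBoss code.
class CallIsolationDemo {

    // Simulates call-by-value: the argument is serialized and deserialized,
    // so the callee works on an independent copy.
    static <T extends Serializable> T copyByValue(T argument) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buffer);
            out.writeObject(argument);
            out.close();
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            in.close();
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // A callee that modifies its argument.
    static void addItem(ArrayList<String> cart) {
        cart.add("book");
    }

    public static void main(String[] args) {
        ArrayList<String> cart = new ArrayList<String>();

        addItem(copyByValue(cart));                 // callee sees a copy
        System.out.println("after call-by-value:     " + cart);

        addItem(cart);                              // callee sees the caller's object
        System.out.println("after call-by-reference: " + cart);
    }
}
```

Besides the serialization cost, note the semantic difference: with call-by-reference, changes made by the callee are visible to the caller, so code written for remote calls may behave differently when the two sides share a JVM.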
The JBoss Web Server project goes one step further and builds native Apache web server functionalities directly into the Java EE servlet container. It allows much tighter integration between components when deployed on the same server, and hence could deliver much better performance than older Java EE servers.
With the entire middleware stack running on the same physical server, we also drastically simplify deployment and management. When you need to scale the application up, you simply add a load balancer, move the shared database server to a different computer, and then add any number of server nodes with the integrated middleware stack (see Figure 3).
Figure 3. The load-balanced architecture
All web requests are made against the load balancer, which then forwards the requests to application servers in a manner that ensures all nodes receive a similar number of requests per unit time. The load balancer should be configured to forward all requests from the same user session to the same node (i.e., use sticky sessions). Most Java EE application servers also support automatic state replication between the nodes to avoid application state loss when a node fails. For instance, the JBoss AS supports several state-replication strategies including buddy replication, in which each node can choose a "buddy" as its failover. Such load balancing and state replication would be very difficult to deploy and manage if we had three or four tiers of servers.
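The routing rule described above can be sketched in a few lines of Java. This is a simplified model, assuming round-robin assignment for new sessions and an in-memory session table; real load balancers use their own mechanisms (such as cookies or route suffixes), and the class StickyRoundRobin is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a sticky-session load balancer.
class StickyRoundRobin {

    private final List<String> nodes;
    private final Map<String, String> sessionToNode = new ConcurrentHashMap<String, String>();
    private final AtomicInteger next = new AtomicInteger();

    StickyRoundRobin(List<String> nodes) {
        this.nodes = new ArrayList<String>(nodes);
    }

    // New sessions are spread round-robin; known sessions stay on their node.
    String route(String sessionId) {
        return sessionToNode.computeIfAbsent(sessionId,
                id -> nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size())));
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<String>();
        nodes.add("node1");
        nodes.add("node2");
        StickyRoundRobin balancer = new StickyRoundRobin(nodes);
        System.out.println(balancer.route("session-a"));
        System.out.println(balancer.route("session-b"));
        System.out.println(balancer.route("session-a")); // same node as the first call
    }
}
```

The sketch ignores node failure; that is the case state replication covers, since a session pinned to a failed node must be served by another node that holds a copy of its state.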
Virtualize the Hardware
The new multi-core 64-bit servers are capable of running heavy-load web applications. But for small web applications, they can be overkill. To fully utilize server capabilities, we sometimes run multiple small websites on the same physical server. Of course, you can deploy multiple applications on the same Java EE container. But to achieve the optimal performance, stability, and manageability, we often wish to run each application in its own Java EE container. How do we do that?
Technically, the primary challenge in running multiple Java EE server instances on the same physical server is avoiding port conflicts. Like any other server application, the Java EE application server listens on TCP/IP ports to provide services. For instance, the HTTP service listens for web requests on port 80 or 8080; the RMI service listens for RMI invocation requests on port 4444; the naming service listens on port 1099; etc. For the server to run properly, it must obtain exclusive control over each port it listens on. So, when you start multiple server instances on the same computer, you are likely to get port conflict errors. There are three ways to avoid port conflicts.
- Using virtualization software, such as VMware and Xen, you can run multiple operating systems on the same physical server. You can run a Java EE server instance in each of those guest OSes. The benefit of this approach is its flexibility in architecture and management. Each OS virtual machine can be independently provisioned, managed, and backed up. You can also choose to run the database server, the load balancer, or other server components in their own virtual OSes. However, the drawback of this approach is that there is relatively heavy overhead to virtualize the entire OS just to run the JVM.
- Many Java EE servers allow you to start a server instance bound to a specific IP address. For instance, in the case of JBoss AS, you can use the run.bat -b 10.10.20.111 command to start a server instance bound to IP address 10.10.20.111. You can assign multiple IP addresses to your server and then start a Java EE server instance on each of those IP addresses. Each server instance listens on the same port numbers on different IP addresses, and hence there is no conflict. This approach provides a good balance between server manageability and performance overhead. We recommend it if your application supports IP address binding.
- In the unlikely case that you cannot assign multiple IP addresses to a physical server, an alternative is to reconfigure the port numbers used by each server instance so that they do not conflict. That would typically require you to go through reams of tedious configuration files, and you must be intimately familiar with those files to look for port numbers. In the JBoss AS, there is a simpler way: JBoss AS has an MBean service called jboss.system:service=ServiceBindingManager that can automatically shift port numbers used in the current server instance (e.g., increase all port numbers by 100 from their default values). In general, we do not recommend messing with port numbers since the application server is not regularly tested with nonstandard port numbers, and a number of complications and side effects could arise from such use.
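The port conflict itself, and the arithmetic behind a port shift, can be sketched in plain Java. The example below does not use the ServiceBindingManager; it only shows that a second listener on the same address and port is rejected, and what "increase all port numbers by 100" produces for the default ports named above. The class PortPlanSketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; it does not configure a real application server.
class PortPlanSketch {

    // Default ports mentioned in the article.
    static Map<String, Integer> defaultPorts() {
        Map<String, Integer> ports = new LinkedHashMap<String, Integer>();
        ports.put("http", 8080);
        ports.put("rmi", 4444);
        ports.put("naming", 1099);
        return ports;
    }

    // Returns a copy of the port table with every port increased by offset.
    static Map<String, Integer> shifted(Map<String, Integer> ports, int offset) {
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> entry : ports.entrySet()) {
            result.put(entry.getKey(), entry.getValue() + offset);
        }
        return result;
    }

    // Tries to open two listeners on the same loopback address and port.
    // Returns true when the second bind is rejected with a BindException.
    static boolean secondBindRejected() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) { // 0 = any free port
            try (ServerSocket second = new ServerSocket(first.getLocalPort(), 50, loopback)) {
                return false;
            } catch (BindException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind rejected: " + secondBindRejected());
        System.out.println("instance 1 ports: " + defaultPorts());
        System.out.println("instance 2 ports: " + shifted(defaultPorts(), 100));
    }
}
```

Binding each instance to its own IP address avoids the conflict for the same reason the shift does: the operating system treats each address and port pair as a separate listening endpoint.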
To achieve optimal results, we should run no more server instances than the number of physical CPUs in the server; otherwise, the server instances would wait on one another to use the CPUs, creating more contention points. We should also keep the memory allocation for each server instance at around 1GB for optimal GC results.
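As a rough sizing aid, the two rules above can be combined into a small calculation. This is a sketch under stated assumptions: capping the instance count by available memory is our own addition to the rule of thumb, the class InstanceSizingSketch is hypothetical, and Runtime.availableProcessors() reports logical processors, which may exceed the number of physical CPUs.

```java
// Hypothetical sizing helper based on the rule of thumb in the text.
class InstanceSizingSketch {

    // At most one instance per CPU, and no more instances than the memory
    // budget allows at heapPerInstanceMb each (assumed cap, see lead-in).
    static int maxInstances(int cpus, long memoryMb, long heapPerInstanceMb) {
        long byMemory = memoryMb / heapPerInstanceMb;
        return (int) Math.max(1, Math.min(cpus, byMemory));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors(); // logical processors
        long memoryMb = 16L * 1024;                             // assumed 16GB budget
        System.out.println("suggested instances: " + maxInstances(cpus, memoryMb, 1024));
    }
}
```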
Michael Yuan would like to thank Phillip Thurmond of Red Hat for reviewing this article and providing helpful suggestions.
- Download JBoss AS.
- The Dell Scalable Enterprise Technology Center has a lot of information on how to work with Dell multi-core 64-bit servers.
- JBoss AS Clustering Guide has further information about clustering of JBoss AS.
- Grinder and JMeter are two popular open source web performance-testing frameworks.
Michael Juntao Yuan specializes in lightweight enterprise/web application and end-to-end mobile application development.
Dave Jaffe is an engineer in Dell's Scalable Enterprise Technology Center.