ONJava.com -- The Independent Source for Enterprise Java

JBoss Optimizations 101

When Loading Data Once May Be Enough ...

This heavy database usage stems from the cache or, more accurately, from the absence of one. For entity beans, the EJB specification defines three commit options, which can be split into two main categories:



  1. I own the database (AKA Commit Option A): If any data must be modified, it will be done through the container. The container is the only point of write access to the database. As such, the container can cache data across transactions without the risk of having an unsynchronized cache.
  2. I don't own the database (AKA Commit Options B or C): Data may be modified by systems other than the EJB container. Consequently, the EJB container cannot keep data in cache across transactions, as it may have been modified externally. It must reload the required data from the database for each transaction.

By default, JBoss' entity bean containers are configured not to use the cache (i.e., Commit Option B). Because the CMS101 development team hasn't changed the default configuration, the database becomes the bottleneck. For each page request, the page description, content, header, footer, and left and right sides are all reloaded from the database!
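
To get a feel for the difference between the two categories, here is a toy model in plain Java. It is only a sketch and not JBoss code: the classes CountingDatabase, ToyEntityContainer, and CommitOptionDemo are invented for this illustration, and a "transaction" is reduced to reading the six CMS101 entities once. The container either keeps its cache across transactions (roughly Commit Option A) or clears it at the start of each one (the simplified view of Options B and C used in this article).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the database: counts how many rows are loaded. */
class CountingDatabase {
    private final Map<String, String> rows = new HashMap<>();
    private int loads = 0;

    void put(String key, String value) {
        rows.put(key, value);
    }

    String load(String key) {
        loads++;
        return rows.get(key);
    }

    int loadCount() {
        return loads;
    }
}

/** Hypothetical container: trusts its cache across transactions only if it owns the database. */
class ToyEntityContainer {
    private final CountingDatabase db;
    private final boolean cacheAcrossTransactions; // true is roughly Commit Option A
    private final Map<String, String> cache = new HashMap<>();

    ToyEntityContainer(CountingDatabase db, boolean cacheAcrossTransactions) {
        this.db = db;
        this.cacheAcrossTransactions = cacheAcrossTransactions;
    }

    /** Simulates one transaction that reads every given entity once. */
    void runTransaction(String... keys) {
        if (!cacheAcrossTransactions) {
            // Another system may have changed the data, so cached state is discarded.
            cache.clear();
        }
        for (String key : keys) {
            // Hits the database only when the entity is not in the cache.
            cache.computeIfAbsent(key, db::load);
        }
    }
}

class CommitOptionDemo {
    /** Returns the number of database loads needed to serve the given number of page requests. */
    static int loadsFor(boolean cacheAcrossTransactions, int requests) {
        CountingDatabase db = new CountingDatabase();
        String[] parts = {"WebPage", "PageContent", "Header", "Footer", "LeftSide", "RightSide"};
        for (String part : parts) {
            db.put(part, part + " data");
        }
        ToyEntityContainer container = new ToyEntityContainer(db, cacheAcrossTransactions);
        for (int i = 0; i < requests; i++) {
            container.runTransaction(parts);
        }
        return db.loadCount();
    }

    public static void main(String[] args) {
        System.out.println("Cache kept across transactions: " + loadsFor(true, 100) + " loads");
        System.out.println("Cache cleared per transaction:  " + loadsFor(false, 100) + " loads");
    }
}
```

For 100 page requests, the model performs 6 loads when the cache is kept and 600 when it is cleared for every transaction. A real container does far more work than this, but the ratio shows why the database ends up as the bottleneck.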

Note: I have even seen a real-life situation where CMP 1.1 was used and all data that was read during the transaction was written back at the end of the transaction, even though no fields had been changed.

Because the CMS101 database is used only by the CMS101 application, the team decides to activate caching and switch to Commit Option A:

<jboss>
  <enterprise-beans>

    <container-configurations>
      <!--
        We define a new configuration that simply overrides
        the default CMP 2.x configuration defined in
        conf/standardjboss.xml by changing its commit
        option
      -->
      <container-configuration
        extends="Standard CMP 2.x EntityBean">
        <container-name>CMP 2.x and Cache</container-name>
        <commit-option>A</commit-option>
      </container-configuration>
    </container-configurations>

    <entity>
      <ejb-name>WebPage</ejb-name>
      <configuration-name>CMP 2.x and Cache</configuration-name>
      <method-attributes>
        <method>
          <method-name>get*</method-name>
          <read-only>true</read-only>
        </method>
      </method-attributes>
    </entity>

    <entity>
      <ejb-name>PageContent</ejb-name>
      <configuration-name>CMP 2.x and Cache</configuration-name>
      <method-attributes>
        <method>
          <method-name>get*</method-name>
          <read-only>true</read-only>
        </method>
      </method-attributes>
    </entity>

    <!-- and so on for Header, Footer,
         LeftSide and RightSide -->

    <session>
      <ejb-name>PageRenderer</ejb-name>
      <jndi-name>PageRenderer</jndi-name>
    </session>

  </enterprise-beans>
</jboss>

The development team runs the test suite again and sees that both the scalability and level of database usage are excellent! After a little more testing, they will be ready to go into production and sell their (highly value-added) CMS101!

Cluster, You Said Cluster? Oops, Forgot that Detail ...

After months of prospecting, the CMS101 commercial team finds its first customer. There is, however, a small discrepancy between what the sales force has sold and what the development team has implemented (which is quite an unusual situation): the customer expects a high number of requests on its web site and thus wants CMS101 to run in a cluster to balance the load.

Clustering CMS101 is not a problem in itself, as JBoss supports clustering features. The problem is that by doing so, they will lose the performance optimizations they just implemented through Commit Option A. By running a cluster of JBoss instances, more than one JBoss node will access the same database. Furthermore, these nodes will not only read data but may also update it, for example when a user edits web page content. Consequently, we now have as many points of write access to the database as we have JBoss instances in the cluster. If a user modifies a web page on a specific JBoss node, the database and that node's local cache will be updated. However, the other JBoss instances will never reload fresh data from the database; they will keep using their own caches, which now contain stale data.

Figure 5. Unsynchronized cache data

Once again, let's analyze the specific requirements of this application. In the clustered case, our problem is that the data is never refreshed in the other nodes' caches. Consequently, we need a way to force other nodes' caches to reload a specific bean from the database when it is modified on another node. The node that modifies data must send some kind of invalidation message to the other node caches. Luckily, the cache invalidation message doesn't need to be sent transactionally to the other caches — we're dealing with web pages, not bank accounts.

For these scenarios, JBoss incorporates a handy tool: the cache invalidation framework. It provides automatic invalidation of cache entries in a single node or across a cluster of JBoss instances. As soon as an entity bean is modified on a node, an invalidation message is automatically sent to all related containers in the cluster and the related entry is removed from the cache. The next time the data is required by a node, it will not be found in cache, and will be reloaded from the database:

Figure 6. Cache invalidation framework
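
The invalidation flow described above can be sketched in a few lines of plain Java. This is a toy model under loud assumptions, not the JBoss implementation: ToyNode and InvalidationDemo are hypothetical names, the shared "database" is just a map, and the invalidation message is a direct in-process method call rather than a message sent across a cluster. The sketch shows both situations: without invalidation, the second node keeps serving stale data; with invalidation, its cache entry is removed and the next read goes back to the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cluster node: a local cache in front of a database shared by all nodes. */
class ToyNode {
    private final Map<String, String> database; // shared by every node in the toy cluster
    private final Map<String, String> cache = new HashMap<>();
    private final List<ToyNode> peers = new ArrayList<>();
    private final boolean sendInvalidations;

    ToyNode(Map<String, String> database, boolean sendInvalidations) {
        this.database = database;
        this.sendInvalidations = sendInvalidations;
    }

    void addPeer(ToyNode peer) {
        peers.add(peer);
    }

    /** Serves from the local cache, loading from the database only on a miss. */
    String read(String key) {
        return cache.computeIfAbsent(key, database::get);
    }

    /** Updates the database and the local cache, then optionally tells the peers. */
    void write(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
        if (sendInvalidations) {
            for (ToyNode peer : peers) {
                peer.invalidate(key);
            }
        }
    }

    /** Drops the entry so that the next read reloads it from the database. */
    void invalidate(String key) {
        cache.remove(key);
    }
}

class InvalidationDemo {
    /** Returns what node B reads after node A updates a page that B has already cached. */
    static String readOnOtherNodeAfterUpdate(boolean sendInvalidations) {
        Map<String, String> database = new HashMap<>();
        database.put("home", "v1");

        ToyNode a = new ToyNode(database, sendInvalidations);
        ToyNode b = new ToyNode(database, sendInvalidations);
        a.addPeer(b);
        b.addPeer(a);

        b.read("home");        // B now caches "v1"
        a.write("home", "v2"); // A updates the page
        return b.read("home");
    }

    public static void main(String[] args) {
        System.out.println("Without invalidation, B reads: " + readOnOtherNodeAfterUpdate(false));
        System.out.println("With invalidation, B reads:    " + readOnOtherNodeAfterUpdate(true));
    }
}
```

Note that the toy node removes the stale entry instead of pushing the new value to its peers. That mirrors the behavior described above: the invalidation message only names what changed, and each node reloads the data from the database the next time it needs it.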

To activate this behavior in JBoss, the development team has to run JBoss in clustered mode and modify the jboss.xml deployment descriptor:

<jboss>
  <enterprise-beans>

    <entity>
      <ejb-name>WebPage</ejb-name>
      <configuration-name>Standard CMP 2.x with cache invalidation</configuration-name>
      <method-attributes>
        <method>
          <method-name>get*</method-name>
          <read-only>true</read-only>
        </method>
      </method-attributes>
      <cache-invalidation>True</cache-invalidation>
    </entity>

    <entity>
      <ejb-name>PageContent</ejb-name>
      <configuration-name>Standard CMP 2.x with cache invalidation</configuration-name>
      <method-attributes>
        <method>
          <method-name>get*</method-name>
          <read-only>true</read-only>
        </method>
      </method-attributes>
      <cache-invalidation>True</cache-invalidation>
    </entity>

    <!-- and so on for Header, Footer,
         LeftSide and RightSide -->

    <session>
      <ejb-name>PageRenderer</ejb-name>
      <jndi-name>PageRenderer</jndi-name>
    </session>

  </enterprise-beans>
</jboss>

Note that we have removed our customized container configuration, instead using the one named "Standard CMP 2.x with cache invalidation," pre-defined in conf/standardjboss.xml. No additional configuration is required to get this behavior. Many other fancy designs can be built using this framework.

Note: JBoss 4.0 will not only contain the distributed invalidation framework, but will also include a full-fledged transactional distributed cache.

This way, the development team can keep all of the advantages from previous optimizations and get even better throughput, thanks to the cluster, all the while satisfying the customer requirements.

Conclusion

The basic optimizations described in this article do not just apply to CMS101, but to any kind of J2EE application with similar data taxonomy. Remember that this analysis can be done on a per-EJB basis, not just for your entire application as a monolithic whole.

Long live CMS101, and see you on the JBoss.org forums!

Sacha Labourey is one of the core developers of JBoss Clustering and the General Manager of JBoss Group Europe.

Juha Lindfors is a computer scientist at the University of Helsinki.

