ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.


AddThis Social Bookmark Button

Enhance Collection Performance with this Treasure Trove

by Dion Almaer

Ever use the Collections API? How can you say no?! The Collections framework is probably one of the best API sets available to a Java programmer. Since many parts of our applications use a HashMap, ArrayList, or LinkedList at some point, enhancing the performance of these guys can do a lot for us.

Eric D. Friedman wrote a high performance set of collections called Trove. Trove allows you to plug in their versions of certain containers (HashMap, HashSet, LinkedList), and use them just like you did with the standard versions. There are also ways to utilize primitive collections to gain even more performance. Don't you love open source?

In this article, we will discuss:

  • Using Trove classes
  • Using a Factory to allow you to switch between Trove and JDK collections
  • Trove's Primitive collections

Using Trove Classes

Trove is ridiculously simple to work with. You can simply download the code and set up your CLASSPATH to see the lib/trove.jar file.

To use the Trove classes, you need to do the following:

  1. Import the Trove classes:

    import gnu.trove.*;
  2. Change collection creation:

    // Map map = new HashMap(); // standard hashmap
    Map map = new THashMap();   // trove version of hashmap
  3. You can see the leading T in the Trove classes. This is there to prevent name collisions, and means you don't have to fully qualify all collections (e.g. java.util.HashMap vs. gnu.trove.HashMap).

How is Trove Faster?

Related Reading

Learning Java
By Patrick Niemeyer, Jonathan Knudsen

The THashMap class is a fair amount faster than java.util.HashMap. How is this so? The standard HashMap holds a Map.Entry object for each of its elements. Trove uses open addressing, and hence gets around this problem with chaining. You can still use Map.entrySet on a Trove Map, but it is a Bad Thing. This will cause Trove to create the Map.Entry objects that it tries to get away from, so if possible, don't use these operations. If you buy into Trove, you can go down a path that couples you tightly to the Trove collections. You can implement Trove specific APIs (forEachEntry, forEachKey, forEachValue, and transformValues) to potentially gain higher performance.

Hash tables have a size, and have to grow when the data set exceeds its current capacity. It has been shown that if the size is a prime number, the performance of the hash table increases. Trove makes sure that its capacity is always a prime number, which is something that the JDK doesn't do.

When I run a benchmark on my system, I get the following results for putting values into a hash table:

Comparing 100000 Map.put() operations
JDK   average (msec): 321
Trove average (msec): 93

Difference: Trove is 3.45 times faster!

How is Trove Smaller?

Another nice side effect of not containing the Map.Entry objects is that the resulting data structures take up a lot less space. This is particularly noticeable with the Set implementations, as they are not backed by Maps like the JDK.

Comparing size of Set implementation: 1,000 Integer's

JDK   size: 46216 bytes
Trove size: 28856 bytes

Trove's collection requires 62% of the memory needed by the JDK collection.

Pages: 1, 2

Next Pagearrow