Java's class loading mechanism is incredibly powerful. It allows you to leverage external third-party components without the need for header files or static linking. You simply drop the JAR files for the components into a directory and arrange for them to be added to your classpath. Run-time references are all resolved dynamically. But what happens when these third-party components have their own dependencies? Generally, it is left up to each developer to determine the full set of required components, acquire the correct version of each, and ensure that they are all added to the classpath properly.
But it doesn't have to be like this; Java's class loading
mechanism allows for more elegant solutions to this problem. One
such solution is for each component's authors to specify the
dependencies of their component inside of its JAR manifest. A
manifest is a text file (META-INF/MANIFEST.MF) that
can be included inside of a JAR to specify metadata about the file.
The most popular attribute, Main-Class, specifies a
main class that java -jar can use to locate which
class to invoke. However, there is a related, but much less
well-known, attribute called Class-Path that lets a
JAR specify that it has dependencies on other JARs. Java's default
ClassLoader knows to check for these attributes and to
automatically append the specified dependencies to its internal
classpath.
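As a rough illustration of what reading these attributes involves, the sketch below parses manifest text with the standard java.util.jar API and pulls out Main-Class and Class-Path. The class name ManifestPeek, its helper methods, and the com.example values are made up for this example; for a real JAR you would more likely obtain the manifest through JarFile.getManifest().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of the article's code.
class ManifestPeek {

    /** Parses manifest text and returns its main attributes. */
    static Attributes mainAttributes(String manifestText) {
        try {
            byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
            return new Manifest(new ByteArrayInputStream(bytes)).getMainAttributes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits the Class-Path value into its space-separated entries. */
    static List<String> classPathEntries(Attributes attrs) {
        List<String> entries = new ArrayList<>();
        String value = attrs.getValue(Attributes.Name.CLASS_PATH);
        if (value != null) {
            StringTokenizer tok = new StringTokenizer(value);
            while (tok.hasMoreTokens()) {
                entries.add(tok.nextToken());
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Made-up manifest content; note the trailing newline.
        Attributes attrs = mainAttributes(
            "Main-Class: com.example.app.Main\n" +
            "Class-Path: lib-a.jar lib-b.jar\n");
        System.out.println(attrs.getValue(Attributes.Name.MAIN_CLASS));
        System.out.println(classPathEntries(attrs));
    }
}
```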
Let's look at an example. Consider a Java application that implements a traffic simulation. This application is composed of three individual JARs: simulator-ui.jar, simulator.jar, and rule-engine.jar.
The naive way to execute this application is:
$ java -classpath simulator-ui.jar:simulator.jar:rule-engine.jar com.oreilly.simulator.ui.Main
But we could also specify this information in JAR manifest
files. simulator-ui's MANIFEST.MF file looks like
this:
Main-Class: com.oreilly.simulator.ui.Main
Class-Path: simulator.jar
While simulator's MANIFEST.MF simply contains:
Class-Path: rule-engine.jar
rule-engine either does not have a manifest, or it is empty.
Now we can just do:
$ java -jar simulator-ui.jar
Java will automatically parse the manifest entries to extract
the main class and modify the classpath accordingly. It will even
determine the path of simulator-ui.jar and interpret
all Class-Path attributes relative to this path, so we
could just as easily have done one of the following:
$ java -jar ../simulator-ui.jar
$ java -jar /home/don/build/simulator-ui.jar
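The relative resolution can be pictured with plain URI arithmetic. The following is only a sketch of the idea, not the JDK's internal code: ClassPathResolver and its resolve method are hypothetical names, and the entries are resolved against the URI of the JAR that declares them.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mimics how Class-Path entries are interpreted.
class ClassPathResolver {

    /** Resolves each space-separated entry against the declaring JAR's URI. */
    static List<URI> resolve(URI jarLocation, String classPathValue) {
        List<URI> resolved = new ArrayList<>();
        for (String entry : classPathValue.trim().split("\\s+")) {
            if (!entry.isEmpty()) {
                // A relative entry replaces the last path segment of the JAR's URI.
                resolved.add(jarLocation.resolve(entry));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        URI jar = URI.create("file:/home/don/build/simulator-ui.jar");
        for (URI dependency : resolve(jar, "simulator.jar lib/extra.jar")) {
            System.out.println(dependency);
        }
    }
}
```

Because the base is the JAR's own location rather than the current working directory, the result is the same no matter where the java command is launched from.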
Java's implementation of the Class-Path attribute
presents a big improvement over specifying the entire classpath
manually. However, both approaches have some important limitations.
One of the biggest limitations, which may not have even crossed
your mind, is that you can only load one version of each component.
This may seem obvious because most programming environments have
this limitation. However, it is not uncommon for large multi-JAR
projects with many third-party dependencies to encounter conflicts
in those dependencies.
For example, let's say that you're developing a meta-search engine that queries multiple search engines and collates the results. Google and Amazon's Alexa both support web services APIs that use SOAP as a communication mechanism, and both provide Java libraries that can be used to conveniently access these APIs. This is a bit contrived, but for the sake of argument, let's assume that your JAR, metasearch.jar, depends upon google.jar and amazon.jar, each of which depends upon a common soap.jar.
This is fine for now, but what happens in the future when the SOAP protocol or API changes in some way? It's quite likely that these two search engines will not choose to upgrade at exactly the same time. There may come a day when accessing Amazon requires SOAP v1.x and accessing Google requires SOAP v2.x, and the two versions of SOAP were not designed to co-exist in the same process. In this case, we might have the following JAR dependencies specified:
$ cat metasearch/META-INF/MANIFEST.MF
Main-Class: com.onjava.metasearch.Main
Class-Path: google.jar amazon.jar
$ cat amazon/META-INF/MANIFEST.MF
Class-Path: soap-v1.jar
$ cat google/META-INF/MANIFEST.MF
Class-Path: soap-v2.jar
This captures the dependencies correctly, but there's no magic here--this won't do what we want. If soap-v1.jar and soap-v2.jar define many of the same classes, we're almost certainly going to have problems.
$ java -jar metasearch.jar
SOAP v1: remotely invoking searchAmazon
SOAP v1: remotely invoking searchGoogle
As you can see, soap-v1.jar was added to the classpath first, so it is used in both cases. Just as in the previous example, this is equivalent to:
$ java -classpath metasearch.jar:amazon.jar:google.jar:soap-v1.jar:soap-v2.jar # WRONG!
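The "first match wins" behavior of a flat classpath can be modeled in a few lines. This is a simulation only: the JARs are represented as lists of class names, and FirstMatchWins, findProvider, and the org.example.soap class names are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulation of a linear classpath search; no real JARs are involved.
class FirstMatchWins {

    /** Returns the first JAR, in classpath order, that contains the class, or null. */
    static String findProvider(Map<String, List<String>> classpath, String className) {
        for (Map.Entry<String, List<String>> jar : classpath.entrySet()) {
            if (jar.getValue().contains(className)) {
                return jar.getKey();
            }
        }
        return null;
    }

    /** A made-up classpath where both SOAP versions define the same class. */
    static Map<String, List<String>> sampleClasspath() {
        Map<String, List<String>> cp = new LinkedHashMap<>(); // keeps insertion order
        cp.put("soap-v1.jar", Arrays.asList("org.example.soap.Client"));
        cp.put("soap-v2.jar", Arrays.asList("org.example.soap.Client",
                                            "org.example.soap.Extra"));
        return cp;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cp = sampleClasspath();
        System.out.println(findProvider(cp, "org.example.soap.Client"));
        System.out.println(findProvider(cp, "org.example.soap.Extra"));
    }
}
```

Note the second lookup: a class that exists only in the newer JAR is still found there, so a process can end up with a mixture of classes from both versions, which is often worse than consistently getting the wrong one.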
It's interesting to note that Yahoo has also released a web services API, and they do not seem to have introduced a dependency on an existing SOAP/XML-RPC library. On smaller projects, conflicting component dependencies are often cited as a reason not to use a full-scale component (such as a collections library) when you can get by with a small, hand-rolled solution or with including just the one or two classes needed. Hand-rolled solutions have their place, but it is almost always better to use a real component if one is available. And copying other components' classes into your own codebase is never a good idea; in effect, you've just forked the development of that component and no one is ever going to merge in bug fixes or security updates.
Many larger projects, primarily commercial components, have even adopted the disturbing practice of consuming entire components and bundling them inside of their own JAR. To do this, they mangle the package name to make it unique (e.g., com/acme/foobar/org/freeware/utility) and include the classes directly in their JAR. This has the advantage of preventing any clashes between multiple versions of these component JARs, but at considerable cost. Doing this completely hides the third-party dependencies from the developers. If this process became widespread, it would lead to extreme inefficiencies (both in terms of the size of JAR files and the inefficiency of loading multiple versions of each JAR into one process). The problem with this approach is that if two components depend on the same version of a third component (or can be made to do so), there is no central mediator which can determine this and ensure that the shared component is only loaded once. This is something that we'll be investigating in the next section. In addition to any inefficiencies, it is quite likely that your ability to legally bundle third-party software with your own project may be restricted by the license under which that software is released.
Another approach to this problem is for each component's
developers to encode a version number explicitly in their package
names. Sun's javac code takes this approach--there is a
com.sun.tools.javac.Main class that simply forwards
calls on to com.sun.tools.javac.v8.Main. Each time a
new Java version is released, the package of this code changes.
This allows multiple releases of a component to live in a single
class loader and it makes the choice of version explicit; however,
this is not a very good solution, overall. Either clients need to
know exactly what version they plan to use and must change their
code to switch to a new version, or they must rely on wrapper
classes that forward method calls to the latest version (in which
case, these wrapper classes suffer from the same problems that we
highlighted above).
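To make the trade-off concrete, here is a small sketch of the versioned-package pattern. Nested classes stand in for classes that would really live in separate packages (say, a hypothetical com.example.tool.v1 and com.example.tool.v2); none of these names come from javac itself.

```java
// Sketch of version numbers encoded in names, with a forwarding wrapper.
class VersionedPackageDemo {

    /** Stand-in for a class in a "v1" package. */
    static final class MainV1 {
        static String run(String input) {
            return "v1 handled " + input;
        }
    }

    /** Stand-in for a class in a "v2" package. */
    static final class MainV2 {
        static String run(String input) {
            return "v2 handled " + input;
        }
    }

    /** Unversioned entry point that forwards to whichever version is newest. */
    static final class Main {
        static String run(String input) {
            return MainV2.run(input);
        }
    }

    public static void main(String[] args) {
        // A client pinned to v1 must edit its source to move to v2.
        System.out.println(MainV1.run("job"));
        // A client using the wrapper is silently moved when the wrapper changes.
        System.out.println(Main.run("job"));
    }
}
```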
The problem that we're facing here is that in most projects,
there is a single global namespace into which all classes are
loaded. What if, instead, each component had its own namespace and
it could load all of its dependent components into this namespace
without affecting the rest of the process? We can actually do this
in Java! Class names do not need to be unique--only the
combination of class names and their defining
ClassLoader must be unique. This means that each
ClassLoader acts like a namespace, and that if we can
load each component with its own ClassLoader, it will
have full control over how its dependencies are satisfied. It can
delegate class lookups to another ClassLoader that
contains only the specific version of each of its dependent
components. For example, see Figure 1.

Figure 1. Decentralized class loaders
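The claim that a class is identified by its name together with its defining ClassLoader can be checked directly. The sketch below is an assumption-laden demonstration rather than anything from the article: to stay self-contained it writes out the bytes of an empty class by hand (instead of reading a compiled .class file) and defines that class in two separate loaders. NamespaceLoader, emptyClassBytes, and the class name Widget are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LoaderNamespaceDemo {

    /** Loader that exposes defineClass so a class can be defined from raw bytes. */
    static final class NamespaceLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Builds the class file of an empty public class in the default package. */
    static byte[] emptyClassBytes(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8)
            out.writeShort(5);                                  // constant pool count (entries + 1)
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF(name);               // #2 Utf8: this class
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8: superclass
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces
            out.writeShort(0);                                  // fields
            out.writeShort(0);                                  // methods
            out.writeShort(0);                                  // attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = emptyClassBytes("Widget");
        Class<?> first = new NamespaceLoader().define("Widget", bytes);
        Class<?> second = new NamespaceLoader().define("Widget", bytes);
        System.out.println("same name:  " + first.getName().equals(second.getName()));
        System.out.println("same class: " + (first == second));
    }
}
```

Both Class objects report the name Widget, yet they are distinct types as far as the JVM is concerned, which is exactly the property the per-component loaders rely on.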
However, this architecture is not much better than the approach of bundling each dependent JAR with your own. What we need is a central authority that can ensure that each component version is only loaded by a single class loader. The architecture in Figure 2 will ensure that each component version is only loaded once.

Figure 2. Class loaders with mediator
To implement this, we'll need to create two different kinds of
class loaders. Each ComponentClassLoader will extend
Java's URLClassLoader to provide the logic needed to
extract .class files from a single JAR. However, it
will also perform two other tasks. When created, it will retrieve
the JAR manifest and look for a new attribute,
Restricted-Class-Path. Unlike Sun's
Class-Path attribute, this one implies that the
specified JARs should be available only to this component and no
others.
public class ComponentClassLoader extends URLClassLoader {
    // ...

    public ComponentClassLoader (MasterClassLoader master, File file)
        throws IOException
    {
        // ...

        // Collect the JARs named by Restricted-Class-Path, if any.
        List<File> l = new ArrayList<>();
        JarFile jar = new JarFile(file);
        try {
            Manifest man = jar.getManifest();
            // A JAR without a manifest simply declares no dependencies.
            String str = (man == null) ? null
                : man.getMainAttributes().getValue("Restricted-Class-Path");
            if (str != null) {
                // Entries are space-separated and relative to this JAR's directory.
                StringTokenizer tok = new StringTokenizer(str);
                while (tok.hasMoreTokens()) {
                    l.add(new File(file.getParentFile(), tok.nextToken()));
                }
            }
        } finally {
            jar.close();
        }
        this.dependencies = l;
    }

    public Class<?> loadClass (String name, boolean resolve)
        throws ClassNotFoundException
    {
        try {
            // Try to load the class from our JAR.
            return loadClassForComponent(name, resolve);
        } catch (ClassNotFoundException ex) {
            // Not in our JAR; fall through to the master.
        }
        // Couldn't find it -- let the master look for it
        // in another component.
        return master.loadClassForComponent(name, resolve, dependencies);
    }

    public Class<?> loadClassForComponent (String name, boolean resolve)
        throws ClassNotFoundException
    {
        Class<?> c = findLoadedClass(name);
        // Even if findLoadedClass returns a real class,
        // we might simply be its initiating ClassLoader.
        // Only return it if we're actually its defining
        // ClassLoader (as determined by Class.getClassLoader).
        if (c == null || c.getClassLoader() != this) {
            c = findClass(name);
            if (resolve) {
                resolveClass(c);
            }
        }
        return c;
    }
}
When a request is made to load a class that does not exist in
the specified JAR, rather than simply forwarding on to the parent
class loader, it will explicitly call the
MasterClassLoader and pass in its list of JAR
dependencies. The MasterClassLoader then forwards the
request on to the ComponentClassLoader for each of the
specified dependencies.
public class MasterClassLoader extends ClassLoader {
    // ...

    public Class<?> loadClassForComponent (String name,
                                           boolean resolve, List<File> files)
        throws ClassNotFoundException
    {
        // First try the normal delegation chain.
        try {
            return loadClass(name, resolve);
        } catch (ClassNotFoundException ex) {
            // Fall through to the requester's declared dependencies.
        }
        // Ask the loader of each declared dependency in turn.
        for (File f : files) {
            try {
                ComponentClassLoader ccl = getComponentClassLoader(f);
                return ccl.loadClassForComponent(name, resolve);
            } catch (Exception ex) {
                // simplified for clarity
            }
        }
        throw new ClassNotFoundException(name);
    }
}
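The listing above does not show getComponentClassLoader, which is where the "loaded only once" guarantee has to live. One plausible way to get that guarantee is a cache keyed by the JAR's normalized path, sketched below under several assumptions: LoaderRegistry and loaderFor are hypothetical names, and a plain URLClassLoader stands in for ComponentClassLoader so that the example compiles on its own.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry that hands out exactly one loader per JAR file.
class LoaderRegistry {
    private final Map<Path, URLClassLoader> loaders = new HashMap<>();

    /** Returns the single loader for this JAR, creating it on first request. */
    synchronized URLClassLoader loaderFor(File jar) {
        // Normalize so that "lib/../lib/a.jar" and "lib/a.jar" share one loader.
        Path key = jar.toPath().toAbsolutePath().normalize();
        URLClassLoader existing = loaders.get(key);
        if (existing != null) {
            return existing;
        }
        try {
            URL url = key.toUri().toURL();
            // Parent is null here, so only bootstrap classes are visible by delegation.
            URLClassLoader created = new URLClassLoader(new URL[] { url }, null);
            loaders.put(key, created);
            return created;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable JAR path: " + jar, e);
        }
    }

    public static void main(String[] args) {
        LoaderRegistry registry = new LoaderRegistry();
        URLClassLoader a = registry.loaderFor(new File("lib/soap-v1.jar"));
        URLClassLoader b = registry.loaderFor(new File("lib/../lib/soap-v1.jar"));
        System.out.println("shared loader: " + (a == b));
    }
}
```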
This approach has a number of beneficial properties. The most important is that we can now satisfy that original dependency diagram with no coding changes needed to any of the components (in theory--see the caveats given below). This decreases the coupling of the components, since each can depend on whatever version of the component that it desires, without forcing other components to upgrade or downgrade to match it.
Another advantage of this technique is increased transparency.
Each component's runtime dependencies are listed explicitly, and
they are enforced. Even when using the Class-Path
manifest attribute, you can never be quite sure that you haven't
missed a dependency that is fulfilled accidentally. Consider the
case where your component uses the commons-log component, which in
turn uses log4j to do logging. You may have another
component that depends upon log4j but does not specify it as a
dependency. Because it is already added to the classpath, you
wouldn't detect this, and if it came time to replace log4j with a
competitor, you'd have a problem. Instead, by using
Restricted-Class-Path, if you didn't list log4j as a
dependency, you'd get a ClassNotFoundException.
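The enforcement described here can be modeled without any real class loading. In the simulation below, each JAR is just a list of class names plus a list of declared dependencies, and a lookup may only consult the requester itself and the JARs it declares. DeclaredDependencyCheck, reports.jar, and the org.example class names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation of per-component visibility; no real JARs are loaded.
class DeclaredDependencyCheck {
    private final Map<String, List<String>> contents = new HashMap<>();
    private final Map<String, List<String>> declared = new HashMap<>();

    void addJar(String jar, List<String> classes, List<String> dependencies) {
        contents.put(jar, classes);
        declared.put(jar, dependencies);
    }

    /** Finds the JAR supplying className, searching only the requester and its declared JARs. */
    String resolve(String requester, String className) throws ClassNotFoundException {
        List<String> visible = new ArrayList<>();
        visible.add(requester);
        visible.addAll(declared.getOrDefault(requester, Collections.<String>emptyList()));
        for (String jar : visible) {
            if (contents.getOrDefault(jar, Collections.<String>emptyList()).contains(className)) {
                return jar;
            }
        }
        throw new ClassNotFoundException(className + " is not visible to " + requester);
    }

    /** Made-up setup: reports.jar uses log4j but only declares commons-log.jar. */
    static DeclaredDependencyCheck sample() {
        DeclaredDependencyCheck check = new DeclaredDependencyCheck();
        check.addJar("log4j.jar", Arrays.asList("org.apache.log4j.Logger"),
                     Collections.<String>emptyList());
        check.addJar("commons-log.jar", Arrays.asList("org.example.log.Log"),
                     Arrays.asList("log4j.jar"));
        check.addJar("reports.jar", Arrays.asList("org.example.reports.Report"),
                     Arrays.asList("commons-log.jar"));
        return check;
    }

    public static void main(String[] args) {
        DeclaredDependencyCheck check = sample();
        try {
            System.out.println(check.resolve("commons-log.jar", "org.apache.log4j.Logger"));
            System.out.println(check.resolve("reports.jar", "org.apache.log4j.Logger"));
        } catch (ClassNotFoundException e) {
            System.out.println("Undeclared dependency detected: " + e.getMessage());
        }
    }
}
```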
Now that we have a class loader capable of implementing our new versioning policy, we need to have some way to install it. If our code was going to be embedded in an application server, or some other kind of shell, that shell code could create the new class loader programmatically and use it to load our code. This way, a single server process could be used to execute multiple versions of our code, by specifying the desired version in a field of the request. But what if we just want to use this with an ordinary Java application?
An ideal way to do this would be with the
-javaagent command-line argument added in Java 1.5.
This would let us tell Java to initialize a specific JAR (called an
agent) before loading the main class of our application.
Unfortunately, agent classes are loaded by the same class loader
that loads your main class (the system class loader), so it's
already too late to install our custom class loader when our
agent's premain method is executed.
Another approach is to create a "bootstrap" main class that
simply sets up the class loader and uses it to locate our real main
class and invoke its main method. This approach is
very simple, but removes some of the elegance of using Java's
-classpath and -jar options and requires
that we invoke the main method ourselves.
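For comparison, a bootstrap main class along these lines might look like the following sketch. It is not the article's code: BootstrapLauncher, launch, and the nested RealMain are hypothetical, and the demo reuses its own class loader where a real bootstrap would first construct the custom one.

```java
import java.lang.reflect.Method;

// Sketch of a "bootstrap" main class that starts the real application
// through a class loader of its choosing.
class BootstrapLauncher {

    /** Stand-in for the application's real main class. */
    public static class RealMain {
        public static String lastArg;

        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : null;
        }
    }

    /** Loads mainClassName through the given loader and invokes its main method. */
    static void launch(ClassLoader loader, String mainClassName, String[] args) {
        try {
            // Libraries that consult the context class loader should see ours too.
            Thread.currentThread().setContextClassLoader(loader);
            Class<?> mainClass = Class.forName(mainClassName, true, loader);
            Method main = mainClass.getMethod("main", String[].class);
            // The cast passes the array as the single String[] argument.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not launch " + mainClassName, e);
        }
    }

    public static void main(String[] args) {
        // A real bootstrap would build its custom class loader here.
        ClassLoader loader = BootstrapLauncher.class.getClassLoader();
        launch(loader, RealMain.class.getName(), new String[] { "hello" });
        System.out.println("real main received: " + RealMain.lastArg);
    }
}
```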
Instead, we will override the
java.system.class.loader system property so that our
class loader is initialized as the system class loader. To do this,
we'll create a third class loader, WrapperClassLoader,
to serve as our replacement for the system class loader. Its parent
will be the bootstrap class loader, which will contain the Java
Runtime Library (rt.jar) as well as our
classloader.jar. When initialized, it will read the
java.class.path system property and create a
ComponentClassLoader for each JAR specified.
public static List<ComponentClassLoader> initClassLoaders (MasterClassLoader master)
    throws MalformedURLException, IOException
{
    List<ComponentClassLoader> loaders = new ArrayList<>();
    // Create one ComponentClassLoader per entry on the classpath.
    String classpath = System.getProperty("java.class.path");
    StringTokenizer tok = new StringTokenizer(classpath, File.pathSeparator);
    while (tok.hasMoreTokens()) {
        File file = new File(tok.nextToken());
        loaders.add(master.getComponentClassLoader(file));
    }
    return loaders;
}
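The WrapperClassLoader itself is not listed here, but any class named by java.system.class.loader has to follow one rule documented for ClassLoader.getSystemClassLoader: it must define a public constructor that takes a single ClassLoader, which becomes its delegation parent. A bare-bones skeleton, with the hypothetical name SketchSystemLoader and a request log added purely for illustration, might look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Declared without "public" only to keep the sketch to a single file; a real
// replacement system class loader would be a public top-level class.
class SketchSystemLoader extends ClassLoader {
    private final List<String> requested = new ArrayList<>();

    /** The JVM invokes this one-argument constructor when installing the loader. */
    public SketchSystemLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (requested) {
            requested.add(name);
        }
        // A real implementation would consult its per-component loaders here.
        return super.loadClass(name, resolve);
    }

    /** Returns a snapshot of the class names requested so far. */
    List<String> requestedNames() {
        synchronized (requested) {
            return new ArrayList<>(requested);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        SketchSystemLoader loader =
            new SketchSystemLoader(ClassLoader.getSystemClassLoader());
        loader.loadClass("java.lang.String");
        System.out.println(loader.requestedNames());
    }
}
```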
We can now run our meta-search engine like this:
$ java -Xbootclasspath/a:classloader.jar \
-Djava.system.class.loader=com.onjava.classloader.WrapperClassLoader \
-jar metasearch.jar
SOAP v1: remotely invoking searchAmazon
SOAP v2: remotely invoking searchGoogle (with newFlag = true)
In this final version, we actually went a few steps beyond the original requirements. Instead of embedding the version number for the SOAP component in a static field, we're now extracting it from a properties file. This means that our class loaders must also support resource loading, which requires logic very similar to that used for class loading. We also changed the API a bit in soap-v2.jar, from
public Object invokeMethod (String name, Object[] args)
to:
public Object invokeMethod (String name, Object[] args,
boolean newFlag)
It may seem strange, but this means that if we put the source code
for what we just ran into a single directory, we couldn't compile
it together! If we tried to build both google and
amazon with the same version of soap.jar,
the method signatures of one would not match. If we tried to build
with both versions of soap.jar, we would get duplicate
class errors. However, we can compile google.jar and
amazon.jar separately--without any thought to
whether they are using compatible versions of soap.jar--and then we can run them in separate class loaders within
the same process.
Think about it. If you paired this technique with a build tool such as Maven that manages component dependencies at build time, you might never run into missing dependencies or conflicting JARs again.
Don Schwarz is a Java developer for a large investment bank who specializes in metaprogramming and language integration.
Copyright © 2007 O'Reilly Media, Inc.