Published on ONJava.com (http://www.onjava.com/)


SwarmStream: A Next-Generation HTTP Stack for Java

by Ry4an Brase, Chad Tippin
03/16/2005

Since even before its initial release, Java has been promoted as a programming language designed with the internet in mind, and its standard API has included HTTP networking support from day one. However, the rudimentary nature of Java's HTTP APIs has caused applications no end of problems related to reliability and fault tolerance. SwarmStream Public Edition (SSPE) is a free and freely redistributable tool that drastically improves the performance, reliability, fault tolerance, and even the feature set of Java's built-in HTTP networking routines without requiring any code-level, or even compile-time, changes.

Chief among the sins of Java's built-in HTTP support is its lack of support for automatic retries. Transient errors are a part of computer networking, and requiring developers to handle retries manually has resulted in scores of applications that turn temporary outages or general network hiccups into fatal errors. Just as the built-in HTTP protocol handler will automatically follow a 301 or 307 redirection response code, it is reasonable to expect it to retry a connection when appropriate.
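
To make that burden concrete, here is a minimal sketch of the kind of retry wrapper an application ends up writing by hand when the HTTP layer will not retry on its own. The class name ManualRetry, the method withRetries, and its parameters are invented for illustration, and treating every IOException as transient is a simplifying assumption.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class ManualRetry {

    /**
     * Runs the given operation, retrying on IOException until it succeeds
     * or maxAttempts attempts have been made, sleeping delayMillis between
     * attempts.
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // assumed transient: remember it and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last; // every attempt failed
    }
}
```

Every call site that opens a connection and reads a response would have to be wrapped this way, which is exactly the sort of boilerplate that tends to be skipped.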

Recognizing the shortcomings of java.net HTTP support, many third-party HTTP implementations have been developed. Each alternate HTTP library fills in various gaps and adds new features. Foremost among these replacements is the Apache Jakarta Project's HttpClient component. It provides enhanced HTTP functionality, but does not offer automatic retries. Furthermore, like all replacement HTTP components to date, it offers a different API and requires development-time code modifications.

SwarmStream Public Edition augments Java's built-in HTTP networking implementation with support for automatic retries while providing a great many additional features, such as download acceleration, resume support, and disconnected operation. And it does so without requiring any code or compile-time changes. It does this by taking advantage of Java's pluggable protocol handler support. Pluggable protocol handlers are an infrequently used feature of the Java networking API. They allow alternate connection establishment and control code to be substituted in lieu of the default implementations for standard protocols. By setting a simple system property, the SwarmStream Public Edition HTTP implementation can be used to handle all URLConnection operations for java.net.URL objects whose scheme is http:. An excellent explanation of Java's pluggable protocol handlers can be found at java.sun.com/developer/onlineTraining/protocolhandlers.
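
For readers who have not met the mechanism before, the following sketch shows the general shape of a protocol handler: a subclass of java.net.URLStreamHandler that returns a URLConnection for a given URL. It handles a made-up demo: scheme and simply echoes the URL's path back as the response body; the class DemoHandler and its read method are invented here and have nothing to do with SSPE's own code. The handler is passed to the URL constructor directly so that the sketch stands alone; with the system-property approach, the runtime instead looks for a class named Handler in a subpackage named after the protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

/** Toy handler for a made-up "demo:" scheme; it echoes the URL path as the body. */
class DemoHandler extends URLStreamHandler {

    @Override
    protected URLConnection openConnection(URL url) {
        return new URLConnection(url) {
            @Override
            public void connect() {
                connected = true; // nothing to dial; the content is synthesized locally
            }

            @Override
            public InputStream getInputStream() throws IOException {
                connect();
                byte[] body = getURL().getPath().getBytes(StandardCharsets.UTF_8);
                return new ByteArrayInputStream(body);
            }
        };
    }

    /** Opens a demo: URL with the handler supplied explicitly and returns the body. */
    static String read(String spec) throws IOException {
        // This constructor is deprecated in recent JDKs but still available.
        URL url = new URL(null, spec, new DemoHandler());
        try (InputStream in = url.openConnection().getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

A replacement HTTP handler follows the same pattern, except that its openConnection returns an HttpURLConnection subclass, which is why existing client code does not need to change.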

To use SwarmStream Public Edition's enhanced HTTP implementation in a Java application, one simply sets the system property java.protocol.handler.pkgs to a value of com.onionnetworks.sspe.protocol. This setting is a run-time change and requires only that the sspe.jar file from the SwarmStream Public Edition distribution be available in the classpath. From the command line, one could invoke the Java runtime as follows:


java -Djava.protocol.handler.pkgs=com.onionnetworks.sspe.protocol your.class.to.Run
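
When the launch command cannot be changed, the same property can in principle be set from code. The sketch below is one way to do that, not something taken from the SSPE documentation: the class HandlerPackages and its methods are invented for illustration. It relies on two documented behaviors of java.net.URL: the property holds a list of package names separated by vertical bars, and the handler resolved for a protocol is cached, so the property should be set before the first http: URL is created.

```java
class HandlerPackages {

    static final String PROPERTY = "java.protocol.handler.pkgs";

    /** Returns the property value with pkg placed first in the '|'-separated list. */
    static String prepend(String existing, String pkg) {
        if (existing == null || existing.isEmpty()) {
            return pkg;
        }
        return pkg + "|" + existing;
    }

    /** Registers pkg as a handler package; call before the first http: URL is created. */
    static void register(String pkg) {
        System.setProperty(PROPERTY, prepend(System.getProperty(PROPERTY), pkg));
    }
}
```

Calling HandlerPackages.register("com.onionnetworks.sspe.protocol") as the first statement of main would then have the same effect as the -D flag, provided sspe.jar is on the classpath.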

When SwarmStream Public Edition is loaded as the HTTP protocol handler, a great many new features are enabled automatically. First and foremost, retries start working. Upon the establishment of an HttpURLConnection, any initial failure will result in a configurable number of retries before an IOException is thrown and the connection attempt is considered failed.

In addition to retry support, SSPE offers many other seamless benefits. All HTTP downloads above a configurable size are automatically upgraded to multi-connection downloads. A single TCP connection, as used by HTTP, tops out at a throughput well below the available bandwidth when there is even a moderate amount of packet loss or latency. Multiple TCP connections can help to alleviate these effects and, in doing so, provide faster downloads and better utilization of available bandwidth. SSPE uses up to three simultaneous connections per download, which can make downloads up to three times faster.
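
The general technique behind a multi-connection download is to request disjoint byte ranges of the same resource in parallel using the HTTP Range header. The sketch below shows only the range arithmetic and is not SSPE's actual algorithm; the class RangeSplitter and its method are hypothetical, and the total length is assumed to be known in advance (for example, from a Content-Length header).

```java
import java.util.ArrayList;
import java.util.List;

class RangeSplitter {

    /**
     * Splits a resource of totalBytes into at most `connections` contiguous
     * ranges and returns the matching HTTP Range header values.
     */
    static List<String> rangeHeaders(long totalBytes, int connections) {
        if (totalBytes <= 0 || connections <= 0) {
            throw new IllegalArgumentException("totalBytes and connections must be positive");
        }
        long parts = Math.min(connections, totalBytes); // never create empty ranges
        long base = totalBytes / parts;
        long extra = totalBytes % parts; // the first `extra` ranges get one more byte
        List<String> headers = new ArrayList<>();
        long start = 0;
        for (long i = 0; i < parts; i++) {
            long length = base + (i < extra ? 1 : 0);
            long end = start + length - 1; // Range end offsets are inclusive
            headers.add("bytes=" + start + "-" + end);
            start = end + 1;
        }
        return headers;
    }
}
```

For a 10-byte resource and three connections this yields bytes=0-3, bytes=4-6, and bytes=7-9; each value would be sent as the Range header on its own connection, and the responses written into the output at the matching offsets.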

Furthermore, SSPE inserts a seamless caching layer below the HTTP networking interface. This layer saves previously downloaded data into a persistent disk cache of configurable size. Cache data is retained and expired using the HTTP caching rules as described in RFC 2616. Having this data cached allows for nearly instant repeat downloads, which are very common for applications with networked configuration and resource files. The cache also supports partial file caching, enabling it to effectively resume aborted or failed downloads without re-fetching previously downloaded bytes. This behavior is totally transparent and requires no intervention from the client code. Furthermore, the HTTP caching semantics allow for the use of cache data in the event of network outages, which provides fully featured disconnected operation capabilities through the normal Java HTTP networking API.
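
As a rough illustration of the expiration rule in RFC 2616 (section 13.2.4), a cached response is fresh while its freshness lifetime exceeds its current age, with the max-age directive taking precedence over the Expires header. The sketch below implements only that comparison. It ignores no-cache, no-store, heuristic expiration, and the age calculation itself, and the class Freshness and its methods are invented here rather than drawn from SSPE.

```java
import java.util.Locale;

class Freshness {

    /**
     * Extracts max-age (in seconds) from a Cache-Control header value,
     * or returns -1 when the directive is absent or malformed.
     */
    static long maxAgeSeconds(String cacheControl) {
        if (cacheControl == null) {
            return -1;
        }
        for (String directive : cacheControl.split(",")) {
            String d = directive.trim().toLowerCase(Locale.ROOT);
            if (d.startsWith("max-age=")) {
                try {
                    return Long.parseLong(d.substring("max-age=".length()).trim());
                } catch (NumberFormatException e) {
                    return -1;
                }
            }
        }
        return -1;
    }

    /**
     * Returns true while the freshness lifetime exceeds the current age.
     * The lifetime is max-age when present, otherwise Expires minus Date.
     */
    static boolean isFresh(String cacheControl, long expiresMillis, long dateMillis,
                           long currentAgeSeconds) {
        long lifetime = maxAgeSeconds(cacheControl);
        if (lifetime < 0) {
            lifetime = (expiresMillis - dateMillis) / 1000;
        }
        return lifetime > currentAgeSeconds;
    }
}
```

A cache built on this rule can answer a repeat request from disk while isFresh returns true, and must revalidate with the origin server once it returns false.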

Configurable Network Host Fail-Over

By adding just a few more system properties to the mix, SSPE is able to provide intelligent network fail-over. When the primary network source of a file is unavailable, connections will fall back to an alternate mirror source. SwarmStream Public Edition identifies mirror sources for URLs using an implementation of the com.onionnetworks.sspe.MirrorFactory interface.

The default MirrorFactory implementation, com.onionnetworks.sspe.SimpleMirrorFactory, performs simple string substitution on URLs to obtain corresponding mirror URLs. The substitution is configured using the system properties onion.sspe.mirror.before and onion.sspe.mirror.after. The first instance of the value of the former is replaced by the value of the latter. For example, if the properties were set as follows:


    onion.sspe.mirror.before=www.mycompany.com
    onion.sspe.mirror.after=www2.mycompany.com

then the URL http://www.mycompany.com/path/dir.mpg would be simultaneously downloaded from (and in the event of primary failure, exclusively from) http://www2.mycompany.com/path/dir.mpg.
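
The substitution rule described above can be sketched in a few lines. This is an illustration of first-occurrence replacement, not the source of SimpleMirrorFactory; the class FirstSubstitution and its method are hypothetical, and returning null when the URL does not contain the search string is an assumption made for the sketch.

```java
class FirstSubstitution {

    /**
     * Replaces the first occurrence of `before` in `url` with `after`.
     * Returns null when `before` is empty or does not occur in the URL.
     */
    static String mirrorOf(String url, String before, String after) {
        if (before.isEmpty()) {
            return null;
        }
        int pos = url.indexOf(before);
        if (pos < 0) {
            return null; // no mirror can be derived for this URL
        }
        return url.substring(0, pos) + after + url.substring(pos + before.length());
    }
}
```

Because only the first occurrence is replaced, a host name that reappears later in the URL (in a query string, say) is left untouched.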

For a more complex mapping of source URLs to mirror URLs, one can write a custom MirrorFactory implementation. The interface has only two methods:


    public void init(Properties props);
    public List mirrorsFor(String url);

The first provides the java.util.Properties object from which any configuration details should be taken. The second produces a java.util.List of String URLs corresponding to the provided input URL. Only the first element in the list is consulted in the Public Edition. When a custom implementation is to be used, the full class name need only be set as the value of the onion.sspe.mirrorfactory property. A simple implementation that always uses the host alpha.mycompany.com as a mirror for bravo.mycompany.com (and vice versa) could look like:


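    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    import com.onionnetworks.sspe.MirrorFactory;

    public class CustomMirrorFactory implements MirrorFactory {

        // The leading "//" anchors each match to the start of the host name.
        private final static String ALPHA = "//alpha.mycompany.com";
        private final static String BRAVO = "//bravo.mycompany.com";

        // No configuration is needed; the two hosts are hard-coded.
        public void init(Properties props) { }

        public List mirrorsFor(String url) {
            List list = new ArrayList(1);
            String mirror = null;
            int pos;
            // Swap whichever host appears in the URL for its counterpart.
            if ((pos = url.indexOf(ALPHA)) != -1) {
                mirror = url.substring(0,pos) + BRAVO
                    + url.substring(pos+ALPHA.length());
            } else if ((pos = url.indexOf(BRAVO)) != -1) {
                mirror = url.substring(0,pos) + ALPHA
                    + url.substring(pos+BRAVO.length());
            }
            // The list stays empty when the URL names neither host.
            if (mirror != null) {
                list.add(mirror);
            }
            return list;
        }
    }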
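
The init method becomes useful once the mapping should not be hard-coded. The following variant is a sketch under stated assumptions: the property name example.mirror.pairs and its "from=to,from=to" format are invented for illustration, and a stand-in MirrorFactory interface with the two methods listed earlier is declared locally so that the code compiles without sspe.jar. In a real deployment the class would implement com.onionnetworks.sspe.MirrorFactory instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Stand-in for com.onionnetworks.sspe.MirrorFactory so the sketch compiles on its own. */
interface MirrorFactory {
    void init(Properties props);
    List mirrorsFor(String url);
}

class PropertyMirrorFactory implements MirrorFactory {

    // Hypothetical property name, invented for this sketch.
    static final String PAIRS = "example.mirror.pairs";

    // Each entry holds {"//sourceHost", "//mirrorHost"}.
    private final List<String[]> pairs = new ArrayList<>();

    public void init(Properties props) {
        pairs.clear();
        String value = props.getProperty(PAIRS, "");
        for (String entry : value.split(",")) {
            String[] hosts = entry.trim().split("=");
            if (hosts.length == 2) { // silently skip malformed entries
                pairs.add(new String[] {"//" + hosts[0].trim(), "//" + hosts[1].trim()});
            }
        }
    }

    public List<String> mirrorsFor(String url) {
        List<String> mirrors = new ArrayList<>(1);
        for (String[] pair : pairs) {
            int pos = url.indexOf(pair[0]);
            if (pos != -1) {
                mirrors.add(url.substring(0, pos) + pair[1]
                        + url.substring(pos + pair[0].length()));
                break; // one mirror is enough: only the first element is consulted
            }
        }
        return mirrors;
    }
}
```

Unlike the alpha/bravo example, this mapping is one-directional: to mirror in both directions, each pair would need to be listed twice, once in each order.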

Whether you're using SwarmStream Public Edition with a custom MirrorFactory implementation to achieve complex fail-over setups or just using it without any custom configuration at all for the transfer speed-ups, retries, caching, and disconnected operation, it's a low-risk, zero-development-time tool that's worth adding to your toolbox. Since it's free to use and openly redistributable, it can be added to almost any new or existing project for immediate gains. Download it, try it out, and see how it works with the way you use HTTP in your application.

Ry4an Brase is director of engineering at Onion Networks, a content delivery company headquartered in Minneapolis, Minnesota.

Chad Tippin is the SwarmStream product manager for Onion Networks, a content delivery company headquartered in Minneapolis, Minnesota.


Copyright © 2009 O'Reilly Media, Inc.