Gnutella, an open peer-to-peer search system used primarily for file sharing, was released in March 2000. Within four months, developer activity had diminished substantially, although usage continued to surge on the strength of Napster-driven media attention to peer-to-peer file sharing. After five months, the strain of a growing user base on a weak technical infrastructure led to a quasi-collapse of the Gnutella network. Late in the year, however, a second wave of more sophisticated, experience-informed development began to emerge. Defying reports of its demise, Gnutella is evolving and its usage is growing, although significant technical challenges remain.
What problems have been overcome and how? What problems remain to be solved, and how can they be addressed? Clip2's Distributed Search Solutions initiative has continuously gathered data on the Gnutella network and closely followed related application development. Here, we cover some representative issues to provide insight into Gnutella's evolution.
The origins and technical significance of Gnutella have been described elsewhere; in brief, it is a fully decentralized system, free of central authority, in which each host relays messages to its neighbors. It is not hard to imagine that such a design is susceptible to a number of problems.
Non-compliant implementations are problematic not just for their users, who may be unable to communicate effectively with others, but for the network at large. Because Gnutella messages are relayed from host to host, the impact of a non-compliant application can extend well beyond its installed base and be magnified out of proportion.
But, what does "non-compliant" mean for a protocol without a blessed standard? In the open world of Gnutella, free from central authority, compliance means being able to effectively communicate with the bulk of the installed base. It is not unlike the situation with languages such as English that have no formal codification. Protocol specification documents in this environment then become analogous to dictionaries that reflect popular usage instead of dictating usage.
Of course, non-compliance can arise from the purposeful invention of new words or simply from poor grammar and pronunciation. An application can go wrong on the latter front in many ways: it can malform messages it originates, corrupt messages it forwards, or improperly route messages. Proper handling of the routed message types, by creating and maintaining a routing table, is a feature that, when given short shrift by a developer, imposes substantial costs on users, including increased traffic and lost responses. The low barriers to entry to Gnutella programming have encouraged less experienced developers to try their hands, often exacerbating matters.
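To make the routing-table requirement concrete, here is a minimal sketch (not any particular servent's implementation) of Gnutella-style message routing: broadcast requests (Pings and Queries) are remembered by their GUID so that responses (Pongs and QueryHits) can be routed back along the single connection the request arrived on, and duplicate GUIDs are dropped rather than re-flooded. The class and method names are illustrative assumptions.

```python
class RoutingTable:
    """Sketch of GUID-based backward routing as Gnutella requires."""

    def __init__(self, max_entries=10_000):
        self.max_entries = max_entries
        self.routes = {}  # GUID -> connection the request arrived on

    def on_request(self, guid, from_conn, all_conns):
        """Handle a broadcast message (Ping or Query).

        Returns the connections to forward it to, or an empty list if
        the GUID was already seen (a loop or duplicate)."""
        if guid in self.routes:
            return []  # drop duplicates instead of re-flooding them
        if len(self.routes) >= self.max_entries:
            # evict the oldest entry (dicts preserve insertion order)
            self.routes.pop(next(iter(self.routes)))
        self.routes[guid] = from_conn
        return [c for c in all_conns if c is not from_conn]

    def on_response(self, guid, all_conns):
        """Handle a routed response (Pong or QueryHit).

        Returns the single connection to send it back on, or None if
        the matching request was never seen -- in which case the
        response is discarded, not broadcast.  Broadcasting responses
        is one classic compliance bug."""
        conn = self.routes.get(guid)
        return conn if conn in all_conns else None
```

A servent that skips this bookkeeping either drops responses on the floor (lost results) or floods them to every neighbor (excess traffic), which is exactly the cost described above.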
Non-compliant implementations have been kept in check by, among other things, the availability of quality protocol specification documents and the strict filtering implemented in popular applications to avoid propagating deviant messages. Even so, they remain a problem.
Connectivity was long a headache for users. Just as a Web browser needs a start page, a Gnutella application needs a start host. Unfortunately, early programs did not come preset with one, because host addresses generally have short shelf lives. This sent users hunting across Web sites, message boards, and chat rooms for active host addresses. Developer Bob Schmidt came to the rescue with gnuCache, an open-source application that began automatically doling out addresses from several enthusiast-run servers. Not long after, Clip2 began reliably serving lists of well-connected, verified active hosts through a service accessible at Gnutellahosts.com via both Gnutella and the Web. By fall, developers had begun providing an "auto-connect" feature that drew start hosts from these host-list services, relieving users of the chore entirely. The services operate in a sufficiently uniform way to be interchangeable from a developer's standpoint, although the quality of the addresses they return varies. The connectivity problem has thus been addressed in a manner that avoids a single point of failure.
A lack of search results became a substantial issue following the quasi-collapse of the network in August. As the traffic carried by an average host grew, it eventually exceeded the capacity of hosts on the slowest physical links -- dial-up modems. These hosts became bottlenecks, effectively severing the communication lines running through them. The result was fragmentation into smaller sub-networks, and users saw fewer search results.
Responses to the issue followed a common theme: move users on slower connections to the edge of the network.
In October, Clip2 introduced the Reflector, a special Gnutella server designed to run on a high-speed connection and act as a proxy for users on slower links. In so doing it conserves the user's bandwidth and situates slower hosts at the edge of the network. Via a Reflector, a network of users can use Gnutella with far less aggregate bandwidth than would otherwise be required. Most Reflectors are run on behalf of a particular user population and not publicly advertised, although a handful of public-access ones are available at any given time.
November and December saw the introduction of two significant new Gnutella applications: first Lime Wire LLC's LimeWire, then Free Peers, Inc.'s BearShare. Both apply connection-preferencing rules that decide whether a given connection will be maintained. One common example: connections to unresponsive hosts are dropped. Consistently applying this simple rule across a series of connections tends to leave slower hosts with fewer connections, sitting at the edge of the network, much as a poor conversationalist might find himself marginalized at a party.
Coincident with these developments and the growing adoption of these applications, Clip2 has observed a steady increase in the number of responsive hosts active on the network at any given time, from a typical figure of 500 in October to more than 1,500 in early January 2001. The quantity of search results has increased as well. By Clip2's estimates, the number of Gnutella users per day has risen from 10,000-30,000 in November to 20,000-50,000 in January.
Many regard download failure as one of the most serious remaining problems. Attractive search results are useless if the associated files cannot be downloaded. Quantitative study of the problem is complicated by users' preferences in the files they download and upload: since all files are not equal, any test that assumes otherwise leaves much room for inaccuracy. Nonetheless, the perception is widespread that downloads fail too often, particularly relative to other peer-to-peer file-sharing systems.
Spurred by an August 2000 paper by Eytan Adar and Bernardo Huberman of Xerox PARC, many believe that "freeloading" -- users downloading much more than they upload -- is a major source of download failure, although the critical ratio of supply to demand is anyone's guess. To commentator Clay Shirky's counterpoint that "bandwidth over time is infinite," the response is that the server bandwidth available to users who want to download a file right now is all too finite.
Developers are taking steps to mitigate the problem, largely aimed at the "busy signals" that overloaded serving hosts return. Busy hosts are not the only possible cause of download failure, however. Hosts may be unreachable due to firewalls or intervening network address translation devices; applications may be buggy or incompatible; hosts may go offline or change their content between advertising a file and receiving a download request; and so on. A mechanism that let hosts verify each other's ability to upload a given file would address some of these issues.
"What next?" is a fitting conclusion, for it is a problem that looms over Gnutella's future. Non-compliant implementations, connectivity, a lack of search results, download failure - these are all nuts-and-bolts problems with Gnutella. Sorting them out is necessary for Gnutella to meet commonly held basic expectations of it as a usable, public, decentralized file-sharing system. What happens when these core issues are sufficiently resolved?
The answer is that users spur developers to push on to new features. But which features? The trouble with "What next?" is the contentious matter of agreeing on which problems need solving. Some aspire to see Gnutella become more scalable or more secure. Some want the system to be more anonymous; some want it to be less so. Some hope it becomes a more generalized distributed search medium and grows beyond its file-sharing origins. Some imagine other applications riding atop it, even commerce. There seems to be no end to the expectations.
Unfortunately, Gnutella has a history of aborted, failed, or poorly supported attempts to unite developers; the analogy of herding cats has rarely been so apt. One of the most notable efforts -- Gnutella Next Generation -- never advanced significantly beyond the proposal stage. Media reports have mischaracterized a spin-off effort known as gPulp as a Gnutella organization, but as the principal behind it recently stated, "We are not a working group on Gnutella."
As of this writing, then, there is no clear leader in the form of a working group or other organization. There is, however, one arbiter of innovations: the market. Gnutella developers who have experimented with "improvements" that run counter to, outside, or between the lines of the de facto protocol have been kept in check by the fact that their applications must still communicate with those produced by other developers.
Will this market-driven pattern continue, so that Gnutella evolves in a competitive, Darwinian, decentralized and bottom-up manner? Or will it "grow up" and follow the trajectory of many other protocols, evolving through top-down committee processes? Only time will tell.
Kelly Truelove is an independent research analyst who, via Truelove Research, covers peer-to-peer technology with a focus on P2P content search, storage, and distribution networks. He is regarded as a leading expert on consumer file-sharing systems, which he covers with a data-driven approach.
Copyright © 2009 O'Reilly Media, Inc.