
Remaking the Peer-to-Peer Meme

Usenet, E-mail and IP Routing

Not coincidentally, this evolution from a pure peer-to-peer network to one in which peer-to-peer and centralized architectures overlap echoes the evolution of Usenet. This history also shows that peer-to-peer and client/server (which can also be called decentralization and centralization) are not mutually exclusive.

Usenet was originally carried over the peer-to-peer dialup UUCPnet. Sites agreed to call one another, and passed mail and news from site to site in a store-and-forward network. Over time, though, it became clear that some sites were better connected than others, and they came to form a kind of de facto "Usenet backbone." One of the chief sites, seismo, a computer at the U.S. Geological Survey, was run by Rick Adams. By 1987, the load on seismo had become so great that Rick formed a separate company, called UUNET, to provide connectivity services for a monthly fee.

As the UUCPnet was replaced by the newly commercialized Internet, UUNET added TCP/IP services and became the first commercial Internet service provider. Interestingly enough, the IP routing infrastructure of the Internet is still peer-to-peer: Internet routers act as peers in finding the best route from one point on the Net to another. Yet overlaid on this architecture are several layers of hierarchy. Users get their Internet connectivity from ISPs, who may in turn connect to one another in hierarchies that are hidden from the end user. Beneath the surface, though, each of those ISPs still depends on the same peer-to-peer routing architecture.

Similarly, e-mail is routed by a network of peered mail servers, and it appears peer-to-peer from the user's point of view. Yet those users are in fact aggregated into clusters by the servers that route their mail and by the organizations that operate those servers.

Centralization and decentralization are never so clearly separable as anyone fixated on buzzwords might like.

Distributed Computation

Distributed computation programs like SETI@Home (the screen saver from the Space Sciences Lab at UC Berkeley that uses the "spare cycles" of more than 1 million PCs to process radio telescope data in search of signs of extraterrestrial intelligence) are not "peer-to-peer" at all in one sense. After all, they use an old-style asymmetric client/server architecture, in which the million independent computational clients download their data sets from, and upload their computed results to, the central repository at the Space Sciences Lab. The clients don't peer with each other in any way.

But look a little deeper, and something else emerges: the clients are active participants, not just passive "browsers." What's more, the project uses the massive redundancy of computing resources to work around problems such as reliability and network availability of any one resource. But even more importantly, if you look further down the development timeline, when startups such as United Devices, Popular Power, Parabon and others have their services in the market, the "ecology" of distributed computation is going to be much more complex. There will be thousands (and ultimately, perhaps millions) of compute-intensive tasks looking for spare cycles. At what point does it make sense to have an architecture that allows a two-way flow of tasks and compute cycles?
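The "two-way flow of tasks and compute cycles" can be pictured as a shared task pool that any peer may put work into or pull work out of, rather than a single master handing work down. Here is a minimal sketch of that symmetric model; the names (`ComputeNode`, `market`, `submit`, `donate_cycles`) are hypothetical and stand in for whatever protocol a real service would use:

```python
import queue

class ComputeNode:
    """Hypothetical peer that can both submit tasks and donate spare cycles."""
    def __init__(self, name, market):
        self.name = name
        self.market = market

    def submit(self, task):
        # Any peer can add work to the shared pool...
        self.market.put((self.name, task))

    def donate_cycles(self):
        # ...and any peer can pull work from it when idle.
        try:
            owner, task = self.market.get_nowait()
        except queue.Empty:
            return None
        return owner, task()  # run the task on behalf of its owner

market = queue.Queue()                   # the shared "two-way" task pool
alice = ComputeNode("alice", market)
bob = ComputeNode("bob", market)

alice.submit(lambda: sum(range(1000)))   # alice needs cycles today
owner, result = bob.donate_cycles()      # bob has cycles to spare
```

In the SETI@Home model only the central server plays the `submit` role; the architectural question raised above is what happens when every node can play both.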

Further, many of the key principles of Napster are also at play in distributed computation. Both Napster and SETI@Home need to create and manage metadata about a large community of distributed participants. Both need to make it incredibly simple to participate.

Finally, both Napster and SETI@Home have tried to exploit what Clay Shirky memorably called "the dark matter of the Internet" -- the hundreds of millions of interconnected PCs that have hitherto been largely passive participants in the network.

Already, startups like MojoNation are making a link between file sharing and distributed computation. In the end, both distributed file sharing and distributed computation are aspects of a new world in which Sun's long-term slogan, "The Network is the Computer," is finally coming true.

Instant Messaging

Napster could be characterized as a "brokered peer-to-peer system," in which a central addressing authority connects end points, and then gets out of the way.

Once you realize this, it becomes clear just how similar the Napster model is to instant messaging. In each case, a central authority manages an addressing system and a "namespace," and uses it to connect end users. In some ways, Napster can be thought of as an instant-messaging system in which the question isn't "Are you online and do you want to chat?" but "Are you online and do you have this song?"
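The brokered model described here can be reduced to a very small amount of machinery: a central index that answers "who is online and has this song?" and then steps aside while the peers transfer the file directly. The following sketch is illustrative only; the class and method names (`Broker`, `register`, `lookup`) and the addresses are hypothetical, not Napster's actual protocol:

```python
class Broker:
    """Hypothetical central addressing authority: it introduces peers,
    then gets out of the way of the actual transfer."""
    def __init__(self):
        self.index = {}   # song title -> set of online peer addresses

    def register(self, address, songs):
        # A peer announces itself and its shared files on login.
        for song in songs:
            self.index.setdefault(song, set()).add(address)

    def unregister(self, address):
        # On logout, the peer's files vanish from the namespace.
        for peers in self.index.values():
            peers.discard(address)

    def lookup(self, song):
        # The broker only answers "are you online and do you have this
        # song?" -- the download happens peer-to-peer, not through it.
        return sorted(self.index.get(song, ()))

broker = Broker()
broker.register("10.0.0.5:6699", ["song_a.mp3", "song_b.mp3"])
broker.register("10.0.0.9:6699", ["song_b.mp3"])
```

Swap "songs" for "presence and chat status" and the same skeleton describes an instant-messaging server, which is the point of the comparison.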

Not surprisingly, a project like AIMster makes explicit use of this insight to build a file sharing network that uses the AOL Instant Messenger (AIM) protocol. This brings IM features such as buddy lists into the file-sharing arena.

The open-source Jabber instant-messaging platform takes things even further. While Jabber started out as a switching system between incompatible instant-messaging protocols, it is evolving into a general XML routing system, and the basis for applications that allow users and their computers to ask even more interesting questions of each other.

Ray Ozzie's Groove Networks is an even more mature expression of the same insight. It provides a kind of groupware dial tone, or "LAN on demand," for ad-hoc groups of peers. Like Jabber, it provides an XML routing infrastructure that allows for the formation of ad-hoc peer groups that can share not only files and chat but a wide variety of applications. Replication, security, and so on are taken care of automatically by the underlying Groove system. If systems like Groove deliver what they promise, we can see peer-to-peer as a solution to the IT bottleneck, allowing users to interact more directly with each other in networks that can span organizational boundaries.

The Writeable Web

The Web began as a participatory groupware system. It was originally designed by Tim Berners-Lee as a way for high-energy physicists to share their research data and conclusions. Only later was it recast into a publishing medium, in which sites produce content that will attract millions of passive consumers. To this day, there is a strong peer-to-peer element at the very heart of the Web architecture: the hyperlink.

A Web hyperlink can point to any other site on the network, without any central intervention, and without the permission of the site being pointed to. What's more, hyperlinks can point to a variety of resources, not just Web pages. Part of the reason for the Web's explosive growth, compared with other early Internet information services, was that the Web browser became a kind of universal client, able to link to any kind of Internet resource. Initially, these resources were competing services such as FTP, Gopher and WAIS, but eventually, through CGI, the browser became an interface to virtually any information resource that anyone wanted to make available. Mailto and news links even provide gateways to mail and Usenet.

There's still a fundamental flaw in the Web as it has been deployed, though. Berners-Lee created both a Web server and a Web browser, but he didn't join them at the hip the way Napster did. And as the Buddhist Dhammapada says, "If the gap between heaven and earth is as wide as a barleycorn, it is as wide as all heaven and earth." Before long, the asymmetry between clients and servers had grown wide enough to drive a truck through.

Browsers had been made freely available to anyone who wanted to download one, but servers were seen as a high-priced revenue opportunity and were far less widely deployed. There were free UNIX servers available (including the NCSA server, which eventually morphed into Apache), but by 1995, 95 percent of Web users were on Windows, and there was no Web server at all available to them! In 1995, in an attempt to turn the tide, O'Reilly introduced WebSite, the first Web server for Windows, with the slogan "Everyone who has a Web browser ought to have a Web server." By then, however, the market was fixated on the idea of the Web server as a centralized publishing tool. Microsoft eventually offered PWS, the Personal Web Server, bundled with Windows, but it was clearly a low-powered, second-class offering.

Perhaps even more importantly, as Clay Shirky has pointed out, the rise of dynamic IP addressing made it increasingly difficult for individuals to publish to the Web from their desktops. As a result, the original "two-way Web" became something closer to television, a medium in which most of the participants are consumers, and only a relatively small number are producers.

Web site hosting services and participatory sites like Geocities made it somewhat easier to participate, but these services were outside the mainstream of Web development, with a consumer positioning and non-standard tools.

Recently, there's been a new emphasis on the "writeable Web," with projects like Dave Winer's editthispage.com, Dan Bricklin's trellix.com, and Pyra's blogger.com making it easy for anyone to host their own site and discussion area. Wiki is an even more extreme project, creating Web sites that are writeable by anyone, with an area set aside for public comment on a given topic. Wiki has actually been around for six or seven years, but has suddenly started to catch on.

The writeable Web is only one way that the Web is recapturing its peer-to-peer roots. Content syndication with RSS (Rich Site Summary) and Web services built with protocols like XML-RPC and SOAP allow sites to reference each other more fully than is possible with a hyperlink alone.

Web Services and Content Syndication

I asked above, "At what point does it make sense to have an architecture that allows a two-way flow of tasks and compute cycles?" Isn't that a pretty good description of SOAP and other Web services architectures?

What SOAP does is formalize something that sophisticated programmers have been doing for years. It's relatively easy, using Perl and a library like libwww-perl, to build interfaces to Web sites that do "screen scraping" and then reformulate and reuse the data in ways that the original Web developers didn't intend. It was even possible, as Jon Udell demonstrated, to take data from one Web site and pass it to another for further processing, in a Web equivalent of the UNIX pipeline.
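To make the screen-scraping idea concrete, here is a minimal sketch in Python's standard library rather than Perl's libwww-perl. It parses item text out of markup the publisher never meant as an API, then reformulates it into records a downstream "pipeline" stage could consume. The page content and the record shape are invented for illustration; a real scraper would fetch the page over HTTP first:

```python
from html.parser import HTMLParser

# A static page standing in for one fetched from a live site.
PAGE = "<html><body><ul><li>Apache 1.3</li><li>WebSite 2.0</li></ul></body></html>"

class ItemScraper(HTMLParser):
    """Pull the text out of each <li>, discarding the surrounding markup."""
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items.append(data.strip())

scraper = ItemScraper()
scraper.feed(PAGE)

# Reformulate the scraped data for the next stage of the "pipeline" --
# reuse the original developers never intended.
records = [{"name": item} for item in scraper.items]
```

SOAP replaces this fragile guesswork with an explicit, machine-readable contract between the two sites, which is the "formalizing" the paragraph above describes.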

SOAP makes this process more explicit, turning Web sites into peers in providing more complex services to their users. The next generation of Web applications won't consist of single-point conversations between a single server and a single browser, but a multipoint conversation between cooperating programs.

One of the key issues that comes up once you start thinking about more complex interactions between sites on the Net is that metadata management is critical. UDDI is a first step toward a standard for cataloging Web services in ways that allow them to be discovered by sites that want to use one another's services.

Similarly, content syndication formats such as RSS allow Web sites to cooperate in delivering content. By publishing RSS feeds, sites enable other sites to automatically pick up data about their stories. A site like the O'Reilly Network homepage is updated automatically out of a set of RSS news feeds from a Web of cooperating sites.
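A page assembled "out of a set of RSS news feeds" works roughly like this sketch, which merges the items of several feeds into one headline digest using Python's standard XML parser. The two tiny hand-written feeds and the `digest` function are illustrative stand-ins for feeds fetched over HTTP:

```python
import xml.etree.ElementTree as ET

# Two tiny RSS 0.91-style feeds standing in for feeds fetched from
# cooperating sites.
FEEDS = [
    """<rss version="0.91"><channel><title>Site A</title>
         <item><title>Story 1</title><link>http://a.example/1</link></item>
       </channel></rss>""",
    """<rss version="0.91"><channel><title>Site B</title>
         <item><title>Story 2</title><link>http://b.example/2</link></item>
       </channel></rss>""",
]

def digest(feeds):
    """Merge the <item> entries of several RSS feeds into one headline list."""
    headlines = []
    for feed in feeds:
        channel = ET.fromstring(feed).find("channel")
        source = channel.findtext("title")
        for item in channel.findall("item"):
            headlines.append((source, item.findtext("title"),
                              item.findtext("link")))
    return headlines
```

Each site keeps publishing on its own schedule; the aggregating page simply re-reads the feeds and rebuilds its headline list, which is all the cooperation the format requires.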

Right now, RSS provides only the simplest of metadata about Web pages, for simple syndication applications like creating news digest pages. But the new RSS 1.0 proposal will allow for more complex applications based on distributed data.

