P2P Profiles

OpenCola: Swarming Folders


OpenCola offers two open source products that use peer-to-peer techniques to make content management and retrieval easier. I talked to chief evangelist and founder Cory Doctorow by phone and email to learn some of the details of their architecture.

Swarmcast distributes content across multiple systems and provides sophisticated retrieval. Folders allows people to search for the content they want. Interestingly, Folders preceded Swarmcast. In other words, the developers started by helping people find content on other systems. They had trouble scaling that service and realized they needed another service to distribute and retrieve the content efficiently.

Swarmcast is meant for rapidly propagating large (over 1MB) files that are very popular, such as recently released movies. People can request a file from a Web site in the usual manner, but the request is redirected to Swarmcast. The developer is freed from providing the enormous bandwidth and complicated storage networks traditionally required to serve content to a lot of people. Ultimately, the files spread out in an unplanned but very efficient way through a far-flung network of users in classic peer-to-peer fashion.

Swarmcast stores a file by breaking it into multiple chunks and storing them on a variety of systems. So in making the click that requests the file, each user ends up getting a variety of different chunks from different servers.
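To make the chunking idea concrete, here is a minimal sketch in Python. It is not OpenCola's code: the function names, the fixed chunk size, and the round-robin placement are all assumptions made for illustration, and the "peers" are just in-memory dictionaries.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a file into fixed-size chunks (the last one may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_to_peers(chunks: list[bytes], peer_names: list[str]) -> dict[str, dict[int, bytes]]:
    """Spread chunks over peers round-robin (a hypothetical placement policy)."""
    if not peer_names:
        raise ValueError("at least one peer is required")
    stores: dict[str, dict[int, bytes]] = {name: {} for name in peer_names}
    for index, chunk in enumerate(chunks):
        stores[peer_names[index % len(peer_names)]][index] = chunk
    return stores


def reassemble(stores: dict[str, dict[int, bytes]]) -> bytes:
    """Collect chunks from every peer and join them in index order."""
    merged: dict[int, bytes] = {}
    for store in stores.values():
        merged.update(store)
    return b"".join(merged[i] for i in sorted(merged))
```

A requester that visits all three peers in `assign_to_peers(chunks, ["p1", "p2", "p3"])` ends up with every chunk, even though no single peer holds the whole file.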

If the requester's bandwidth is large, the system can receive multiple chunks in parallel and thus retrieve the file much more quickly than a traditional download from a single source. Swarmcast monitors throughput on the requester's system, and checks the availability of peers, so it can increase the number of sources to the maximum possible. As an example of the success of the method, Doctorow said that students in dorms at Carnegie Mellon University were getting 500 kilobytes/second throughput, or better, during tests of Swarmcast.
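The parallel retrieval described above might be sketched as follows. This is an illustration under stated assumptions rather than Swarmcast's actual mechanism: peers are simulated as dictionaries, `max_sources` stands in for the limit Swarmcast would derive from measured throughput and peer availability, and the names `fetch_chunk` and `parallel_download` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(peer_store: dict[int, bytes], index: int) -> tuple[int, bytes]:
    """Stand-in for a network request: read one chunk from a peer's store."""
    return index, peer_store[index]


def parallel_download(chunk_sources: dict[int, dict[int, bytes]], max_sources: int) -> bytes:
    """Fetch every chunk from its source, using up to max_sources workers at once.

    chunk_sources maps a chunk index to the (simulated) peer that holds it.
    """
    if max_sources < 1:
        raise ValueError("max_sources must be at least 1")
    with ThreadPoolExecutor(max_workers=max_sources) as pool:
        futures = [pool.submit(fetch_chunk, store, index)
                   for index, store in chunk_sources.items()]
        received = dict(future.result() for future in futures)
    # Chunks can arrive in any order, so sort by index before joining.
    return b"".join(received[i] for i in sorted(received))
```

With real network transfers, raising `max_sources` helps only while the requester's own link has spare capacity, which is why the article describes Swarmcast as monitoring throughput before adding sources.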

As people using Swarmcast get chunks of a file, the system automatically makes these chunks available to other requesters; this is where Swarmcast exploits the duplication and redundancy that peer-to-peer systems are famous for. However, once somebody has the complete file, his or her system stops serving up content. Swarmcast does not expect systems to be available for serving up data all the time.

To make the storage more robust (because systems hosting content can go offline, as in any P2P system), Swarmcast adds parity information to each chunk. Each chunk grows to 1.5 times its original size when parity information is added; however, the requester can regenerate the whole file by combining a subset of the chunks. For instance, if parity turns a 2MB file into a 3MB file, each requester needs to download only 2MB of different chunks to reconstruct the file. Parity thus makes downloading more flexible while guarding against file corruption.
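The article does not say which coding scheme Swarmcast uses. As a stand-in, the sketch below uses the simplest scheme with the same 1.5x overhead: split the data into two halves and add one XOR parity chunk, so that any two of the three chunks rebuild the original. The function names are hypothetical, and the sketch covers only missing chunks, not detection of corrupted ones.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes) -> list[bytes]:
    """Return [first half, second half, parity]: 1.5x the original size."""
    if len(data) % 2:
        raise ValueError("this sketch requires an even-length input")
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return [first, second, xor_bytes(first, second)]


def decode(chunks: dict[int, bytes]) -> bytes:
    """Rebuild the data from any two of the three chunks (keys 0, 1, 2)."""
    if len(chunks) < 2:
        raise ValueError("need at least two of the three chunks")
    if 0 in chunks and 1 in chunks:
        return chunks[0] + chunks[1]
    if 0 in chunks:
        # Second half is missing: recover it as first XOR parity.
        return chunks[0] + xor_bytes(chunks[0], chunks[2])
    # First half is missing: recover it as second XOR parity.
    return xor_bytes(chunks[1], chunks[2]) + chunks[1]
```

A 2MB input yields 3MB of chunks in total, and any 2MB of distinct chunks is enough to decode, which matches the arithmetic in the example above.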

As with Napster or Freenet, people who request files end up storing a copy on their own systems. Thus, popular content multiplies quickly, and people are more and more likely to find available servers nearby. Like Freenet and Gnutella, Swarmcast does a cascading referral: when you're asked for something, you send it back if you have it, and also pass on the request to your peers.

OpenCola works best for content that suddenly becomes popular, like a fast-breaking news story or the clip from a just-released movie. Like many P2P systems, Swarmcast substitutes redundancy for reliability. (That is, if somebody takes down his system, it's OK because other people are likely to host the same content.)

As with Napster, if you visit the system of someone from whom you've downloaded Swarmcast material, you are likely to find more files of interest to you. (This is not true in Freenet because of the anonymity requirement; you don't even know what's on each system there.)

Like XDegrees, Swarmcast can also determine intelligently the best system from which to download a file. For instance, it chooses a system on the same LAN before trying a system that's far away.
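One plausible way to express "same LAN first" is to rank candidate peers by whether their address falls inside the requester's subnet, then by measured round-trip time. The sketch below assumes those inputs are already known; `rank_peers` and its parameters are hypothetical, and the article does not say what criteria Swarmcast actually uses beyond network proximity.

```python
import ipaddress


def rank_peers(local_network: str, peers: list[tuple[str, float]]) -> list[str]:
    """Order peers so that those on the local network come first.

    peers is a list of (ip_address, round_trip_ms) pairs; within each
    group (on-LAN, off-LAN) the lower round-trip time wins.
    """
    network = ipaddress.ip_network(local_network)

    def sort_key(peer: tuple[str, float]) -> tuple[int, float]:
        address, round_trip_ms = peer
        on_lan = ipaddress.ip_address(address) in network
        return (0 if on_lan else 1, round_trip_ms)

    return [address for address, _ in sorted(peers, key=sort_key)]
```

Given the subnet `192.168.1.0/24`, a peer at `192.168.1.7` is tried before one at `203.0.113.5`, even if the remote peer happened to report a lower round-trip time.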

Folders: TiVo for the Internet

Folders is a search system, sort of a "TiVo for the Internet" in Doctorow's words. If somebody has one thing you like, they probably have other things you like too, and Folders lets you check what they have. The way to find interesting content, he says, is to find interesting people. You then automatically get the content they find interesting.

There is no centralized database as in TiVo. Folders is designed for anything and everything, so there's no hope of being able to control the metadata.

In fact, Folders manages to figure out metadata for content without people having to add it explicitly. When you force individual users to tag their own content, you are plagued by low participation, errors, and inconsistent tagging. Folders just works with anything it can figure out. Content that interests the same group of people tends to accumulate in the same place. One O'Reilly editor who looked at Folders in action said it was a "unique pleasure" to see a directory fill up automatically and gradually with new, related material.
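The article does not explain how Folders decides whose content to pull in. As a rough illustration of "find interesting people, then get what they find interesting," the sketch below ranks other users by how much their collections overlap with yours (Jaccard similarity) and suggests items held by the closest matches. The function names and the choice of similarity measure are assumptions, not a description of Folders itself.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two collections: shared items / all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(mine: set[str], others: dict[str, set[str]], top_peers: int = 2) -> list[str]:
    """Suggest items held by the most similar users that I don't have yet."""
    ranked = sorted(others, key=lambda name: jaccard(mine, others[name]), reverse=True)
    suggestions: list[str] = []
    for name in ranked[:top_peers]:
        for item in sorted(others[name] - mine):
            if item not in suggestions:
                suggestions.append(item)
    return suggestions
```

No one tags anything here: the only input is what each person already holds, which is the sense in which metadata emerges without explicit effort from users.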

OpenCola's business model consists of providing servers that insert data into Swarmcast systems for their customers and provide access to people behind firewalls who can't run Swarmcast and Folders directly.

Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility.
