Distributed Computing Economics and the Semantic Web
I heard Jim Gray speak the other night. He was the first speaker in this fall's Distinguished Speaker series at SDForum. I liked the talk a lot. In particular, I very much enjoyed the part of his talk dealing with Distributed Computing Economics.
The argument itself is basic economic analysis, and boils down to the notion that since everything costs money, you should weigh the costs of everything when building applications. In particular, Gray focuses on the cost of CPU time (small, and dropping all the time) and the cost of network bandwidth (not so small, and dropping at a slower rate). By putting actual dollar values on things, Gray is able to draw some startling conclusions about when it makes sense to use grid-computing techniques, and when it makes sense to use a LAN-based system or a single machine instead (as opposed to distributing the computation over a WAN, or using "on-demand" computing).
In particular, he says the following: the break-even point is 10,000 instructions per byte of network traffic, or about a minute of computation per MB of network traffic. That is, unless the CPU time at the other end of the pipe is free, and you get at least a minute of computation for every MB of data you send to it, you're better off doing the computation locally.
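Gray's rule of thumb can be sketched as a one-line cost comparison. This is just an illustration of the ratio he cites, not code from the talk; the function name and the example workloads are my own.

```python
# Gray's break-even figure: a task is worth shipping over the WAN only
# if it does more than ~10,000 instructions of work per byte moved.
BREAK_EVEN_INSTR_PER_BYTE = 10_000

def cheaper_to_ship(instructions: float, data_bytes: float) -> bool:
    """True if the task is compute-heavy enough that sending the data
    to a remote CPU beats computing locally, by Gray's rule of thumb."""
    return instructions / data_bytes > BREAK_EVEN_INSTR_PER_BYTE

# A scan that touches each byte a few dozen times: keep it local.
print(cheaper_to_ship(instructions=50 * 1_000_000, data_bytes=1_000_000))  # False
# A long simulation driven by a tiny parameter file: ship it.
print(cheaper_to_ship(instructions=1e12, data_bytes=10_000))  # True
```

The "minute per MB" phrasing is the same ratio: 1 MB at 10,000 instructions per byte is 10 billion instructions, roughly a CPU-minute on circa-2003 hardware.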
There's an interesting converse to this. If you're serving a database over the wire, you'd much rather have someone ask you to do a computation than ask you to send a large amount of data in response to a query (the economics apply when you're sending an answer as well).
Two things struck me while Gray was speaking. The first is that the analysis isn't very different from that in Gray's classic papers on the five minute rule. But despite the fact that a Turing Award winner repeatedly uses this style of argument, I don't see it being applied very often in other areas.
The second is that I think it very much applies to the semantic web. If you'll recall, the idea of the semantic web is to create a giant distributed knowledge-base, with lots of information encoded in RDF triples so that the machines, as well as the humans, can process the data.
Now along comes Gray, making an argument that, when you think about it, implies that the semantic web, as currently conceived, might just be all wrong. His basic point is that it's far cheaper to vend high-level apis than give access to the data (because the cost of shipping large amounts of data around is prohibitive). Since the semantic web is basically a data web, one wonders: why doesn't Gray's argument apply?
Here are three possible counterarguments:
My point? In everything I've read about the semantic web, nobody's addressed Gray's implicit question. Have I missed a large section of papers? Is it obvious that one of the above three arguments is the "killer rejoinder" to "vend high-level APIs, not data"? Is the semantic web really about APIs (and I just missed it)? Or is there a crucial hole in the roadmap to the semantic web?