April 30 2008
Last July I asked "Why are there no Amazon S3/EC2 competitors?", lamenting the lack of competition in the utility or cloud computing market and the implications for disaster recovery. Closely tied to disaster recovery is portability -- the ability to switch between different utility computing providers as easily as I… read more
April 14 2008
On Friday in Amsterdam there was a lot of Hadoop on the menu at ApacheCon. I kicked it off at 9am with A Tour of Apache Hadoop, Owen O'Malley followed with Programming with Hadoop’s Map/Reduce, and Allen Wittenauer finished off after lunch with Deploying Grid Services using Apache Hadoop. Find… read more
Turn off the lights when you're not using them, please
March 30 2008
One of the things that struck me about this week's new Amazon EC2 features was the pricing model for Elastic IP addresses:
$0.01 per hour when not mapped to a running instance
The idea is to encourage people to stop hogging public IP addresses, which are a limited resource, when they don't… read more
March 23 2008
I made this image a few years ago (as a postcard to give to friends), but it's appropriate to show again today as it's a neat visual demonstration that Easter this year is the earliest this century. The scale at the bottom shows the maximum range of Easter: from 22 March… read more
March 22 2008
On Wednesday, I ran a session at SPA 2008 entitled "Understanding MapReduce with Hadoop". SPA is a very hands-on conference, with many sessions having a methodological slant, so I wanted to get people who had never encountered MapReduce before actually writing MapReduce programs. I only had 75 minutes, so I… read more
March 18 2008
MapReduce is a programming model for processing vast amounts of data. One of the reasons it works so well is that it exploits a sweet spot of modern disk drive technology trends. In essence, MapReduce works by repeatedly sorting and merging data that is streamed to and from disk… read more
March 02 2008
There's a class of MapReduce applications that use Hadoop just for its distributed processing capabilities. Telltale signs are:
1. Little or no input data of note. (Certainly not large files stored in HDFS.)
2. Map tasks are therefore not limited by their ability to consume input, but by their ability to run… read more
February 12 2008
I noticed today that I had an EC2 development cluster running that I hadn't shut down from a few days ago. It was only a couple of instances, but even so, it was annoying. Steve Loughran had a good idea for preventing this: have the cluster shut itself down if… read more
Apache Incubator Proposal for Thrift
February 01 2008
There's a proposal for Thrift to go into the Apache Incubator. This seems to me to be a good move - there's increasing interest in Thrift - just look at the number of language bindings that have been contributed: Cocoa/Objective C, C++, C#, Erlang, Haskell, Java, OCaml, Perl, PHP, Python,… read more
January 30 2008
I've always thought that Hadoop is a great fit for analyzing log files (I even wrote an article about it). The big win is that you can write ad hoc MapReduce queries against huge datasets and get results in minutes or hours. So I was interested to read Stu Hood's… read more
Hadoop is now an Apache Top Level Project
January 16 2008
Doug Cutting just reported on the Hadoop lists that the Apache board voted this morning (US time) to make Hadoop a TLP. Until now it has been a Lucene subproject, which made sense when Hadoop was broken out from the Nutch codebase two years ago. Since then Hadoop has grown… read more
MapReduce, Map Reduce, Map/Reduce or Map-Reduce?
January 13 2008
Although I've seen the other variants (and used some of them myself), Google call it "MapReduce", so that seems like the right thing to call it to me, since they invented it. The usage figures seem to back up this conclusion. "MapReduce" (no space) has 87,000 Google hits, while "Map… read more
Casual Large Scale Data Processing
January 07 2008
I think Greg Linden hits the nail on the head when he says of MapReduce at Google:
What is so remarkable about this is how casual it makes large scale data processing. Anyone at Google can write a MapReduce program that uses hundreds or thousands of machines from their cluster. Anyone… read more