Cloudera And Facebook Shed More Light On Hadoop Integration
by Leena Rao on September 23, 2009

For a startup that was founded less than a year ago, Cloudera has seen some pretty amazing growth. Backed by an impressive list of investors and advisors and run by a team of experienced technology veterans, Cloudera commercially distributes and services Hadoop. It’s similar in theory to Red Hat’s distribution of Linux.

Hadoop is an open-source Java software framework, fostered within the Apache Software Foundation, that grew out of an implementation of Google’s published computing infrastructure. Hadoop supports distributed applications running on large clusters of commodity computers processing enormous amounts of data. Cloudera helps distribute Hadoop, and provides services around the technology. Via Cloudera, Hadoop is currently used by most of the giants in the space, including Google, Yahoo, Facebook, Amazon, AOL, Baidu and more. To date, Cloudera has raised $11 million in funding from Accel Partners and Greylock Partners.

Cloudera is organizing and hosting a conference, Hadoop World: NYC, in a few weeks to support the growing Apache Hadoop community. Facebook, Yahoo, Amazon Web Services and IBM will all be making presentations about how they use the technology to support large volumes of data.

Facebook is one of the more interesting Hadoop use cases, says Cloudera co-founder Christophe Bisciglia. Facebook software engineer Ashish Thusoo said that prior to Hadoop, the social network used conventional RDBMS-based data warehousing technologies, and that it switched to the open-source Hadoop because of its scalability, cost and flexibility.

Facebook implements both Hadoop and Hive, a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. For example, the software makes it easy to create business reports and to run the data aggregation and analysis used to drive Facebook products, model generation and optimization problems for ads.

Cloudera is offering an exclusive discount code for the Hadoop event in New York City, which will knock 25 percent off the list price of $399 per ticket. It’s valid through 9/29.

Comments

  • Does Google use Hadoop? I thought Hadoop was an independent implementation of the Google Map/Reduce / BigTable / GoogleFS paper. Why would Google need to use the open source version of their own architecture?

  • @JLW

    Yes, Hadoop is an open source implementation of Google’s MapReduce and GFS papers.

    Google also uses Hadoop to provide a large cluster hosted in one of their data centers to the academic research community in partnership with the National Science Foundation and IBM.

    Using the same storage and computational models internally as well as in collaboration with academic researchers provides a lot of benefits to both parties.

  • Cool, thanks for the info.

  • mike olson has put together an incredible team over there, and i suppose that deep down the pitch is kinda that you can supplant a traditional rdbms with hadoop – i just hope that big orgs don’t try to replicate what those (oracle style) products do within the hadoop framework because i always understood it to be a totally different way to work it…this was quite an interesting topic at SIGMOD back in july (that entire audience is following hadoop at some level or another ;)
