
Cloudera has seen some pretty amazing growth for a year-old startup. Backed by an impressive list of investors and advisors and run by a team of experienced technology veterans, Cloudera commercially distributes and services Hadoop. It’s similar in theory to Red Hat’s distribution of Linux. At tomorrow’s Hadoop World: NYC, Cloudera is announcing “Cloudera Desktop,” a unified graphical user interface for Hadoop applications that includes tools for job and cluster management. This is significant because Cloudera is transitioning from providing a service to distributing actual software.
Hadoop is a Java software framework, fostered within the Apache Software Foundation, that was born out of an open-source implementation of Google’s published computing infrastructure. Hadoop supports distributed applications running on large clusters of commodity computers processing enormous amounts of data. Cloudera helps distribute Hadoop, and provides services around the technology. Cloudera’s newest Desktop software lets developers, analysts and administrators submit jobs, monitor cluster health and browse the data stored on a Hadoop cluster. Basically, it helps business teams manage and monitor applications that store data using Hadoop.
Cloudera Desktop runs inside a Web browser, and works on Windows, Macintosh and Linux systems. Applications within Desktop include a file browser, for copying and browsing the data files stored on a cluster; a job designer, for creating, running and saving jobs for later reuse or customization; a job browser, for keeping track of job status and progress; and a cluster health dashboard, for monitoring the health of a Hadoop cluster and alerting operators in case of problems.
Via Cloudera, Hadoop is currently used by most of the giants in the space including Google, Yahoo, Facebook (we wrote about Facebook’s use of Cloudera here), Amazon, AOL, Baidu and more. To date, Cloudera has raised $11 million in funding from Accel Partners and Greylock Partners.









Myanmar’s Aung San Suu Kyi has just signed on for a reality show on NBC.
The MapReduce paradigm is a little bit primitive: just sequential execution of two (three) primitives, map-(combine)-reduce. Let's wait until Microsoft open-sources their Dryad.
Peace.
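For readers who have not seen the three primitives side by side, here is a minimal in-memory sketch of the map, combine and reduce sequence, using word counting as the job. It is plain Python rather than the Hadoop API, and the function names (`map_words`, `combine`, `reduce_counts`, `word_count`) are made up for illustration.

```python
from collections import Counter
from itertools import chain


def map_words(line):
    """Map: emit a (word, 1) pair for each word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combine: pre-aggregate the pairs of one input split locally."""
    partial = Counter()
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def reduce_counts(pairs):
    """Reduce: merge the partial counts from all splits into final totals."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(splits):
    """Run map -> combine per split, then a single reduce over everything."""
    combined = [
        combine(chain.from_iterable(map_words(line) for line in split))
        for split in splits
    ]
    return reduce_counts(chain.from_iterable(combined))
```

Each inner list passed to `word_count` stands in for one input split; on a real cluster the combine step would run on the node that ran the map, which is what cuts down the data shuffled to the reducers.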
Hey, Cloudera gurus, how can I run (map->combiner) in a loop? I want to aggregate the Map results, then pass them to the same Map function again and repeat this until some condition is met.
No answer. MapReduce is a big scam. Using thousands of commodity computers for problems which can be solved by a couple dozen I/O- and software-optimized servers (not supercomputers) with the RIGHT cluster software is a kind of achievement, yes?
You can try a more general solution, such as PiCloud (picloud.com), which can pull off what you want.
Use Cascading (http://www.cascading.org/), it’s an API on top of Hadoop that lets you create and execute MapReduce jobs.
You can write a loop in standard Java code to create whatever jobs you need and let Cascading figure out the dependencies between the jobs for you.
Marc
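The driver-loop idea can be sketched roughly as follows. This is an in-memory illustration in Python, not Hadoop or Cascading code: on a real cluster each pass through the loop would submit a separate job. The example workload (one-dimensional k-means) and all names here (`map_phase`, `combine_phase`, `run_until_converged`, `tolerance`, `max_rounds`) are assumptions chosen for the sketch.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield nearest, p


def combine_phase(pairs):
    """Combine: aggregate the mapped values per key into [sum, count]."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return totals


def run_until_converged(points, centroids, tolerance=1e-9, max_rounds=100):
    """Driver loop: rerun map -> combine, feeding the result back into map."""
    for round_number in range(1, max_rounds + 1):
        totals = combine_phase(map_phase(points, centroids))
        # A centroid that attracted no points keeps its previous value.
        updated = [
            totals[i][0] / totals[i][1] if i in totals else c
            for i, c in enumerate(centroids)
        ]
        # Stop condition: no centroid moved by more than the tolerance.
        if all(abs(a - b) <= tolerance for a, b in zip(updated, centroids)):
            return updated, round_number
        centroids = updated
    return centroids, max_rounds
```

The loop itself lives outside the map and combine functions, so the stop condition can be anything the driver is able to compute from the aggregated output of the previous round.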
Check out the official announcement over at the Cloudera blog. There’s a nice little screencast: http://www.clou...oudera-desktop/
Jeff- r u going to ask them to correct the article’s last section? it would be bad if misinformation about Cloudera’s contribution got out, right? Nobody would want that.
Funny that this reads so similar to your previous article! Self plagiarism
http://www.tech...op-integration/
Your last statement is obviously false:
“Via Cloudera, Hadoop is currently used by most of the giants in the space including Google, Yahoo, Facebook (we wrote about Facebook’s use of Cloudera here), Amazon, AOL, Baidu and more.”
Please think before you write.
Hadoop is obviously successful, but are all those users really Cloudera customers? I think there is a bit of poor fact checking going on here. Hadoop is available under a permissive license and just grabbing the list of Hadoop users and saying they use it ‘via Cloudera’ is weak.
I think Cloudera helps mantain Hadoop….. so, it is via Cloudera….. or am I wrong ?
Im write…. actually Cloudera develops Hadoop….. so, if anyone uses Hadoop, it is via Cloudera. The users may or may not by PAID SERVICES…..
WOW….. “Im write” LOL
Im right.
Sorry…. english is like my third language.
You clearly have no idea what you are talking about. Look at contribution to Hadoop on apache. Cloudera’s contribution to hadoop is minuscule compared to Yahoo and Facebook. I see them more as packaging and support company built around Hadoop.
Anderson, you should be more ashamed about being an idiot than about the spelling errors.
http://hadoop.a...he.org/who.html
Org        # PMC members
---------  -------------
Agmlab     1
Cloudera   2
Facebook   1
Getopt     1
INRIA      1
Powerset   2
Yahoo!     9
Is there supposed to be a story here, or is this a paid post? I’d like it if when you guys do a paid post you’d put something in the top that says “Supported Post” or something like that.
The link to the Cloudera Desktop is http://cloudera.com/desktop
Congrats to Jeff and the team at Cloudera!
I worked with Amr Awadallah at Yahoo and if that guy is driving, you definitely want to hitch on your wagon. It should come as no surprise to anybody that worked with Amr that the company is having the success it is. Way to go Cloudera and way to go Amr!
Wow, this article couldn’t be further from the truth. Yahoo! contributes the vast majority of code to Hadoop. Cloudera is way below on the list of contributors, although they are trying to position themselves as the owners of the project. Yeah, they created RPMs for Hadoop. Wow. What an achievement…
I agree with Gustavo. This looks like a paid post.
The article really needs to be updated. As pointed out, the final paragraph is factually inaccurate. As such, TechCrunch should update the article and include a note as to the correction, as any journalistic organization would do. That is, unless TechCrunch wants to continue the appearance that it doesn’t actually have writers, just little Perl scripts that convert press releases into postings on its website. Is Leena Rao a journalist, flack or Python script?
I don’t think any of those named clusters use the Cloudera distro -why would they? They need to be able to push out their own patches to their clusters.
The main contribs to the codebase by Cloudera have been Tom White’s EC2/S3 support. Only Amazon would be using that feature.
Regarding the list of PMC members, Doug Cutting was at Y! until last month, so all his code was written at Yahoo!.
This doesn’t mean that the team’s desktop isn’t lovely, but the main contribution that Cloudera have been doing is marketing and taking credit. They want to be the RedHat for Hadoop, but it’s Apache Hadoop, and if they keep pissing everyone else off -especially Yahoo!- someone is going to add their own RPM distro to the base packaging.
Nice post! But you forgot about mazecore – http://monitor.mazecore.com