ParaScale Promises Data Center Heaven: Private Cloud Storage At About $1 A Gig
by Erick Schonfeld on September 7, 2008

Web applications require a lot of data storage. All the videos uploaded to YouTube, for example, are estimated to take up more than 500 terabytes of storage.  Google’s servers overall process one petabyte of data every hour or so.  Google had to create its own Web-scale file system to handle all the data that it processes and stores. As Web-scale computing and the needs of plain-old enterprise storage grow, many more companies are wishing they had a file system like Google’s.

On Monday, a startup called ParaScale is launching a private beta of commercial-grade storage software that takes an approach similar to Google’s own in-house system.  (ParaScale nearly made it into TechCrunch50 this year, but just missed the cut, largely because it was no longer in stealth mode.)  It offers a file system that can run on a cluster of off-the-shelf Linux servers.

Companies can keep adding as many servers as they need, with each one acting as a redundant node. The software runs on the cluster as a whole, treating it as one giant file system.  This creates private cloud storage that companies can host themselves inside their own firewalls.  ParaScale CEO Sajai Krishnan says customers can expect to pay about $1 per gigabyte, depending on their server costs.

That compares to 15 cents per gigabyte per month from Amazon’s S3 Web storage service, not counting what customers pay for inbound and outbound bandwidth.  After about six months, a customer would end up paying more for Amazon S3.
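
The break-even arithmetic behind that comparison can be sketched as follows. This is a rough illustration only: it assumes the roughly $1 per gigabyte is a one-time, up-front cost (the article does not say exactly what the figure covers), it ignores bandwidth, power, and staffing, and it treats one terabyte as 1,000 gigabytes. The function names are made up for this example.

```python
# Rough break-even sketch: one-time cost per GB vs. a recurring monthly fee.
# Assumptions (not confirmed by the article): the ~$1/GB figure is paid once,
# up front; bandwidth and operating costs are ignored; 1 TB = 1,000 GB.
# Prices are kept in whole cents to avoid floating-point rounding surprises.

PARASCALE_CENTS_PER_GB = 100      # about $1 per gigabyte, assumed one-time
S3_CENTS_PER_GB_MONTH = 15        # 15 cents per gigabyte per month


def break_even_months(upfront_cents: int, monthly_cents: int) -> float:
    """Months of the recurring fee that add up to the up-front cost."""
    return upfront_cents / monthly_cents


def first_month_recurring_costs_more(upfront_cents: int, monthly_cents: int) -> int:
    """First whole month in which the cumulative recurring fee exceeds the up-front cost."""
    return upfront_cents // monthly_cents + 1


def dollars_for_terabytes(terabytes: int, cents_per_gb: int) -> float:
    """Cost in dollars of storing the given number of terabytes at a per-GB price."""
    gigabytes = terabytes * 1000
    return gigabytes * cents_per_gb / 100


if __name__ == "__main__":
    months = break_even_months(PARASCALE_CENTS_PER_GB, S3_CENTS_PER_GB_MONTH)
    print(f"Break-even after {months:.1f} months")
    print("S3 is costlier from month",
          first_month_recurring_costs_more(PARASCALE_CENTS_PER_GB, S3_CENTS_PER_GB_MONTH))
    # A customer with 30 terabytes, the size Krishnan describes as ideal:
    print("Up-front cost:", dollars_for_terabytes(30, PARASCALE_CENTS_PER_GB))
    print("S3 cost per month:", dollars_for_terabytes(30, S3_CENTS_PER_GB_MONTH))
```

Under these assumptions the two options cost the same after a little under seven months of S3 fees, which is in line with the rough six-month figure; any recurring license, hardware refresh, or administration cost on the self-hosted side would push the break-even point later.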

But ParaScale’s private beta won’t be available to all comers. Krishnan is looking for about 20 initial beta customers with serious storage needs (in addition to the five alpha customers who have been trying the software for the past 18 months). His ideal customer is:

. . . somebody with 30 terabytes of storage, growing at 10 to 20 terabytes a year.  If you don’t have that, go with NetApp and you will be pretty happy.

Customers can apply for the beta trial here, and those that get in receive the first four terabytes of storage management for free.

The kinds of applications that make sense for ParaScale include video hosting, applications that crawl the Web and create huge log files, or corporate databases that are simply getting out of hand. Maybe an enterprising enterprise customer will use ParaScale to set up its own storage cloud service to compete with S3.  ParaScale itself is competing against Amazon and RackSpace on the hosted cloud storage side, and with storage appliance vendors such as NetApp and EMC on the data center side.

Architecturally, its approach is closer to Google’s file system and MapReduce, or to Hadoop, the open-source implementation of those ideas.  Krishnan doesn’t really see these as competition.  He says:

They are okay for top-ten vendors who have the horsepower and Stanford grads to tune these things.  It takes six months for a Google engineer to figure out the MapReduce mechanisms.

With ParaScale, one IT administrator can manage hundreds of server nodes running ParaScale’s software.  ParaScale raised $11.4 million in a series A round last May from Charles River Ventures and Menlo Ventures.

Comments

  • Seems to be a cheap alternative to my current data center!

  • Damnit why does this have to be a private beta

  • The image isn’t getting displayed in the post(the back-end code is showing up)…See to it Erick…

  • seems a good option… probably try it out..

  • Uh.. Lustre from Sun is free. There are other free software alternatives for Linux as well, I just think Lustre remains the best.

  • @ Eric : You need to replace the ” with ” :)

  • Erick,

    We recently launched IDrive Open Storage Service (http://www.idrive.com/oss/ ) just last week that offers S3 like cloud storage with native webdav protocol support and at price points slightly lower than S3.

    We plan to add native NFS and CIFS support to this shortly and make it a truly open cloud storage where enterprises can simply extend their existing infrastructure with access methods they already know.

    We manage multiple petabytes of data for our online backup services IDrive and IBackup, so we do have the experience in managing large clouds.

    just thought this might be relevant for this post. We are not a startup, but this is a new product that fits with our storage centric strategy.

    Raghu Kulkarni
    CEO
    Pro Softnet Corp

  • it’s not fair to compare managed services like S3 with self managed, no matter how sophisticated the self managed software. That said, I think it’s about time S3 cuts some charges, especially on the bandwidth side.

  • Wow, this sounds really interesting. Right now, here at Convos (http://www.convos.com), we use Amazon EC2 and S3. Later on though, we’re thinking that might have to change depending on how many online groups get started. There have been a lot of EC2 quirks for us.

  • It’s not the cost of storage that gets you (even current S3 prices are essentially free), it’s the data transfer fee that is key.

    • That really depends on your application. For example, in video, for serving the files you want cheap bandwidth, but for storing the original material, you mainly care about storage cost, since it’s accessed so infrequently. I guess they are different problems, and most providers are looking for a middle ground.
      You could go either way: Really cheap storage, expensive access (long tail like), or expensive storage, cheap access (top 10 like).

  • In terms of scale, 500 TB is not that much any more. In my day job, I routinely (more than 3 times this year) help build out multiple petabyte systems. The scaling challenge is about the number of hard drives and the access times. 500TB is usually 500 1TB hard drives, and can be done in a single disk subsystem from most any major vendor.

    Now if you’re trying to run something like Oracle on this storage, 1TB drives won’t cut it. Oracle has a multi-petabyte install in their main datacenter that uses smaller (but faster) drives.

  • for one the picture in this post is still broken – others have already noted that days ago. But, even worse, is this sentence: “ParaScale CEO Sajai Krishnan says customers can expect to pay about $1 per gigabyte” per month? per year? for the software license? Including what exactly – the cost of the servers? energy cost? the tech support? the software license?
