April 4, 2008

Source: Google To Launch BigTable As Web Service

Mark Hendrickson


Google may be releasing BigTable, its internal database system, as a web service to compete with Amazon SimpleDB, according to a source with knowledge of the launch. There are also rumors that press is being pre-briefed on the product, although we haven’t been contacted by Google.

BigTable is a highly scalable database system used internally by Google to support over 60 of its products and projects. A source says Google has plans to announce next week that it will make BigTable available to outside developers as a service. Amazon provides a similar service through SimpleDB, a cloud database solution announced in December.

Google started development on BigTable in early 2004 and began using it actively in February 2005. The non-relational, proprietary system was designed internally to fulfill Google’s peculiar need for access to massive amounts of data at very high speeds (millions of read/writes per second). BigTable is based on the Google File System (GFS) and designed for distribution across thousands of commodity servers that collectively store petabytes of data. Services that rely on it include Google Search, Google Earth and Maps, Google Finance, Google Print, Orkut, YouTube, and Blogger.

The decision to open up BigTable would seem to mark Google’s challenge to Amazon’s Web Services (AWS) suite, which also includes the Elastic Compute Cloud (EC2) for cloud processing power and Simple Storage Service (S3) for cloud storage. The Amazon triumvirate of SimpleDB, S3, and EC2 is meant to solve the scalability needs of web developers with a utility-like model. Customers pay for just the storage, computations, and bandwidth they need, and none they don’t. While Google has yet to announce the pricing for BigTable, we presume it will share the same model as AWS.

If Google does indeed announce public access to BigTable next week, expect the company to follow up with cloud storage and processing solutions as well, since there are substantial synergies between the three.

For more information about BigTable, see a paper (PDF) that was written about it in 2006. You can also watch a talk about it given at the University of Washington in October 2005.


Comments

I’m not sure I understand how providing something remotely would be faster/better than doing a localhost setup?

 

BigTable? Sounds more like GiganticTable. (plus, that has a G in it.)

 

This is a serious question, not a criticism: what is the difference between ‘the cloud’ and a content delivery network like Akamai?

1. Does Akamai only serve files, but not process scripts? Does something like BigTable only process scripts but not serve files? This has gone way over my head, and I haven’t bothered reading up on it, but it looks like I need to.

2. Where will this leave the smaller hosts like 1and1 and Dreamhost? Will hosting in the future just combine something like the cloud with a content delivery network and package it together?

I’m hopelessly confused.

 

You forget SQL Server Data Services from Microsoft which is also in the same business of cloud storage

 

@Don Wilson

Better, because very few people have the ability to scale or provide uptime like Google. If Google does eventually come out with other cloud solutions, everything will run and communicate locally… at Google.

 

I work at a company that makes extensive utilization of Amazon Web Services, and we have been wondering when Google, Microsoft, and/or Yahoo would make a move in this space.

While it is exciting to see more web enterprises enter the space, presumably creating healthy competition and pricing pressure that will benefit consumers of the services, I’m curious as to whether Google will succeed.

One interesting ingredient in Amazon’s success has been their commitment that they are not interested in building upstream products and that they are primarily concerned with providing the “undifferentiated heavy lifting” of raw processing, storage, and structured data. They are sticking to selling books and other goods, and not getting into consumer or SMB facing software-as-a-service businesses - exclusively involving themselves in the back end. Google, on the other hand, is constantly releasing consumer facing web applications and businesses. This puts them in the awkward position of engaging in competition with the very companies that would use their infrastructure-on-demand services.

Will their entry create conditions that will encourage innovation, or will the ever broadening projects serve to stifle innovation by scaring creators and investors away from projects that Google “might” undertake?

 

mark my words: Microsoft will either buy eBay (to get PayPal) or Amazon (to get 1-Click & AWS + distribution / infrastructure) sometime in the next 5 years. maybe less, if the Yahoo deal doesn’t happen.

might have to kill Jeff Bezos or make him CEO in order to do Amazon, but probably worth it.

 

What if Google makes it free? They make practically everything free. Why should we assume they will charge for it like Amazon does? Free will beat Amazon, competitive pricing will not. Providing this wouldn’t significantly cost anything, since Google is already running it. Once the interface is built, that’s it. Am I way off base here, or does that sound plausible?

 

I wonder if they will also offer GFS (Google File System). Currently SimpleDB has too many limits to use it for anything large.

Dave Winer mentioned “Why would Google Web Services cost $0” a few days ago. http://www.scripting.com/stori.....icesC.html

 

Mark, pretty sure the price will be $0, with some ceiling that keeps other infrastructure vendors from reselling Google’s service.

An exciting development for sure!

 

I suspect BigTable will compare to SimpleDB the way MS Access compares to MS SQL Server… at least an order of magnitude more powerful.

@Dan: I doubt Google would offer GFS as a standalone service. I’m pretty sure BigTable runs on top of or in conjunction with GFS, so indirectly you are getting to use it.

 

good luck getting support from these guys.

 

@ #10 Dave and #9 Dan

How does this square with the fact that Google currently charges users of Google Apps for storage over a certain amount?

If they aren’t even offering unlimited storage for them, how could they contemplate offering unlimited storage with programmatic access?

For Dave specifically, while it might make sense for Amazon to offer the services to you for free (how much are you consuming?), Amazon is definitely servicing companies for whom it would make no sense at all to support for free. See SmugMug, BeInSync, ElephantDrive, and others for examples of this. Do you envision supporting firms like these for free?

 

I think that Google is trying to act like Microsoft and wants to do everything on the web.

One can make a case regarding the Google AdSense program: Google gets money for displaying ads and for other things, but the publisher gets nothing for displaying ads. He gets money only when someone clicks on the ads, while Google is always earning money whether someone clicks on the ads or not.

 

It is great that there will be some actual competition to Amazon in this area. I like, however, what Amazon has done. I wrote a blog post a while back regarding my decision to use EC2 and how I would use it (link below). Although, I will be giving Google’s cloud offerings a look when available.

http://www.fabianschonholz.com.....-solution/

 

This would be quite cool if true.

 
 

Mark,

“Google may be releasing BigTable…according to a source with knowledge of the launch”

What exactly did your “source” say? The above statement seems to indicate that he knows about a launch that might be BigTable, or it might be something else. Is that what you’re trying to say?

-Ramone

 

I’m happy about this. I hate amazon and love google.

 

Good to hear but Why its BigTable?

 

I’m with Winer on this one. The Google way to do this is make it a Gmail-style release, and blow by all the competition with a product only a behemoth can offer. But unlike Gmail, there’s no clear way to serve ads via a db service. It could be just a data play. Although the concept of serving ads via scanning emails was pretty ridiculous at the time….

 

Mind-blowing!
This will be a real threat to AWS. AWS is great, but it services are not really scalable, as I read somewhere (i don’t use it). As Henry (@21) says they will present it in a manner that is going to make Amazon guys work days and nights.. Like what it did to Yahoo mail with Gmail…
Sunil (@21), it is BigTable because it is the father of all tables :D

 

I think the BigTable is good service,thanks to share this article.

 

I wonder how different BigTable is from Google Base, available today? It’s reasonable to assume Base is one of the 60 products or whatever running on top of BigTable. It’s already a (potentially big, how big is yours?) table, with user-defined fields, support for nested collections, and a proprietary query language (not unlike SimpleDB). Google Base comes with some predefined complex data types, which is pretty cool. I wish those could be published and shared. enough.

 

I hope they make it free. But don’t really like Google having all this information. Guess Google needs innovation since their current revenue model is now full of more holes than Swiss cheese, and their stock price is dropping since the dreamy days.

 

Interesting to see where this takes the cloud computing industry. So far Google’s been more into APIs to connect some services to external apps, but not data storage online. However they perform in the market, the competition will certainly drive this business far.

 

So, if I have an environment where my application runtime is on AWS EC2 and I use BigTable as the backend, does it make any sense? Probably not, as the performance improvements I expect from BigTable will potentially be lost.

So, the only way this announcement can make sense is if Google also releases GCloud.

 

“AWS is great, but it services are not really scalable, as I read somewhere (i don’t use it).” => Sorry but you don’t know what you’re talking about here. The whole point of AWS is to scale.

 

No serious app will run using this piece of shit.

 

We use Amazon and one thing they do have is great customer service. They are truly passionate about AWS and about helping their customers. Don’t get me wrong, I have always been a fan of Google, but my loyalty will remain with AWS for their pioneering approach. Google seems to want a share in every pie and my experience (Adwords, Adsense) of their support would be enough to make me steer away from their offering. We run a service via AWS that scales sometimes to 100 servers and back down to 20 with ease and there’s a great bunch of guys to help you out both in Amazon as well as in the general community. Competition is good sure, but I know who I’ll stick with for now

 

Nice offering. Remember to improve provisionings in China :-)

 

As exciting as AWS’s cloud offerings are, a well-executed play here could be pretty disruptive. SimpleDB is really being bootstrapped right now, with a lot of limitations in place as well as eventual consistency to worry about. If BigTable is schemaless and scalable like SimpleDB, but with instant consistency and fewer restrictions, it’s a winner.

 

@AJAY

No, if nobody clicks on the ad, Google doesn’t get paid either. From
https://adwords.google.co.uk/select/Login :-

There is no minimum spending requirement or time commitment. And with the cost-per-click option, you are only charged if people click your ads. This means that every pound of your budget goes toward bringing new prospects to you.

 

It will be interesting to see if Google can make ‘Amazon’s Elastic’ look a little ‘slack’. The Cloud must scale; being a little philosophical, I would argue it should be almost dimensionless, whilst recognising that we always have practical limits. Scalability, responsiveness, functional richness and price will be the essential ingredients of success and Google are (naturally) well placed to deliver all. If they can keep ‘teething problems’ to a minimum it should be a heralded winner.

 

Overall AWS is pretty impressive. We have almost 1 million images stored on S3 and use EC2 for a lot too (22’s comment about it not scaling is way off base). SimpleDB blows though. It’s really a joke at this point though I expect it to improve (nowhere to go but up). I love the idea of Google launching BigTable as a web service. If they simultaneously launch a storage and processing solution Amazon has a lot to fear…

 

I think it will be hard for Google to compete with Amazon.

 

Another user data aggregating service from a monopolist.

Another “Thanks, but no thanks” from me.

 

now if they do pricing on terabytes not giga then they could own this space :p

I will look at this if it comes at the right price. I also have my eye on the MS offering but I have a feeling their pricing will be high or it will be a lame duck because they don’t want to hurt sql server sales.

 

Cool. Apache Hadoop users can now choose between Hbase, HyperTable and BigTable(online).

 

#1 @Don: Because you don’t have to do the localhost setup, because you don’t have to do most of the maintenance, the replication and failover, the capacity planning, the upgrades, it’s faster for your users to get to from around the world by coming through their local peering point into a large well-maintained network, etc.

#3 @Blowski: Akamai is part of the cloud. Akamai provides a bandwidth service, more specifically caching at the edge (near the users) for services that otherwise don’t have the presence next to their users. The cloud also includes computing, storage services.

1. Akamai serves copies of files. Cloud storage services can be many things, some serve files (S3), other scripts (many hosting services today), or a scripting language with an API to a persistent storage backend.

2. In the near term, little effect, since there is still a market for machine hosting. In the long term, with virtualization of one type or another, they will all shift to a more service-oriented delivery model, and at a higher level of service abstraction (more API’s, packaged solutions, and less raw compute or storage bits). I think your last statement is actually quite insightful, only I’d put the CDN in as part of the cloud itself.

#6 @Michael

Encourage innovation, by far. As only one example, there are plenty of industry vertical applications, e.g., accounting for the legal profession, that Google could and would never focus on, when it’s focussed on getting the broadest use cases possible.

#9 @Dan: BigTable sits on top of GFS [this is public knowledge]. IMO it would be far more valuable for Google to provide services at a higher abstraction (API) layer, including higher than BigTable, than to provide yet-another-raw-bits-in-the-cloud service.

#11 @CBass: see above.

#18 @Ramone: more likely something else

#24 @Vasili: Base is a higher-level service, providing uploading, indexing and searching service for structured data. BigTable is a lower-level generic database, providing (intentionally) limited relational capabilities but very large-scale distributed storage, and purely programmatic access.

#29 @JosephVirek: your comment sounds either (a) completely random, or (b) from an Xoogler who’s had an unfortunate bad run with BT, in which case I’d suggest you consider the possibility that it works really well overall and how robust & pervasive it is. If the latter let’s talk off-thread.

 

It’s pretty easy to see where this is heading: The Google platform will be built out to incorporate cloud computing and all of the other components needed to host and scale a web app. Google will then have a piece of the action at every point:

- Driving traffic to the site (Adwords)
- The hardware and software infrastructure (BigTable and other services)
- Reporting (Google Analytics)
- Maximising conversions (Google Optimizer)
- Taking money (Google Checkout)
- Monetizing page view (AdSense)

Google will then extend this further with pre-built application design patterns and templates. I can imagine a web interface where you’ll be able to assemble your application online, then use all of the Google services above to turn it into a business. Then it’ll be a piece of cake for Google to acquire any applications that become successful because they’ll already be in the Google house style and hosted on their infrastructure.

This could be incredible for the web start-up community, but the prospect of Google having a monopoly in this is worrying.

 

Google is pursuing their vision to organize (and control) every bit of information out there, does not matter what it is. As they become more controlling, they will be in competition and direct conflict with many other businesses, including the startups they help.

 

@4 Michael,
Either it will scare people off for fear of competition with the Big G, or make people think that this makes it more likely for Google to acquire them, since the back end already works with some Google services. Not that it really matters or is ever that simple.

 

“Google may be releasing BigTable, its internal database system, as a web service to compete with Amazon SimpleDB, ”

One problem: BigTable is not a “database system”. It doesn’t offer database semantics. It’s just a really f-ing big table.

Arrington would know that if he actually performed the minimal amount of due diligence and read the first paragraph of the BigTable paper.

 

@44 Anon: It’s not incorrect for TechCrunch to refer to BT as a database. I understand there are those of us who grew up in a fully-formed xyz-O-R-DBMS world who consider anything less not a database. And there’s a group of cells in Excel being called a database. Somewhere along that spectrum we can all draw our Maginot line.

BT does provide keyed access, indexing, etc. The authors take pains in the paper to stay out of the religious wars of “how ‘relational’ are you really?”, instead emphasizing that the design is fit-to-purpose, lack of bells and whistles is intentional.

At another end of the spectrum, it seems MapReduce was recently accused of being an RDBMS (http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html)

 

Over at my blog I wondered if this might be Google trying to discourage people from developing Hadoop and HBase further, at the risk that those OSS packages might one day enable Yahoo to become competitive again?

 
Slow Dog -- shoot it - April 5th, 2008 at 2:29 pm PDT

One busy group of clusters handles more than 1.2 million requests per second, using more than 8,000 tablet servers.

So each tablet server handled about 150 requests per second. Opterons with dual drives? That low performance (and throw hardware at the problem) is lazy AND bad for the environment.

Google already uses as much energy as a fleet of sports cars driving 17,000 miles a day.

But there’s another issue. BigTable was designed for sequential reads/writes. Getting a random piece of data out? 10x slower.

In earlier papers the designers say their biggest problem was training Google engineers to write code that worked well on GFS/BT. That is, even Google engineers were writing code that was ill-suited to the specifics of the system design.

Let a few PHP or Ruby (gasp!) “developers” at that and sit back for a good chuckle.

 

This is a “no brainer” for Google and it is guaranteed to happen. Google is quickly growing its DNA in all kinds of ways and they are not likely to cede the database to Amazon or anybody else. They are growing the DNA from the bottom just like Microsoft did. But the bottom up strategy is a Trojan horse for the enterprise. Anybody willing to take a bet that in five years Google will not have a major presence as an enterprise software provider?

 

Just for clarification, Akamai does offer edge computing services which allows for execution of Java based code at the edge.

 

#47 @Slow dog — shoot it

In the terminology of the paper, servers are not machines. A single machine would run multiple servers. In the virtual-computing environment described, this included tablet servers, GFS server, the load-test clients, as well as processes from other services (e.g., web search, adsense) that may have nothing to do with the BigTable service itself. What’s clear is that a server process is only a slice of the pie that is the machine hosting it, all doing their own compute/bandwidth/disk consuming work. Because of this shared computing environment, the authors quantified qps in terms of servers, not machines.

So comparing qps per BT server with qps per, e.g., Apache-PHP-server-dedicated machine is apples and oranges at two levels: (a) the denominator is server processes, not machines, and (b) the usual “are we talking about the same kind of ‘transaction’?” Here the answer is no.
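For reference, the per-server figure under discussion is simple division of the two numbers quoted in this thread. A minimal sketch, assuming only those quoted figures; the variable names are illustrative, and the result is a rate per tablet server process, not per physical machine:

```python
# Figures as quoted earlier in this thread.
requests_per_second = 1_200_000   # "more than 1.2 million requests per second"
tablet_servers = 8_000            # "more than 8,000 tablet servers"

# Average load per tablet server *process*. A machine hosts several
# different server processes, so this is not a per-machine number.
qps_per_tablet_server = requests_per_second / tablet_servers

print(f"{qps_per_tablet_server:.0f} requests/s per tablet server process")
```

Since both inputs are quoted as lower bounds ("more than"), the quotient is only a rough average and says nothing about what a dedicated machine could sustain.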

Re random read, if you read the paper more closely, there are two types: (a) those that can be served out of the memory cache, and (b) those that can’t. The slowness is due to having to fetch an entire 64KB chunk just to read 1KB, virtually every time, due to the intentional randomized nature of the load test. In practice, your data is rarely that randomly located, and cache hit rate would be much higher than ~0%, resulting in much faster random-access read rate. And as mentioned in the paper, if your access pattern is really that much more random, then tuning the underlying GFS chunks to, e.g., 8K instead of 64K would boost read performance by up to 8x.
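The "up to 8x" figure follows from how many bytes are fetched per byte actually wanted. A minimal sketch of that arithmetic, assuming a ~0% cache hit rate as in the load test described above; the function name is hypothetical and the sizes are the ones quoted in this comment:

```python
KB = 1024

def read_amplification(chunk_bytes: int, value_bytes: int) -> float:
    """Bytes fetched from storage per byte the client asked for,
    assuming every read misses the cache and pulls a whole chunk."""
    return chunk_bytes / value_bytes

default_amp = read_amplification(64 * KB, 1 * KB)  # 64KB chunk for a 1KB value
tuned_amp = read_amplification(8 * KB, 1 * KB)     # 8KB chunk for a 1KB value

# Upper bound on the gain from the smaller chunk size, if the transfer
# of chunk bytes is the only cost that matters.
max_speedup = default_amp / tuned_amp

print(default_amp, tuned_amp, max_speedup)
```

This is an upper bound only: fixed per-request costs such as seeks and network round trips do not shrink with the chunk size, which is why the gain is stated as "up to" 8x.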

In benchmarking any system people would try to look at worst-case performance (here, naive configuration, completely random access pattern); in practice you are doing something wrong wrt client design and/or system tuning if that worst case is anything close to even the 80-90th percentile in your op distribution.

There is a variety of storage technologies at Google, Yahoo, Amazon, etc., each most suitable for its client requirements. Amazon’s Dynamo-based SimpleDB is one interesting example of that: clients explicitly accepting the possibility of async inconsistent reads in exchange for higher write throughput and increased reliability. It’s useful not to be too religious or simplistic in these comparisons.

HTH.

 
 
