Google App Engine Goes Down and Stays Down
by Michael Arrington on June 17, 2008

Google App Engine, which launched in April to compete with Amazon’s web services unit, has been having major problems over the last day. Currently, the application directory and, more importantly, all third party applications (here’s our test application), are offline. Developers cannot even log in to the management console.

Google hasn’t responded yet to a request for comment, although they did post a brief message on the Google Groups site for App Engine:

We’ve experienced several outages during the past 12 hours, the most recent of which started at 6:30am PDT and is still ongoing. During these outages, a significant percentage of requests resulted in errors. The errors are related to usage of the Datastore. We’re working hard to determine the cause of these outages and will continue updating as we make progress.

Nothing on the Google App Engine blog yet.

Stuff like this tends to make developers nervous about adopting a new platform.

Update: It’s back up!

Responses

  • This is one of the things that has held big companies back from using it as an everyday tool. Security is a big concern, but reliability is even bigger, since failures are more likely to happen.

    Here’s an interview with Rishi Chandra (Product Manager for Google Enterprise) talking about how nice Google Apps is, and how they are planning to market it towards corporations – I’m guessing this isn’t part of that ad campaign.

  • So much for flying the friendly skies with Google. Hopefully their corporate jet is more reliable.

  • You’ll never find a hosting solution or application platform that doesn’t experience some downtime (especially in beta). That’s the nature of technology and the Internet. The real question is whether you think your engineers and IT staff can build something more reliable and respond to issues faster than Google (or Amazon, or Yahoo, or Cloud Provider X). For most companies, the answer to that is no – even with great engineers you’re not going to be able to build something as scalable and reliable as those companies. It doesn’t mean they’ll be perfect, but I’ve yet to see anyone achieve that bar.

  • I’ll agree with Dawe.

  • This is uncalled for! I don’t expect this to happen with all of their “smart” employees behind their doors.

  • Yeah, I am considering developing a new app on this platform. I still have some doubts about my future independence and monetization, but it didn’t cross my mind that Google’s infrastructure could go down that hard…

    I’m very concerned, and I’m expecting a very touchy communication about this outage.

  • Oh noes! Steve Gillmor is going to be indignant and long-winded this coming weekend.

  • @ Dave Wright: Absolutely, every system goes down – but Google is held to a different standard. Besides, the type of applications they are offering don’t, under normal circumstances, require an internet connection, which makes the downtime even worse.

    And it’s not free anymore, companies are relying on it.

  • Although I agree with Dave and sanoittaja for many circumstances, there are cases when controlling the infrastructure can be attractive. First, if you derive some competitive advantage by having it in-house (speed, specialized/targeted services, etc). Also, sometimes there isn’t a substitute for owning your own destiny. But it distills down to a standard outsourcing decision.

  • Michael, kudos on the smoking engine graphic. Simple and to the point.

  • The problem is Google is one of the worst companies when it comes to downtime. I am very surprised they even mentioned a problem, anywhere. GrandCentral would go down for hours, voicemails would be lost, and Google would barely even acknowledge it and would never give any indication they were even working on the problem.

  • I was about to ask if App Engine was still in Beta.

    A quick look at the site doesn’t reveal its current status – maybe they should have kept all those “BETA” “Experimental” etc. banners around a bit longer :)

    It’s still IMO a very cool idea, and I think an idea worth playing with. Enterprise adoption usually only happens when infrastructure is much farther along the stability curve anyway.

  • Give it some time, it will stabilize.

  • This is probably somehow related to the Firefox download fiasco.

  • I’d certainly agree that Google, or any platform provider, should be held to a higher standard than an individual website. After all, any failure on their part will cause hundreds or thousands of sites to go down. That said, I’m not surprised to see some downtime this early in the platform’s life – in fact if this is the first real downtime that’s pretty remarkable in itself. Long term their availability will far exceed what the developers using the platform could hope to accomplish on their own, especially when scalability is considered. There may be other reasons not to use it, but I seriously doubt reliability will be one of them… and if it is there will be other cloud application hosting providers waiting to jump in.

  • Dave – the problem is when it’s a 3rd party service and it goes down you’ve got no recourse or at best an SLA, and worse than that you will rarely know what happened or when it will be back beyond vague messages like “it’s broken, we’re fixing it”.

    When it’s in-house you can find out what went wrong, fix it, prevent it from happening again and communicate more clearly with your staff and your users. You can also toss some redundancy into the mix to lessen any impact.

    Having said that I expect it is much less likely that Amazon, Google etc will suffer any significant down time versus in-house platforms, but if your service is critical and rolling your own infrastructure is viable I don’t think “the cloud” is such an attractive proposition.

  • this product is sooo bad.

    using apps from google is like having pizza at mcdonalds…dont go there!

  • Ben – that’s true, and having experienced it first-hand I know how much it sucks when your site goes down and you have no control and no way to fix it. The natural reaction is to think that by bringing it in-house you’ll be able to get faster response and better uptime, but in reality that’s seldom the case. Unless you’re willing to spend megabucks you’ll likely end up with more issues rather than less.
    Today few Internet companies outside Google/MS/Yahoo build their own datacenters because even though outsourced datacenters have issues from time to time (that, much like this, you can’t control and can’t fix, and have at best an SLA to fall back on) it’s just not practical for everyone to build and run their own datacenter – it’s expensive, it’s difficult, and it’s not where your company adds value to customers.
    In the next few years I think we’ll be saying the same thing about Cloud Computing and Cloud Application hosting – the platforms available may not be perfect, but they are better than what 99% of developers could build, and the time and money not spent re-inventing the wheel will lead to better and faster delivered products.

  • That’s why web apps, in my opinion, will never replace desktop apps. You can’t expect the same reliability as you do from your desktop computer. Your internet provider could fail or your application provider could fail.

    There are just too many things that could go wrong, whereas with your desktop, you only need to worry about having electricity.

  • If you look at the link I posted, the PM for Enterprise Solutions talks about the offline version and of compatibility issues.

  • Google offers an open and flexible framework. You can write your application using this framework and host somewhere else, if you are unhappy about the reliability. So if you think you can provide a better uptime than Google, then go ahead.

    I suggest the non-technical people whine someplace else (the site is called TECHcrunch after all) or at least familiarize themselves with the technology before bashing it.

  • Any of you notice there are no jobs in the bay area? Very few anyway, a lot of things won’t be coming back.

  • What do you think of Zoho? can it stand the pressure of Google?

  • Maybe I am confused or came upon this story late? Google Apps is working fine for me and http://appengin...crunchbase.com/ seems to be working just fine as well.

    Did I miss the outage? Granted, we are treating Apps as a Beta and don’t rely on its use in our company; however, I have seen longer downtime with larger bandwidth providers.

    Just curious

  • How long was the outage? Back up now as I read this (2:45 PDT).

  • Probably it only lasted a couple of minutes (if even that). I was using MS Office anyways.

  • is facebook down as well???

    yikes!

  • facebook is rough for me as well

  • In regards to Gandalf’s comments about desktop apps, that would be true if your app only requires data local to your machine. And how often is that true these days? I think most large apps are going to follow the client/server architecture idea; all that things like this do is move the server to a 3rd party host. You will always have added complexity and increased risk when you add multiple nodes into a tech equation and have to depend on the communication mechanism between them.

    All apps will have downtime. Like everyone says, the secret to success is going to be in how the company responds. So far it sounds like the app was experiencing problems for over 12 hours, and it also sounds like they are not able to communicate effectively with their clients and end users about the problem. Time will tell if this is a one time mistake on timing and communication or a recurring thing. That is what will kill or save a company with these kinds of issues.

  • Is it all related to the Firefox 3.0 D/L? Facebook experienced some problems, google apps.

    Did google buy facebook? =P … we know Mozilla and Google are working together.

  • App Engine had about 7 outages in the past month, counting the emails I’ve got from their mailing list. A lot of them are “small number of requests randomly failed” kind of outages, not complete outages. Also the service is in preview mode, not even beta.

  • MS Office and all non web-based apps for the win. Happy to know that techcrunch and the blogosphere finally criticize google rather than being ho-hum and calling google all perfect and innocent, though I’ve realized that techcrunch has been bashing google all week. Thumbs up for that.

  • Stop crying, the site will be back up

  • I don’t actually find this to be news – if you really want to continue writing “news” on App Engine outages, please follow the *public* Google App Engine Downtime Notify Group, where each downtime is announced and explained: http://groups.g...me-notify?hl=en

    The reasons on the outage mentioned on the same group: http://groups.g...2ded70755?hl=en

    I wonder: why doesn’t the article mention the existence of this group and that from time to time downtime notifications are being posted there. Given the way it is written now, it sounds more like “Oh my God! It is down now for the first time in history…”

    Quote: “Stuff like this tends to make developers nervous about adopting a new platform.”

    Developers should expect this from any platform. If anyone would implement such a platform, how do you think they would test it? You will need tons of various apps, all sorts of queries, tons of users, environments and so on. Let me know when you find a company being able to provide such kind of testing to a large scale system, BEFORE a release.

    Also, developers should learn that each advantage must be earned properly – this includes having off-the-shelf scaling applications. You can’t develop a multi-tier system and expect all tiers to be working perfectly. All errors should be handled by the calling tiers and the application should fail gracefully. For App Engine applications, Google even provides short instructions about handling Datastore exceptions: http://groups.g...befd39424?hl=en

    From my point of view, today’s developer can either:
    1. not use any new technologies because they are Beta/Preview/etc., wait and see later what can fail, and not even report bugs (sounds like some offline application, right?)
    2. test now and design properly, contribute knowledge, and deploy wisely on the finished product (sounds like web apps, right?)

    Also, for those who say that “offline applications are best”, think twice about how long it takes to fix a bug you discover in an offline application :) Oh, did I mention that lots of other people will find the same bug, WITHOUT even reporting it?

    Welcome to the user-oriented development and testing era which, for a very good reason, overlaps with “Web 2.0″.
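
The advice above, that calling tiers should handle errors and fail gracefully, can be sketched roughly as follows. This is only an illustration under stated assumptions: `DatastoreError`, `call_with_fallback`, and `flaky_fetch_factory` are hypothetical names made up for this example and are not App Engine's actual API, and the linked instructions may recommend something different.

```python
import time


class DatastoreError(Exception):
    """Hypothetical stand-in for whatever error the storage tier raises."""


def call_with_fallback(operation, fallback, attempts=3, delay=0.0):
    """Run operation(); retry on DatastoreError, then return the fallback."""
    for attempt in range(attempts):
        try:
            return operation()
        except DatastoreError:
            # Wait a little longer after each failed attempt before retrying.
            time.sleep(delay * (attempt + 1))
    # Every attempt failed: degrade gracefully instead of raising to the user.
    return fallback


def flaky_fetch_factory(failures):
    """Build a fake fetch that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def fetch():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise DatastoreError("simulated datastore outage")
        return ["greeting-1", "greeting-2"]

    return fetch


if __name__ == "__main__":
    # Short outage: the retries hide it from the user.
    print(call_with_fallback(flaky_fetch_factory(2), fallback=[]))
    # Long outage: the page renders with an empty list instead of an error.
    print(call_with_fallback(flaky_fetch_factory(5), fallback=[]))
```

The fallback value is whatever lets the page still render, such as an empty list or a cached copy; the point is that the failure stays inside the tier that can cope with it.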

  • Twitter is not alone, apparently.

  • @14 – Maybe Mozilla hosted the Firefox downloads on Google App Engine – that would explain both problems…

  • What about Brightcove’s recent problems pushing out #3 (your coverage mentioned nothing about the days of instability around the rollout), where’s that story? Any CDN outages recently? All these things are tools for making businesses happen. The business needs to invest in tools in a way that reflects their importance… so yeah, any business that runs entirely on GoogApps or Amazon’s cloud better be able to operate if those things go away.

    The cool thing about these freebie-services is the way they lower the bar for getting an idea off the ground. I think this is a healthy and good thing for all of us. The challenge for the promising ideas is getting them fit for mainstream usage. Call in the pros when it’s time.

    CG

  • My app might have been down but nobody has cared in about a month :)

    http://qthrul.appspot.com/

    My guess is that eventually there will be diagnostics available as part of your general dashboard on the laptop or mobile or browser.

    Most of the cool kids in larger scale places had hacks for Apache and other service checks as plugins for AIM or similar resident applications.

    So, eventually the lollipop red/yellow/green will make it to some component of the day to day application that indicates the relative health of arbitrary services.

    Hopefully this will become a standard such as http://status.s...nynamehere.com/ and depending on the URI you get a particular real time service stub via XML that can be grokked and digested and represented in any of a series of ways.

    As the wise admin once said though… what’s monitoring the monitor?
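
As a rough sketch of the status-stub idea above, a client could parse a small XML document and map each service to a red/yellow/green indicator. The XML layout, the state names, and the function names here are all assumptions invented for illustration; no such standard format is described in the comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical status document; the element and attribute names are made up.
SAMPLE = """<status>
  <service name="datastore" state="degraded"/>
  <service name="frontend" state="ok"/>
</status>"""

# Assumed mapping from reported state to indicator colour.
COLORS = {"ok": "green", "degraded": "yellow", "down": "red"}


def service_colors(xml_text):
    """Return {service name: colour}; unknown states are treated as red."""
    root = ET.fromstring(xml_text)
    return {
        service.get("name"): COLORS.get(service.get("state"), "red")
        for service in root.findall("service")
    }


def overall_color(colors):
    """Report the worst colour present across all services."""
    for color in ("red", "yellow"):
        if color in colors.values():
            return color
    return "green"


if __name__ == "__main__":
    colors = service_colors(SAMPLE)
    print(colors)
    print(overall_color(colors))
```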

  • Is anybody else noticing brief outages in Gmail and Google Spreadsheet? I’ve been experiencing them for the past ten minutes.

  • This is the first post against Google on techcrunch. I have had a lot of problems with gmail too. Many of my friends’ mail IDs were deleted during the last release.

    Happy !googling

  • Swaroop CH raises an interesting point. There may be a trend story in here, Michael. Google has been having some problems with Google Docs uptime too. Not major. Not unacceptable. But noticeable. And we’re actually paying to use that service. Maybe it’s no biggie, but Google has been making some very aggressive statements about downtime in comparison to enterprise expectations lately. While it’s true Google Search is literally always available, GOOG has some way to go before it can claim similar QoS across the portfolio.

  • Gmail and other apps are working fine for us – nothing observed by me. I think they have fixed up all issues now and hope it doesn’t happen again.

    sh IT happens though!!

  • linkedin down too! the sky is falling in!

  • Now that it’s back up, why not add a poll (appengine based) to your site asking readers about appengine? I love it so far.

  • Whoops, forgot to add the url http://www.modpoll.com/

  • This is just beta and what you would expect – hey, Amazon took nearly a year before they fell over, and people were really relying on them by that stage.
    Let’s hope that this means the big problems are found out during the beta phase!

    sent from: fav.or.it [FID279265]

  • Yes, this is beta SW and you should expect it to be less reliable. But let’s think a bit about how to test such a complex application.
    Existing SW testing tools give you plenty of features to verify how the app behaves in a TEST environment and under synthetic conditions.
    But what about the behavior in a REAL environment? How do you know what NORMAL app behavior is in a production environment? How do you determine that it is behaving abnormally and do something about it before users are impacted?
    Lastly, how do you learn from past failures and apply that the next time you roll a new rev? How do you compare the before and after behavior? Till these questions are answered, app development, and especially rolling out new revs to production, will include elements of guess and prayer.
    The solution is to use the app behavior in a REAL environment as a baseline to make predictive decisions and to create before-and-after scenarios that tell the app developers what is really happening.
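
One very simple reading of the baseline idea above is to record a metric such as error rate during normal production operation and flag new readings that fall far outside it. The sketch below assumes a mean-plus-k-standard-deviations threshold; that rule, the sample numbers, and the function name `is_abnormal` are illustrative choices, not something the comment specifies.

```python
from statistics import mean, stdev


def is_abnormal(baseline, current, k=3.0):
    """Flag `current` if it exceeds the baseline mean by more than k sigma.

    `baseline` needs at least two samples so a standard deviation exists.
    """
    threshold = mean(baseline) + k * stdev(baseline)
    return current > threshold


if __name__ == "__main__":
    # Made-up error rates (fraction of failed requests) from a normal week.
    normal_week = [0.010, 0.012, 0.009, 0.011, 0.010]
    print(is_abnormal(normal_week, 0.011))  # an ordinary reading
    print(is_abnormal(normal_week, 0.25))   # a reading during an outage
```

The same comparison run on the baseline from before a release and on readings taken after it gives a crude before-and-after check.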

  • If you’re looking for a quick introduction to the Google App Engine check out http://www.squi...ogle-App-Engine
