July 13, 2007

40 Terabytes More Data For Amazon S3

Michael Arrington

Startup Phanfare, which stores a lot of user generated media, announced today that they are in the process of moving all of their backups of stored user data - 40 terabytes - to Amazon’s S3 storage service.

Amazon S3 has been on a bit of a roll lately, recently surpassing 5 billion stored objects and growing fast.

It’s also racking up a number of passionate users who swear by it for reliability and cost savings. Phanfare is just the most recent example, albeit a large one.

Phanfare stopped short of moving all data over to S3, though. For now they are just moving backups. They admit they’d save more money by moving the storage function entirely to Amazon, but note that:

After all, right now, Amazon does not provide a Service Level Agreement (SLA) or even a phone number to call if you are unhappy with the Amazon web service. I don’t expect that Amazon will ever lose our data of course, but we would like an SLA before we bet our customer’s data on that.

On the subject of Amazon, rumor has it that they’ll be adding to their storage and computing web services by year end, with a MySQL database web service to complement the other two.

Comments

  1. Alaska Miller

    I’ll be curious to see how much money Amazon makes on S3.

  2. Andy

    We use Amazon S3 for FLV video hosting. Very cost effective and reliable, even for streamed video.

    The MySQL rumour is very intriguing. I assume this would be using their own EC2. Or is this really about the MySQL interface that independent developers have got for S3 as a plugin for MySQL, storing data from tables in S3?

  3. Evan R

    If Amazon S3 is their back-up strategy, and they are moving “all” their back-ups there, aren’t they in fact still “betting their customer’s data on that”?

  4. Dan

    While EC2 seems ideal for things like map/reduce jobs (using Hadoop), database support sounds like a good idea. I hope they support PostgreSQL instead of or in addition to MySQL.

    Other than in ease of setup (which Amazon will handle) and using it as a simple datastore (quick reads/writes involving a single table), Postgres is better in almost every way. It supports many interesting extensions, executes complex queries faster, supports a larger set of SQL, and has more sophisticated transaction handling. Don’t want a flamewar, just hope Amazon supports it too…

  5. Jeremiah Owyang

    Good coverage. I really believe that the future of Data Storage is heading to this service type model.

    WAAAAY back in 2006, I made some predictions about the future of this market, in this piece: “Future of Online Storage, 40 points to think about”

    Many of them are already coming true.

    What most people haven’t realized is the tacit knowledge in the cloud, and how we can extract that knowledge.

    http://www.web-strategist.com/.....ink-about/

  6. Jeremiah Owyang

    Oh yeah, and I predicted that Online Data Storage companies will PAY users to upload data.

    When I wrote this post the Data Storage Community wasn’t happy with me (Although I was an employee at Hitachi’s Data Storage company)

    Guess what happened? A little while later, YouTube started to consider paying some of its top content producers.

    http://www.web-strategist.com/.....load-data/

  7. icepick

    Has anybody had experience with S3 serving content/streams for Europe? I have been told once that latency was high and once that it serves quite well. As I am not a tech guru, is there anybody out there with experience :-)?

    ice

  8. Len

    Andy @2, I’m just in the process of choosing hosting services for a site I’ve been working on, and it has FLV video as well. I’d be interested to talk to you a bit further about how it’s been going over at Amazon.

    Please feel free to contact me privately if you feel comfortable with that.
    My email address is len [{ at }] thinkbbk dot com.

    Thanks

  9. Matt

    We’re going with EC2 and S3 integration to run our Flex 2 web-application, Convos. It’s currently in closed testing and we’ve been really pleased with experience and performance.

    This past week in NYC, Amazon’s AWS evangelist presented at the NY Tech Meetup and I was surprised how many people had not heard of their services. He did a demo of a piece of 3rd party software that monitors your EC2 servers and automatically increases / decreases capacity based on demand. Very cool.

    PS #4 - PostgreSQL works with AWS.

  10. Andrew Erlichson

    Evan,
    Our primary servers are RAID 6, which means that they can suffer a double disk failure and not lose data. By definition, backups represent a second line of defense. Even if we lost every backup, we would not lose the primary data. In fact, before we switched over, we did lose backup machines from time to time (they were not the same high-end branded servers we use for primary storage) and we simply replicated the data onto new backup.

    Also, if we used Amazon for primary, and they went down, we would have almost no recourse. You can read the terms of service on Amazon web services. It basically says, if it goes down, tough cookies.

    I actually believe that Amazon was extraordinarily conservative in crafting their terms of service and now that they have some operating experience, they will strengthen the terms of service, or offer traditional service level agreements on a tiered basis.

    In any event, the Amazon web services concept is a great one, especially for the startup community. If I was just one guy hacking my prototype, I would definitely see if I could get the whole thing on top of Amazon web services. After all, at that point, you have little to lose.

  11. Darin

    Amazon must be making a killing on S3!

    Darin

  12. Concrete Stain

    Amazon - s3 - brilliant.

    although they started advertising groceries on their homepage again

    - not - so - brilliant -

  13. Peter

    I have been reading about the online backup industry for a while now, mostly from the aggregator website,

    BackupReview.info

    As this site ranks the top 25 online backup companies on a monthly basis, I have not seen Amazon’s S3 ranked in the top 25 even once in the past 2 years. If S3 is that good and reliable, then why is it that it did not make the cut at least once?

    It sounds fishy here, and it looks like the more money a company has to promote its products and services, the more we hear about them, despite the fact that there are tons of others with far better products.

  14. Igor

    We at http://www.mediamaster.com are already using Amazon S3 for some of our customers’ MP3 files and are planning to up the Amazon storage share significantly over the next several months. So far the performance has been great, nothing to complain about.

  15. Evan R

    Andrew @ 10,

    Thanks for the reply. I hear you on RAID6 as your primary.

    I am curious if Amazon facilitates periodic integrity checks and tests of back-ups (that do not themselves incur significant transfer costs, e.g. MD5 checks on both sides).

    S3 is appealing. In outsourcing critical or back-up systems to S3 I would be looking for comfort that the triple nightmare scenario of “RAID6 failure, back-up corrupt” is demonstrably unlikely.
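
The kind of check asked about above, comparing MD5 digests on both sides without re-downloading the data, might look like the following sketch. It covers only the local half. How the remote digest is obtained is an assumption left outside the code (for objects uploaded in a single plain PUT, S3 reports an ETag that is normally the MD5 hex digest, but that should be verified for your own setup), and the function names are hypothetical.

```python
import hashlib
import io


def md5_hex(stream, chunk_size=1024 * 1024):
    """Hash a binary file-like object in chunks so large backups fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def backup_matches(stream, remote_digest):
    """Compare the local MD5 with a digest reported by the remote store.

    remote_digest is assumed to be a hex string, possibly wrapped in
    double quotes the way HTTP ETag headers are.
    """
    return md5_hex(stream) == remote_digest.strip('"').lower()


if __name__ == "__main__":
    data = io.BytesIO(b"example backup contents")
    reported = hashlib.md5(b"example backup contents").hexdigest()
    print("intact" if backup_matches(data, reported) else "MISMATCH")
```

Only the digests cross the wire in a check like this, so it avoids the transfer cost of pulling the backup itself.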

  16. CrunchBack

    At Digibug (www.digibug.com) we are moving a new application to use S3 for all hi-res images after 30 days (when most printing and viewing activity drops off). We’ll keep screen res and thumbnails locally. At this point we are comfortable that Amazon can manage data way more reliably than we can and for a lot less money. We’re not yet as comfortable with EC2 but may be once a hosted MySQL solution is in place and proven to be reliable - then we can shut down our data center for good and just focus on coding features instead of managing servers.

  17. Frank

    S3 doesn’t seem to make sense financially. Can someone shed some light on how it’s cost effective?

    A 1TB server costs about $1000 (or less) to buy. Expect at least a year of service from that one server, probably closer to 5 years, or longer when you replace the drives.

    S3 costs $0.15 per GB per month, which is $0.15 x 1000GB x 12 months, which is $1800. For only one year. Two years is $3600. 3 years is $5400. In the meantime, your $1000 server is still humming along at 1/5th the cost. And will still be working for a full five years while S3 will have cost you $9000 in 5 years for the same amount of storage that you paid $1000 for.

    The bandwidth costs seem to be about the same, especially if you’re just using this server for a backup.

    Operations costs? Hands-on service at a co-lo can be as high as $200 per hour. So, let’s say a drive goes out once a year and needs to be replaced. You’re still much cheaper than S3.
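
The storage arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustration: the $0.15 per GB per month and $1000 server figures are the ones quoted in the comment, the function names are made up for this sketch, and bandwidth, co-location, redundancy, and staff costs are deliberately left out.

```python
# Prices quoted in the comment (2007); kept in integer cents to avoid float rounding.
S3_CENTS_PER_GB_MONTH = 15      # $0.15 per GB per month
SERVER_PRICE_DOLLARS = 1000     # one-time price of a 1TB server, per the comment
CAPACITY_GB = 1000              # 1TB, using the comment's 1000GB figure


def s3_storage_cost(gb, months):
    """Cumulative S3 storage charge in dollars (storage only, no transfer fees)."""
    return gb * months * S3_CENTS_PER_GB_MONTH / 100


def compare(years):
    """Return (year, cumulative S3 cost, one-time server price) for each year."""
    return [
        (year, s3_storage_cost(CAPACITY_GB, 12 * year), SERVER_PRICE_DOLLARS)
        for year in range(1, years + 1)
    ]


if __name__ == "__main__":
    for year, s3, server in compare(5):
        print(f"year {year}: S3 ${s3:,.0f} vs server ${server:,}")
```

The comparison is intentionally as narrow as the comment's own: it counts raw storage charges against a single hardware purchase and nothing else.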

  18. David Mackey

    Big fan of Amazon S3. Also excited because South River Technologies, makers of WebDrive are extending their FTP drive mapping application to support S3, expected release Q3 or Q4 2007. There are some other Amazon S3 drive mapping utilities out there (S3Drive), but not very mature development wise.

  19. Andrew Erlichson

    Frank @17: You are correct. You can build out storage more cheaply. The numbers are not as bad as you say, though. You did not consider the cost of co-location and bandwidth for the server. But most importantly, you did not consider two additional things:

    1. You will almost certainly wind up virtualizing that 1TB server if you buy 100 of them. Writing the virtualization layer so that it appears as limitless disk is nontrivial. Buying a solution (probably a SAN solution) will increase your storage cost a lot.

    2. S3 is considerably more reliable than a single 1TB server. It is more like a single 1TB server and another server or two to back it up. Your single terabyte server is not backed up in multiple locations. S3 automatically does that for you.

    If you compare apples to apples and built an S3, you would be hard pressed to do it for less. Certainly it is possible but your scale would need to be quite large to do it.

    One of the reasons S3 seems expensive for backup is that it is not really backup. It is a reliable store that is also backed up, essentially. Hence, using S3 for backup is a bit of overkill. That is why to really clean up cost wise, you must make S3 be your primary data store. Whether you feel comfortable doing that without an SLA from Amazon is up to you.

  20. Fabian

    40 TBs is nothing - by the time I left the VP of Technology position at Pictage we had 150TBs of high rez images. On the same system I built, they now have about 300TBs.

    People, you can build very inexpensive distributed storage that works better than S3 or EMC or any other possible vendor. Contact me if you want to know how.

  21. Andrew Erlichson

    Hi Fabian (we have met),
    It took us about a week to create the S3 solution. Granted, we were able to leverage some existing code we had that calculated incremental backups, but my guess is that the TCO on the Pictage system would still put you under water relative to an S3 solution today, even at 300TB. Anyway, you should share your magical distributed, limitless storage solution. Can’t make a claim like that and then just walk away.

    Andrew

  22. Ben Allfree

    Not sure if this is the right forum, but we launched InfiniteFTP this week to provide more universal access to S3.

    InfiniteFTP provides FTP access to Amazon S3.

  23. Fabian

    Andrew:

    Hey man … how are you? Scott mentioned something about you guys a few weeks ago. Are you guys still using the same technology as when we last talked?

    The secret sauce is AFS. That is all the magic you need. In my opinion it is not only the time that it takes you to create a solution, but the cost of maintenance and maintainability, availability, risk through time, lower external dependencies, ownership and overall performance. I did not say that S3 is not a good solution; I said that there are better off-market solutions that you can build yourself and leverage to increase the valuation of your company in an M&A.

    It is the same old argument of build vs. buy. I am more of the build mentality - but only for core technologies that do make a difference at valuation time. For example, I would not build a general ledger, but in the case of Pictage, I decided to build the image serving technology for two reasons:

    1) Totally core to the company.
    2) Even though there were some good products at the time, they were not cost effective.

    At the end of the day, to this day, nobody has an Image Server as good, flexible, versatile as Pictage (and I built that) and if you do not believe me, look at Pictage’s Album Designer. The imaging back end is all done real time by the Image Server.

    I am no longer at Pictage … good riddance … people there get treated like dogs. But the technology I left behind remains really powerful, including the storage solution.

  24. Frank

    Andrew, thank you for your reply.

    > You did not consider the cost of co-location and bandwidth for the server.

    Actually, I did. If the server is used for backup, the bandwidth and colocation pricing seems to be about the same.

    > Writing the virtualization layer so that it appears as limitless disk is nontrivial.

    Sure, nothing is trivial when you’re developing software.

    > Buying a solution (probably a SAN solution, will increase your storage cost a lot).

    Hmm, not sure about that. 1TB storage for $1000, whether that’s 1TB for $1000 or 10TB for $10,000 (or less).

    > S3 is considerably more reliable than a single 1TB server.

    In theory perhaps, but that’s not what I’ve read. Check with Smugmug. Also, Amazon doesn’t even think S3 is reliable; they have no SLA.

    > Your single terabyte server is not backed up in multiple locations.

    So buy 3 or 4 more identical servers at $1000 each. It’s still cheaper to buy your own servers and colocate them.

    > If you compare apples to apples and built an S3, you would be hard pressed to do it for less.

    That’s what I keep hearing, but no one’s making the point with real numbers. Investing in technology based on speculation, without real numbers, doesn’t seem like the right way to do it. Unless your investors like the idea of working with Amazon and you like the exposure from TechCrunch.

  25. Andrew Erlichson

    Frank,

    Your 1TB server is not a fair comparison. Amazon mirrors your data in multiple data centers so you will need to consider having at least 2 data centers and consider the b/w between them. The problem is that you are comparing cheap backup (as in, I don’t really care if I lose my backup because it’s just backup) to Amazon’s triple redundant, physically distributed storage.

    Our calculations are that you can build the storage yourself for less from a hw cost standpoint, even keeping copies in multiple data centers, but if you put even one head on the project fulltime to keep adding storage, fixing broken stuff, and scaling the system, the breakeven point is years out.

    Reliability is not just one number. I did not mean availability (where Amazon had some issues). I meant that Amazon won’t lose your data.

    As for the SLA, we absolutely agree. It is needed. It is not really needed to use Amazon as backup, but it is certainly needed to make them primary storage. Feel free to email me at andrew@phanfare.com if you want more detailed cost numbers.

  26. John Byrne

    Don MacAskill from Smugmug elaborates in detail about their experience
    with Amazon S3... he has nothing but good things to say (and $ave).

    http://blogs.smugmug.com/don/2.....the-money/