Hey Twitter I Have A Few Questions Too
by Michael Arrington on May 31, 2008

Lately Twitter has been cleaning house, raising money, doing interviews and actually talking to users. In a blog post last week they did a Q&A session, directly answering questions about Twitter’s architecture.

So I have a couple of questions, too, based on a couple of discussions I’ve had with people who say they’ve seen Twitter’s architecture.

  • Is it true that you only have a single master MySQL server running replication to two slaves, and the architecture doesn’t auto-switch to a hot backup when the master goes down?
  • Do you really have a grand total of three physical database machines that are POWERING ALL OF TWITTER?
  • Is it true that the only way you can keep Twitter alive is to have somebody sit there and watch it constantly, and then manually switch databases over and re-build when one of the slaves fails?
  • Is that why most of your major outages can be traced to periods of time when former Chief Architect/server watcher Blaine Cook wasn’t there to sit and monitor the system?
  • Given the record-beating outages Twitter saw in May after Cook was dismissed, is anyone there capable of keeping Twitter live?
  • How long will it be until you are able to undo the damage Cook has caused to Twitter and the community?

Update: Twitter continues to be annoyingly and constructively responsive to criticism. They respond to this post here, saying “We’re working on a better architecture.” Kind of takes the air out of the balloon when you can’t get them riled up.

Responses


  • Wow. Let’s hope not!

  • In the Interview I did with Twitter executives Evan Williams and Biz Stone yesterday they said it would be several more months until Twitter’s problems are solved. http://scobleizer.com/2008/05/.....h-twitter/ — your questions are interesting, though.

  • Wow, are you kidding me…only 3 servers hosting their 1 database?! and no High Availability, no Clustering? Whoa!! What about load balancing on the web servers to handle the load? WTF?! Serious re-engineering ahead!! the twitter IT folks really need to roll up their sleeves and get busy!

  • BTW, I offered my services today to @twitter @EV @Biz today on twitter - no reply. I am seasoned Systems Engineer and Project Manager (Certified in both)… NO reply from twitter… they need all the help they can get and didn’t even reply to my Pro Bono offer. oh well.

  • Haven’t they received another 15 mil in funding? Where is the money going?

  • First.. amazing that so much is running on a small MySQL cluster. We have those setups here and I am glad to know we can scale up before we have to change architecture at CelleCast.

    Secondly, to get this big and be nursing it along is quite a shock

  • What a bunch of noobie IT folks over at Twitter, not to mention another black eye for the amateurish RoR community.

  • Wow. Dude, I know you aren’t paid to be nice, but you have a serious asshole streak when it comes to Blaine Cook. Just remember that what goes around comes around.

  • @Robert Scoble, way to get that interview! This is a messy state to be in for twitter.

    Good questions. In regards to the last one; six or seven months may be too long for the users to wait.

    Friend Feed! Who thinks Friend Feed is the new twitter?

  • Susan: they covered this in the interview yesterday. They don’t want outside help. That makes sense. You might, indeed, be someone who could help them, but they don’t know that and increasing team size does bring a lot of inefficiencies that could decrease their speed greatly. They also covered the money issue. They have piles of it. But getting someone hired, and an effective part of the team, especially one that totally needs to rebuild systems, isn’t going to be something that solves these issues in hours.

  • Sarah: FriendFeed is not a new Twitter. It’s a new FriendFeed. Listen to yesterday’s Gillmor Gang for a separate interview with FriendFeed’s founders, who took on the idea of cloning Twitter head on.

  • Wow Susan! Ev and Biz didn’t respond within 5 hours to you, a PM (certified)! Shocking! I can’t imagine anything better they might be doing right now more important than instantly responding to one of the 200,000 people who seem to know how to fix Twitter.

  • Is it true that a lawyer is trying to grill a free service about their technical architecture when he would not know the first thing to do to try and fix, let alone manage, this infrastructure?

  • Michael, you make it sound like a Congressional hearing. It’s not like whether Twitter is up or not is a matter of national security. (or is it??)

  • Why are questions - not answers - posted on a site that’s had a serious bone to pick with Twitter for weeks being treated as true by default?

    I haven’t liked Twitter’s unreliability, and they shot themselves in the foot until very recently by keeping everyone in the dark on issues. Still, how about we attack them for confirmed things, not rumours?

  • Wow - why so personal? I’m sure Blaine & the team worked really hard and did the best they could with the massive, unexpected traffic growth. They built a great business. Scaling is hard. This type of personal attack I’d expect to read on Valleywag not Techcrunch.

  • I sure hope that is not true. That would be amazingly poor engineering.

    Then again, maybe that happens when you don’t have revenue to provide the motivation of “we lose $x every minute the site is down”.

    There is just no way that there is a single master and two slaves. That just can’t be.

  • Holy shit, if Arrington’s investigative reporting is true then RoR developers are bigger douchebags than I ever thought possible.

    After all if the servers of 700 signal’s flagship products have to be restarted constantly to work (see: http://www.zedshaw.com/rants/r.....hetto.html) this platform is one giant FAIL of a choice.

    Seriously, children. Just for l337 boasting rights you’ll choose an immature turd of a platform? You should not be allowed to have any decision making positions. 2 years of failure at the highest profile site implementing RoR and more to come.

  • Jeez. You’re acting like Twitter is a missile shield or something. It’s a social network. I’m being completely serious when I say that you may have some sort of addiction or disorder if you’re this angry about mild downtime of a social networking tool.

    Please get help.

  • Everyone gets frustrated when a service they use goes down, but I think you’re being pretty harsh, Mike!

    Also, to other commenters, don’t assume all of this is fact. Like only having 3 database servers…. I don’t believe that. Just not possible for such a high load, write heavy application to have so few database servers backing it.

  • Angry Fan: You do realize that (a) that quote was from like 2004 and (b) he was running on FastCGI, don’t you?

  • OK, we’ve rehashed the architecture, platform, engineering and technical components as they relate to scaling and downtime over and over and over. Am I the only one that sees this as a leadership issue? The IT / technical component can only be poorly implemented if the leaders / founders don’t build the right team and don’t make the right calls regarding how they are going to build out the service. If you accept that Blaine was part of the issue, then the founders are surely at fault for keeping him around for as long as they did….. I would like to see an open source version of twitter of sorts evolve and would like to see twitter have actual competition. Where are the twitter competitors? There IS room for competition in this space …….

  • Geez, why so agro? And why does everybody seem to assume that all of these statements are verified? Let’s see if they answer. Or go watch Scoble’s interview of the Twitter execs?

  • I agree with @Ben, I don’t see why you are specifically “blaming” Blaine?

    I guess it’s always easier to blame the last guy - the guy that’s no longer there to answer his accusers.

    I’m hoping Blaine responds to clarify.

    Question of my own. Does anyone know if Twitter even has a test system?

  • Actually, if you proxy your database in many gigabytes of memory cache, with various views into it, then really all you need to do to the database is tell it new information and delete old stuff.

    It doesn’t help when the memory store gets corrupt (i.e. as in the past couple months) and then causes the db to almost literally explode.

  • “undo the damage”? no one will remember it in six months.

  • I think it is time we all recognized that “TC” stands for “TwitterCrunch” not “TechCrunch”…

    People! Twitter is a cool idea that was badly implemented. That’s been well established in many contexts and admitted to on a number of occasions by folk that work at Twitter. It doesn’t do anyone any good to keep repeating that Twitter is screwed up. Heck, if it required Blaine Cook to keep the servers running, then all that does is confirm that it was a badly designed system and Twitter is doing the right thing by moving on from what was there when he was there. Why do you keep trying to embarrass Cook any more than he already is? It’s not useful. Also, why do you care about the number or type of databases they are using? For a messaging system like Twitter, the database should be largely irrelevant and only rarely accessed. By focusing on “the database” you only prove that you don’t understand this class of application. (No better than the people who think “Ruby” is the root of these issues.)

    Look. It is well established that a service such as Twitter can be implemented reliably. We’ve even got a number of existence proofs. Of course, the one I’m most familiar with was PubSub.com which handled significantly more message traffic with larger messages and more complex matching to about the same number of distinct endpoints that Twitter serves. (We processed well over 2,000,000 messages per day scraped from over 40 million blog feeds, where each message was an arbitrarily large blog post, and we routed them in real-time to over 1.5 million delivery queues defined by full text, complex boolean queries. Just like Twitter, we provided RSS/Atom and XMPP interfaces. Unlike Twitter, GrabPerf consistently showed us to be the fastest “search engine” on the web while we were running.) Others have provided examples of systems with similar or greater message volume as well as the same extreme fan-out that Twitter has. Twitter-like systems are not exotic or particularly hard to build… You just need to know what you’re doing.

    Solutions to the Twitter problem are well known and commonly in use. It doesn’t do anyone good to keep pounding on these folk and trying to figure out “Whose fault is it?”. They’ve got a system that needs to be fixed and everyone knows it. Pounding on them doesn’t do anyone any good.

    Can we please convert “TwitterCrunch” back to “TechCrunch” and revisit all this in a few months?

    bob wyman

  • Why beat this dead horse? We all know that their architecture needs work. What you’re doing now is just rubbing it in and slinging childish personal insults. Grow up.

  • Hey Arrington-

    Would you please switch over to politics reporting? You could have prevented the Iraq war. I freaking love the harsh questions. Keep up the good work.

  • @Scoble - thanks for the feedback on outside help; that makes sense for sure (thinking of “The Mythical Man-Month” as I type this). They do need a highly skilled team that is really experienced at re-engineering web apps and db environments - so yes, speed of execution will definitely be an important consideration in team selection. I wish them the best. And twitter team wisely assessed the time element. It will easily take 6 months to re-architect, build, test and deploy twitter ver. 2.0

  • Hey Mike, stop crying like a little girl who can’t send her stupid tweets, grow up.

  • What’s funny is that twitter is about the simplest type of web app I can imagine, where the tweets should be retrieved with single table db calls and heavy page caching. I kind of wonder why all the chaos?

  • I am so sick of hearing about Twitter.

  • This article would make you think the only problem is that they have an over-burdened database server. If that was it with the money they have they could run Twitter on a System Z to give them breathing room while they build out the team and re-architect the system to leverage multiple instances of commodity hardware.

  • Michael,

    Did it ever occur to you that the reason Twitter is running so hot with only three DB’s or not enough servers, etc. isn’t actually Blaine’s fault? My “official” title at Digg is “Lead Architect” and I have NO AUTHORITY to buy, procure or install new servers. In fact, I don’t even have root on production boxes (many might argue this is a good thing). ;)

    –Joe

  • I agree with Mike Doeff. Let them work on it.

    Why does every non-techie think that tech is so simple? It is not!!

    Alex

  • all you armchair quarterbacks in here make me want to reach through the tubes of the internet and strangle you with my bare hands!!

    you’re so lucky the tubes are clogged right now…

  • Wow, Michael is being a real DB today. He blames the guy who brought twitter to the point where he gets serious withdrawal symptoms if it ever hiccups. Just wow.

    No matter what language u use Michael (Ruby, Java, PHP, C++), just try writing at least 20,000+ records when Scoble posts. Every time. Along with every other twittard at the same time. If you know anything remotely about databases, you’ll know that this is a difficult proposition.

    Fact is, this is one of those problems that have not been easily solved as yet. The problem of sending a message to tens of thousands of people in real time from a centralized database is not easy at all. It’s not just like email, since they have to be stored in the DB as well, and be hit by hundreds of thousands of queries through the API at the same time.

    People quip that its a messaging problem. It really is not so much. People surmise that its an RoR problem, it definitely is not. People think Java will make it “magically” work. It certainly will not.

    The problem is one of architecting twitter in a way so that when our buddies like Scoble, Arrington, Laporte and the Diggnation guys post (not to mention a potential Paris Hilton!), it doesn’t have to make those massive writes to the database. Or a way is found to make all those writes across an equally balanced sharded database cluster. Or some other solution.

    Anyone who makes something like Twitter is going to have the exact same problem.

  • hey mike - is it true that you are an asshole?

  • Can only agree to what Bob says - Twitter is a nice service but if they are not able to get it running in a short period of time, a new service will come up and take its place (pownce?). Doing instant messages is nothing new and maybe RoR just sucks and isn’t made for such huge traffic websites or their database design needs to be updated. I am sure there are good consultants out there who can help in this case so please stop crying …

    And there are much more important things going on than twitter.

  • who cares ? Don’t like the way twitter is running ? then don’t use it. Do you think you can do better ? then build it. But pls, give us a break about twitter. it’s really not that important.

  • @39 ‘Ericson’ Smith: seriously, language is not an issue? you had a legitimate point about it being bad design to use a database as the source for the outgoing messages (FAIL). But, you must either be smoking crack to suggest that a Ruby based application handles the same throughput as a C/C++ compiled one, or simply yet another RoR douche. The speed of the application has a huge impact when you’re trying to do any sort of large scale real-time work, and choosing Ruby and RoR is just plain stupid for this task.

  • Twitter: Please give Mr. Arrington an exclusive… He is feeling left out.

  • @angry_fan

    Dude, its not that big of an issue. Wake up… A database can only accept so many writes per second. If RoR can overwhelm it, can you imagine how much more your C++ app will overwhelm it?

    Suppose your DB can sustain 1000 writes per second (that’s the limit) and RoR is throwing those 1000 per second at it. How does it improve things if your C++ app begins to throw 2000 per second at it? I think you see the problem now.

    The problem is somehow making those writes parallel across multiple databases, or some other solution. Not increasing the flow to one or a few DB’s. Don’t be a DB like Mike was today.

  • A house of cards… Twitter to soon become a victim of its own success. If they have “piles of money”, yet have let this problem perpetuate itself to the point of absurdity, it certainly speaks volumes to the inexperience of leadership, as well as not having a strong enough customer centric approach to the business. It is 2008, there is no time, or excuse, for this degree of patchwork systems management. If a beast has been created that can’t function, then rethink the business model or SI. The clock is ticking… The Principals need to move out of the way and let someone else drive the bus before they destroy any remaining enterprise value.

  • Can anyone at Twitter spell ‘enterprise class’?

  • @angry_fan

    Just to follow up, I’m not saying that C++ is not faster. Any fool knows that it is. And of course it’s great for that kind of real-time work, and we don’t know that they’re not already doing that for parts of the app that need it.

    C++ or Java or whatever compiled language would be great to use IF they can solve the underlying architectural challenges, but to just replace what they have now with Java or C++ without solving the base issues, it will just magnify the issue.

    From what I know, the guys over at Twitter are not fools.

  • You know, I’ve been reading everywhere that Techcrunch has lost its purpose, and that specifically Michael’s public perception is going down the tubes because of attitudes he displays.

    I have enjoyed reading Techcrunch for 2 years. However this post is the first that I’ve personally encountered where I have to say, “WTF Mr. Arrington?”

    I live here in SF and enjoyed the free Iron Man preview at Metreon. Got a chance to meet a bunch of the people I follow and look up to in the industry, including Michael. You don’t seem to be a jerk in person, why are you getting so snarky here?

    For someone who has interviewed hundreds of companies, are you getting so personally offended with Twitter because it’s an invaluable service to you? Dude, if that’s the case then put together a smart team of people and BUILD A NEW TWITTER.

    You have the contacts with VC and talent. JUST DO IT.

  • @Ericson Smith:

    I think we agree on most things except for this

    The guys over at Twitter are fools - an app built around babysitting MySQL is badly designed.

    Also, my point was that a compiled app acting as middleware should obviate the RDBMS. Does any IM system of scale use an RDBMS?
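
Several comments above, Ericson Smith’s in particular, argue that the bottleneck is the number of database writes triggered by each post rather than the language the application is written in. The following is a minimal Python sketch of that argument only; it is not a description of Twitter’s actual system. The 20,000 followers and the ceiling of 1,000 writes per second are the figures used in the comments. The eight shards, the crc32-based placement, and every function name are assumptions made for illustration.

```python
import zlib
from collections import defaultdict

# Hypothetical per-database write ceiling, taken from the figure in the comments.
WRITES_PER_SEC_PER_DB = 1000


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a follower to a shard using a hash that is stable across runs."""
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


def fan_out(followers: list[str], num_shards: int) -> dict[int, int]:
    """Count the timeline writes each shard receives when one post is fanned out."""
    writes = defaultdict(int)
    for follower in followers:
        writes[shard_for(follower, num_shards)] += 1
    return dict(writes)


def seconds_to_drain(writes_per_shard: dict[int, int]) -> float:
    """Shards accept writes in parallel, so the busiest shard sets the delivery time."""
    return max(writes_per_shard.values()) / WRITES_PER_SEC_PER_DB


# One post from an account with 20,000 followers (the figure from the comments).
followers = [f"user{i}" for i in range(20_000)]
single = seconds_to_drain(fan_out(followers, 1))
sharded = seconds_to_drain(fan_out(followers, 8))
print(f"1 database: {single:.1f}s, 8 shards: {sharded:.1f}s")
```

With a single database all 20,000 writes queue behind the same ceiling, so delivery takes 20 seconds under these assumptions no matter how fast the application generates them. Spreading followers across shards lets the writes proceed in parallel, which is the case the comment makes for a balanced sharded cluster.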
