
Lately Twitter has been cleaning house, raising money, doing interviews and actually talking to users. In a blog post last week they did a Q&A session, directly answering questions about Twitter’s architecture.
So I have a couple of questions, too, based on a couple of discussions I’ve had with people who say they’ve seen Twitter’s architecture.
- Is it true that you only have a single master MySQL server running replication to two slaves, and the architecture doesn’t auto-switch to a hot backup when the master goes down?
- Do you really have a grand total of three physical database machines that are POWERING ALL OF TWITTER?
- Is it true that the only way you can keep Twitter alive is to have somebody sit there and watch it constantly, and then manually switch databases over and re-build when one of the slaves fails?
- Is that why most of your major outages can be traced to periods of time when former Chief Architect/server watcher Blaine Cook wasn’t there to sit and monitor the system?
- Given the record-beating outages Twitter saw in May after Cook was dismissed, is anyone there capable of keeping Twitter live?
- How long will it be until you are able to undo the damage Cook has caused to Twitter and the community?
Update: Twitter continues to be annoyingly and constructively responsive to criticism. They respond to this post here, saying “We’re working on a better architecture.” Kind of takes the air out of the balloon when you can’t get them riled up.


Wow. Let’s hope not!
In the Interview I did with Twitter executives Evan Williams and Biz Stone yesterday they said it would be several more months until Twitter’s problems are solved. http://scobleizer.com/2008/05/.....h-twitter/ — your questions are interesting, though.
Wow, are you kidding me…only 3 servers hosting their 1 database?! and no High Availability, no Clustering? Whoa!! What about load balancing on the web servers to handle the load? WTF?! Serious re-engineering ahead!! the twitter IT folks really need to roll up their sleeves and get busy!
BTW, I offered my services to @twitter @EV @Biz today on twitter - no reply. I am a seasoned Systems Engineer and Project Manager (Certified in both)… NO reply from twitter… they need all the help they can get and didn’t even reply to my Pro Bono offer. oh well.
Haven’t they received another 15 mil in funding? Where is the money going?
First.. amazing that so much is running on a small MySQL cluster. We have those setups here and I am glad to know we can scale up before we have to change architecture at CelleCast.
Secondly, to get this big and be nursing it along is quite a shock
Interesting
What a bunch of noobie IT folks over at Twitter, not to mention another black eye for the amateurish RoR community.
Wow. Dude, I know you aren’t paid to be nice, but you have a serious asshole streak when it comes to Blaine Cook. Just remember that what goes around comes around.
@Robert Scoble, way to get that interview! This is a messy state to be in for twitter.
Good questions. In regards to the last one; six or seven months may be too long for the users to wait.
Friend Feed! Who thinks Friend Feed is the new twitter?
Susan: they covered this in the interview yesterday. They don’t want outside help. That makes sense. You might, indeed, be someone who could help them, but they don’t know that and increasing team size does bring a lot of inefficiencies that could decrease their speed greatly. They also covered the money issue. They have piles of it. But getting someone hired, and an effective part of the team, especially one that totally needs to rebuild systems, isn’t going to be something that solves these issues in hours. -
Sarah: FriendFeed is not a new Twitter. It’s a new FriendFeed. Listen to yesterday’s Gillmor Gang for a separate interview with FriendFeed’s founders, who took on the idea of cloning Twitter head on.
Wow Susan! Ev and Biz didn’t respond within 5 hours to you, a PM (certified)! Shocking! I can’t imagine anything better they might be doing right now more important than instantly responding to one of the 200,000 people who seem to know how to fix Twitter.
Is it true that a lawyer is trying to grill a free service about their technical architecture when he would not know the first thing to do to try and fix, let alone manage, this infrastructure?
Michael, you make it sound like a Congressional hearing. It’s not like whether Twitter is up or not is a matter of national security. (or is it??)
Why are questions - not answers - posted on a site that’s had a serious bone to pick with Twitter for weeks being treated as true by default?
I haven’t liked Twitter’s unreliability, and they shot themselves in the foot until very recently by keeping everyone in the dark on issues. Still, how about we attack them for confirmed things, not rumours?
Wow - why so personal? I’m sure Blaine & the team worked really hard and did the best they could with the massive, unexpected traffic growth. They built a great business. Scaling is hard. This type of personal attack I’d expect to read on Valleywag not Techcrunch.
I sure hope that is not true. That would be amazingly poor engineering.
Then again, maybe that happens when you don’t have revenue to provide the motivation of “we lose $x every minute the site is down”.
There is just no way that there is a single master and two slaves. That just can’t be.
Holy shit, if Arrington’s investigative reporting is true then RoR developers are bigger douchebags than I ever thought possible.
After all if the servers of 700 signal’s flagship products have to be restarted constantly to work (see: http://www.zedshaw.com/rants/r.....hetto.html) this platform is one giant FAIL of a choice.
Seriously, children. Just for l337 boasting rights you’ll choose an immature turd of a platform? You should not be allowed to have any decision making positions. 2 years of failure at the highest profile site implementing RoR and more to come.
Jeez. You’re acting like Twitter is a missile shield or something. It’s a social network. I’m being completely serious when I say that you may have some sort of addiction or disorder if you’re this angry about mild downtime of a social networking tool.
Please get help.
Everyone gets frustrated when a service they use goes down, but I think you’re being pretty harsh, Mike!
Also, to other commenters, don’t assume all of this is fact. Like only having 3 database servers…. I don’t believe that. Just not possible for such a high load, write heavy application to have so few database servers backing it.
Angry Fan: You do realize that (a) that quote was from like 2004 and (b) he was running on FastCGI, don’t you?
OK, we’ve rehashed the architecture, platform, engineering and technical components as they relate to scaling and downtime over and over and over. Am I the only one that sees this as a leadership issue? The IT / technical component can only be poorly implemented if the leaders / founders don’t build the right team and don’t make the right calls in regard to how they are going to build out the service. If you accept that Blaine was part of the issue, then the founders are surely at fault for keeping him around for as long as they did….. I would like to see an open source version of twitter of sorts evolve and would like to see twitter have actual competition. Where are the twitter competitors? There IS room for competition in this space …….
Geez, why so agro? And why does everybody seem to assume that all of these statements are verified? Let’s see if they answer. Or go watch Scoble’s interview of the Twitter execs?
I agree with @Ben, I don’t see why you are specifically “blaming” Blaine?
I guess it’s always easier to blame the last guy - the guy that’s no longer there to answer his accusers.
I’m hoping Blaine responds to clarify.
Question of my own. Does anyone know if Twitter even has a test system?
Actually, if you proxy your database in many gigabytes of memory cache, with various views into it, then really all you need to do to the database is tell it new information and delete old stuff.
It doesn’t help when the memory store gets corrupt (i.e. as in the past couple months) and then causes the db to almost literally explode.
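To make the caching idea above concrete, here is a minimal sketch of a write-through cache. Everything in it is an assumption for illustration: the `TimelineStore` class, its method names, and the two dicts standing in for the memory cache and the database are hypothetical, not Twitter's actual design.

```python
class TimelineStore:
    """Toy write-through cache. One dict stands in for the memory cache,
    another for the database (both are hypothetical stand-ins)."""

    def __init__(self):
        self.db = {}        # stand-in for the database
        self.cache = {}     # stand-in for the in-memory cache
        self.db_reads = 0   # number of reads that actually reached the database

    def post(self, user, message):
        # Write path: tell the database the new information, then refresh the cache.
        self.db.setdefault(user, []).append(message)
        self.cache[user] = list(self.db[user])

    def delete_old(self, user, keep):
        # Delete path: keep only the newest `keep` messages, then refresh the cache.
        messages = self.db.get(user, [])
        self.db[user] = messages[-keep:] if keep > 0 else []
        self.cache[user] = list(self.db[user])

    def read(self, user):
        # Read path: serve from memory; fall back to the database only on a miss.
        if user not in self.cache:
            self.db_reads += 1
            self.cache[user] = list(self.db.get(user, []))
        return self.cache[user]
```

As long as the cache is warm, `db_reads` stays at zero and the database only sees writes and deletes. If the cache is lost or corrupted, every read falls through to the database at once, which is roughly the failure described above.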
“undo the damage”? no one will remember it in six months.
I think it is time we all recognized that “TC” stands for “TwitterCrunch” not “TechCrunch”…
People! Twitter is a cool idea that was badly implemented. That’s been well established in many contexts and admitted to on a number of occasions by folk that work at Twitter. It doesn’t do anyone any good to keep repeating the Twitter is screwed up. Heck, if it required Blaine Cook to keep the servers running, then all that does is confirm that it was a badly designed system and Twitter is doing the right thing by moving on from what was there when he was there. Why do you keep trying to embarrass Cook any more than he already is? It’s not useful. Also, why do you care about the number or type of databases they are using? For a messaging system like Twitter, the database should be largely irrelevant and only rarely accessed. By focusing on “the database” you only prove that you don’t understand this class of application. (No better than the people who think “Ruby” is the root of these issues.)
Look. It is well established that a service such as Twitter can be implemented reliably. We’ve even got a number of existence proofs. Of course, the one I’m most familiar with was PubSub.com which handled significantly more message traffic with larger messages and more complex matching to about the same number of distinct endpoints that Twitter serves. (We processed well over 2,000,000 messages per day scraped from over 40 million blog feeds, where each message was an arbitrarily large blog post, and we routed them in real-time to over 1.5 million delivery queues defined by full text, complex boolean queries. Just like Twitter, we provided RSS/Atom and XMPP interfaces. Unlike Twitter, GrabPerf consistently showed us to be the fastest “search engine” on the web while we were running.) Others have provided examples of systems with similar or greater message volume as well as the same extreme fan-out that Twitter has. Twitter-like systems are not exotic or particularly hard to build… You just need to know what you’re doing.
Solutions to the Twitter problem are well known and commonly in use. It doesn’t do anyone good to keep pounding on these folk and trying to figure out “Whose fault is it?”. They’ve got a system that needs to be fixed and everyone knows it. Pounding on them doesn’t do anyone any good.
Can we please convert “TwitterCrunch” back to “TechCrunch” and revisit all this in a few months?
bob wyman
Why beat this dead horse? We all know that their architecture needs work. What you’re doing now is just rubbing it in and slinging childish personal insults. Grow up.
Hey Arrington-
Would you please switch over to politics reporting? You could have prevented the Iraq war. I freaking love the harsh questions. Keep up the good work.
@Scoble - thanks for the feedback on outside help; that makes sense for sure (thinking of the “Mythical Man” book as I type this). They do need a highly skilled team that is really experienced at re-engineering web apps and db environments - so yes, speed of execution will definitely be an important consideration in team selection. I wish them the best. And twitter team wisely assessed the time element. It will easily take 6 months to re-architect, build, test and deploy twitter ver. 2.0
Hey Mike, stop crying like a little girl who can’t send her stupid tweets, grow up.
What’s funny is that twitter is about the simplest type of web app I can imagine, where the tweets should be retrieved with single table db calls and heavy page caching. I kind of wonder why all the chaos?
I am so sick of hearing about Twitter.
This article would make you think the only problem is that they have an over-burdened database server. If that was it with the money they have they could run Twitter on a System Z to give them breathing room while they build out the team and re-architect the system to leverage multiple instances of commodity hardware.
Michael,
Did it ever occur to you that the reason Twitter is running so hot with only three DB’s or not enough servers, etc. isn’t actually Blaine’s fault? My “official” title at Digg is “Lead Architect” and I have NO AUTHORITY to buy, procure or install new servers. In fact, I don’t even have root on production boxes (many might argue this is a good thing).
–Joe
I agree with Mike Doeff. Let them work on it.
Why does every non-techie think that tech is so simple? It is not!!
Alex
all you armchair quarterbacks in here make me want to reach through the tubes of the internet and strangle you with my bare hands!!
you’re so lucky the tubes are clogged right now…
Wow, Michael is being a real DB today. He blames the guy who brought twitter to the point where he gets serious withdrawal symptoms if it ever hiccups. Just wow.
No matter what language u use Michael (Ruby, Java, PHP, C++), just try writing at least 20,000+ records when Scoble posts. Every time. Along with every other twittard at the same time. If you know anything remotely about databases, you’ll know that this is a difficult proposition.
Fact is, this is one of those problems that have not been easily solved yet. The problem of sending a message to tens of thousands of people in real time from a centralized database is not easy at all. It’s not just like email, since they have to be stored in the DB as well, and be hit by hundreds of thousands of queries through the API at the same time.
People quip that its a messaging problem. It really is not so much. People surmise that its an RoR problem, it definitely is not. People think Java will make it “magically” work. It certainly will not.
The problem is one of architecting twitter in a way so that when our buddies like Scoble, Arrington, Laporte and the Diggnation guys post (not to mention a potential Paris Hilton!), it doesn’t have to make those massive writes to the database. Or a way is found to make all those writes across an equally balanced sharded database cluster. Or some other solution.
Anyone who makes something like Twitter is going to have the exact same problem.
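The fan-out arithmetic above can be sketched in a few lines. This is only an illustration under stated assumptions: the 20,000 follower count comes from the comment, while `shard_for`, `fan_out_writes`, the shard count of 8, and the modulo placement rule are hypothetical and say nothing about how Twitter actually stores timelines.

```python
from collections import Counter


def shard_for(user_id: int, num_shards: int) -> int:
    # Hypothetical placement rule: spread users across shards by numeric id.
    return user_id % num_shards


def fan_out_writes(follower_ids, num_shards):
    """Count how many timeline rows each shard must write for a single post."""
    writes = Counter()
    for follower_id in follower_ids:
        writes[shard_for(follower_id, num_shards)] += 1
    return writes


followers = range(20_000)                # one popular user with 20,000 followers
single = fan_out_writes(followers, 1)    # everything lands on one database
sharded = fan_out_writes(followers, 8)   # the same post spread over 8 shards

print(max(single.values()))    # 20000 writes hit the single database
print(max(sharded.values()))   # 2500 writes hit the busiest shard
```

The total number of writes is unchanged; sharding only changes how many of them any one database has to absorb for a single post.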
hey mike - is it true that you are an asshole?
Can only agree to what Bob says - Twitter is a nice service but if they are not able to get it running in a short period of time, a new service will come up and take its place (pownce?). Doing instant messages is nothing new and maybe RoR just sucks and isn’t made for such huge traffic websites or their database design needs to be updated. I am sure there are good consultants out there who can help in this case so please stop crying …
And there are much more important things going on than twitter.
who cares ? Don’t like the way twitter is running ? then don’t use it. Do you think you can do better ? then build it. But pls, give us a break about twitter. it’s really not that important.
@39 ‘Ericson’ Smith: seriously, language is not an issue? you had a legitimate point about it being bad design to use a database as the source for the outgoing messages (FAIL). But, you must either be smoking crack to suggest that a Ruby based application handles the same throughput as a C/C++ compiled one, or simply yet another RoR douche. The speed of the application has a huge impact when you’re trying to do any sort of large scale real-time work, and choosing Ruby and RoR is just plain stupid for this task.
Twitter: Please give Mr. Arrington an exclusive… He is feeling left out.
@angry_fan
Dude, it’s not that big of an issue. Wake up… A database can only accept so many writes per second. If RoR can overwhelm it, can you imagine how much more your C++ app will overwhelm it?
Suppose your DB can sustain 1000 writes per second (that’s the limit) and RoR is throwing those 1000 per second at it. How does it improve things if your C++ app begins to throw 2000 per second at it? I think you see the problem now.
The problem is somehow making those writes parallel across multiple databases, or some other solution. Not increasing the flow to one or a few DB’s. Don’t be a DB like Mike was today.
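Here is the same argument as a small back-of-the-envelope calculation. It is a sketch under simplifying assumptions: the 1000 and 2000 writes per second come from the comment above, while the `simulate` function, the 10 second window, and the idea that writes split perfectly across databases are made up for illustration.

```python
def simulate(offered_per_sec, db_limit_per_sec, seconds, num_dbs=1):
    """Return (completed_writes, backlog) after `seconds` of steady load.

    Assumes each database sustains at most `db_limit_per_sec` writes and that
    the load divides evenly across `num_dbs` databases (a big simplification).
    """
    capacity = db_limit_per_sec * num_dbs
    completed = min(offered_per_sec, capacity) * seconds
    backlog = max(0, offered_per_sec - capacity) * seconds
    return completed, backlog


print(simulate(1000, 1000, 10))             # (10000, 0): app matches the DB limit
print(simulate(2000, 1000, 10))             # (10000, 10000): faster app, same throughput, growing backlog
print(simulate(2000, 1000, 10, num_dbs=2))  # (20000, 0): parallel writes clear the backlog
```

A faster application layer only raises the offered load. Completed writes change only when the database side gains capacity.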
A house of cards… Twitter to soon become a victim of its own success. If they have “piles of money”, yet have let this problem perpetuate itself to the point of absurdity, it certainly speaks volumes to the inexperience of leadership, as well as not having a strong enough customer centric approach to the business. It is 2008, there is no time, or excuse, for this degree of patchwork systems management. If a beast has been created that can’t function, then rethink the business model or SI. The clock is ticking… The Principals need to move out of the way and let someone else drive the bus before they destroy any remaining enterprise value.
Can anyone at Twitter spell ‘enterprise class’?
@angry_fan
Just to follow up, I’m not saying that C++ is not faster. Any fool knows that it is. And of course it’s great for the kind of real-time work, and we don’t know that they’re not already doing that for parts of the app that need it.
C++ or Java or whatever compiled language would be great to use IF they can solve the underlying architectural challenges, but to just replace what they have now with Java or C++ without solving the base issues, it will just magnify the issue.
From what I know, the guys over at Twitter are not fools.
You know, I’ve been reading everywhere that Techcrunch has lost its purpose, and that specifically Michael’s public perception is going down the tubes because of attitudes he displays.
I have enjoyed reading Techcrunch for 2 years. However this post is the first that I’ve personally encountered where I have to say, “WTF Mr. Arrington?”
I live here in SF and enjoyed the free Iron Man preview at Metreon. Got a chance to meet a bunch of the people I follow and look up to in the industry, including Michael. You don’t seem to be a jerk in person, why are you getting so snarky here?
For someone who has interviewed hundreds of companies, are you getting so personally offended with Twitter because it’s an invaluable service to you? Dude, if that’s the case then put together a smart team of people and BUILD A NEW TWITTER.
You have the contacts with VC and talent. JUST DO IT.
@Ericson Smith:
I think we agree on most things except for this
The guys over at Twitter are fools - an app built around babysitting MySQL is badly designed.
Also, my point was that a compiled app acting as middleware should obviate the RDBMS. Does any IM system of scale use an RDBMS?
Twitter, Twitter, Twitter. Seriously need change.
I wouldn’t be shocked if Twitter answered you. Like you said, the one thing they are doing well these days is communicating with their users.
Twitter doesn’t need to fix and improve their architecture at the moment. The community is so big it’s too late for that. Now they need to create a complete new framework that can handle major load.
Oh my F*ing god that is why the down times during the middle of months I saw in their down report looked damn familiar as a pattern..
This is DBA incompetence on the highest order if all conclusions hold and I am sure they will
BTW, that is why they cannot handle track: it’s database intensive. Basically the SQL query to do track approaches infinity, so queries have to be more than cached; they have to be systematically optimized programmatically to get away from approaching infinity.
Your questions are funny Mike,
It would be like asking you, Mike Arrington.
Is it true that TechCrunch and all the crunch sites are actually a GPL word press install on a single server with a custom html template?
Is it true that TechCrunch can’t handle so much as bad javascript as Seesmic pulled it down the other day?
This line of questioning is mundane and irrelevant to most people
As a technical expert and software programmer, even I do not see the point of them.
Twitter, aka ANY SITE is a package or product. Either it works or it doesn’t. Why would you want to go in and act like their sys admin?
If it were any site but twitter, I gather you would say it sucks in other words and that would be the end of it.
It’s almost like you and Scoble are trying to mold this site into your next Facebook or something. Trying to steer it in whatever direction you want.
I think that’s a little disturbing. I don’t think you would do this type of amateur technical consulting for any other website. It actually makes me question the relationship you have with them as well as Scoble’s.
@Angry Fan:
Um, yes, ejabberd, and most other OSS XMPP servers support mysql, postgresql, as well as possibly the other commercial RDBMS. Are you sure you know anything about them?
@55: Chris, I too question Arrington’s relationship with Scoble. I think it’s amazing that two narcissists can actually have sex with each other. Doesn’t that go against their prime urge of exalting their own selves above all else?
Plus, that’s one porn you don’t want to see. Though maybe you want to get a twattr from them when they start doing the monster mash together.
Well done Michael.
A handful of emotive words and you have 50 comments and counting.
@56: Mr. Church, it may be your dyslexia, but I said IM applications of scale. Not some pissant OSS XMPP douchebaggery.
Lordy, you’ve got problems with reading comprehension.
Heh, seems by the time I had read the comments you picked up another seven
Is that why most of your major outages can be traced to periods of time when former Chief Architect/server watcher Blaine Cook was there to sit and monitor the system?
Bing! I think you finally figured it out (or you’re getting good advice from a smart techie).
When did Twitter start getting unreliable? When Cook and Lee Mighdoll left. I don’t know which one was keeping it running or if both of them were, but the current situation has all the signs of people fumbling around trying to figure out which dial to turn when this or that happens.
This was a good post. The previous one was way off (and I said so).
Keep on truckin Mikey.
@Angry Fan:
I’m not sure how dyslexia would come into play? Maybe you could explain?
Many corporate and government IT depts. are running customized versions of OSS XMPP servers. What are they using for DBs? What do you think google is using for a db? Multi-value dbs? I don’t claim to know, you do.
So, quit being a douchebag and tell us all how data storage is done on the “IM applications of scale” you seem to know everything about.
BTW, why don’t you write a post like this for your friend who runs Mahalo. You’re good at Emperor Has No Clothes pieces. We both know JC is running around buck nekkid. Why give him a free pass??
Michael,
I didn’t read the comments, but they should pony up and do some sort of co-sponsorship with Microsoft and throw the DBs onto MSSQL and run the servers on IIS. IronRuby is coming along nicely, although the RoR app might want to be upgraded to a more robust and tested framework/server OS, even if it isn’t IIS 7 and .Net
Does this mean CrunchBase can’t scale?
Ha ha, “damage to the community.”
You ever notice when the author of the post gets up on his high-horse that the posts follow suit. Everybody’s an expert! I offered pro-bono for bragging rights and they responded with NIH syndrome! RoR suxxxx0rz!!1
Hey, pile on everybody. Arrington, isn’t your training supposed to be in economics? Why not second guess their spending of the 15mil instead of blathering about IT? It makes you sound like Gillmor without the marijuana.
@Ericson Smith: you’ve said everything I was about to say.
No language “magically” recovers from database slave failures, so why would ruby?
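To illustrate why recovery is an architecture question rather than a language one, here is a minimal sketch of the decision an automatic failover has to make. The `choose_master` function and the server names are hypothetical. A real setup would also need health checks, replication-lag handling and a way to repoint clients, none of which is shown here.

```python
def choose_master(servers, current_master):
    """Return the server that should act as master.

    `servers` maps a (hypothetical) server name to True if it is healthy
    and False if it is down.
    """
    # Keep the current master while it is healthy.
    if servers.get(current_master, False):
        return current_master
    # Otherwise promote the first healthy replica, in name order.
    for name in sorted(servers):
        if name != current_master and servers[name]:
            return name
    raise RuntimeError("no healthy database server available")


print(choose_master({"db1": True, "db2": True, "db3": True}, "db1"))   # db1
print(choose_master({"db1": False, "db2": True, "db3": True}, "db1"))  # db2
```

Someone has to write and operate logic like this whatever the application is written in. Without it, the switch is done by hand.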
I’m not sure what’s wrong with Michael Arrington recently, but his quality has definitely gone downhill, every other post is about twitter.
I have a few questions too.
* Does your customer contract with Twitter state ‘zero downtime’? Oh wait…it’s a free service, no contract.
* What damage to Twitter and ‘the community’ are you talking about? They continue to grow rapidly and you continue to use it like a crack addict.
* Why do you continue to personally attack one of the guys who built it? Without his hard work it wouldn’t exist.
Funny, but a bit harsh?
RoR = a Joke.
OMG - Michael Arrington talking tech - mentioning things like MySQL replication and stuff. Doesn’t everyone know he doesn’t know what the f*ck he’s talking about!!! He’s a goddamn lawyer and a pseudo writer hack. We’re at 14 minutes now. Time is almost up.
Arrington talks a good game, but I think that’s all he’s really capable of doing. I doubt he has even the first clue as to what any of this stuff means.
@57,
I am in no way implying that Mr. Arrington and Mr. Scoble are having gay sex together. I am simply saying that I question their ties to Twitter.
Former attorney turned journalist turned amateur system administrator and DBA.
Jack of all trades, or somebody who is a little too interested in Twitter? This is not really a story in my humble opinion.
@Dave Winer: Well, let’s not get carried away. Arrington has been pushing Twitter with its lack of business model for two years, Facebook shits bricks of gold in his eyes, Mahalo is the bestest search engine evarr, and Yahoo MUST SELL now (each post collecting some M$$$, I guess).
The source of his critique isn’t to be a serious tech blogger, it has more to do with his tendency to throw tantrums like an overgrown baby. You should see what he does when he runs out of cigars. He’s like that giant baby-thing in Spirited Away.
their problem are understandable. ruby, mysql are only suitable for content oriented sites. they just made a mistake of using it instead of something more solid.
Time to move on from the regular twitter bashing - you get what you pay for and at the end of the day its a free service.
ha, they don’t even have a dba position open
@Greg: Wrong, just because a service is free does not mean that they have any less of a responsibility to their users. They are building a business, make no mistake about it.
Hey, here’s to redefining the word “Tweetard”.
Tweetard: “A person who constantly bashes the Twitter service, yet can’t stop using it.”
Hey, i’ll give an iPod nano to the first person who can verifiably get this up in urbandictionary.com
Contact me through my URL above!!
Twitter
Way to go armchair journalist. Feel good beating this dead horse? Instead of running your mouth at Twitter constantly, actually do something in technology. Have you ever dealt with the technical difficulties in something the scale of Twitter? Yeah, didn’t think so. Try it yourself, then you’ll have some respect for the people who fuel the service you spam daily for your personal marketing message. Tool.
What really impresses me about Michael is that he can keep up (and/or lead) in so many realms.
And that includes technical discussions.
It’s amazing.
Now (and not that he needs a defender in any way) but if Michael was a bit nicer to Scoble in print (sometimes he is nice) perhaps Robert would take him along on such interviews (and not that Michael doesn’t lead as well with his own interviews — and scoops - leading ability)
Michael, why go on a personal Jihad against Cook? Also, do you have evidence that he was dismissed????? (maybe you’ve already published that, I don’t know).
How long will it be until you are able to undo the damage Cook has caused to Twitter and the community?
I stopped reading your post too soon — that’s totally unfair.
@Ericson Smith - LOL - this one wins
I’m pretty confident in saying that Arrington has never once set foot inside any type of MySQL database, let alone set up any type of replication/failover situation. There are much larger sites running on much less hardware, you don’t have a clue as to what you are talking about.