It’s been a long time coming, but the computer industry is finally making headway in the switch from old-fashioned, platter-based hard drives to the solid state drives (SSDs) found in iPhones and many other electronics. The benefits are obvious: unlike their platter-based counterparts, SSDs have no moving parts, and they’re also significantly faster for many tasks. Unfortunately, most software, both server-side and otherwise, has been optimized for the older drives and the physical limitations that come with them. RethinkDB is a new startup that’s looking to capitalize on this problem by building a storage engine for MySQL databases that’s fully optimized for SSDs, bringing with it large speed boosts and a number of features sure to catch the eye of many developers.
The company, which is part of the latest batch of Y Combinator-funded startups, is in fairly early stages (it started developing the product only two months ago), but it’s already making some substantial headway in the features it can offer. Among these are live schema changes, which allow developers to make significant modifications to their database structure without having to go through complex sync and backup procedures. It also offers lock-free concurrency, which means users will be able to read from the database even while other users are writing to it. And it’s an append-only database, which means developers can quickly recover in the event of a system failure.
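To make the append-only idea concrete, here is a minimal sketch in Python. It is not RethinkDB’s implementation, which the company has not described in detail; the `AppendOnlyStore` class, its methods, and the one-JSON-record-per-line log format are all hypothetical, chosen only to illustrate why a store that never overwrites data can recover quickly after a crash.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store (hypothetical): existing records are never overwritten."""

    def __init__(self, path):
        self.path = path
        self.index = {}              # in-memory map: key -> latest value
        self._recover()
        self.log = open(path, "ab")  # append mode: writes only go to the end

    def _recover(self):
        """Rebuild the index by replaying the log, dropping a torn final record."""
        if not os.path.exists(self.path):
            return
        good_bytes = 0
        with open(self.path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break            # incomplete last line left by a crash
                try:
                    record = json.loads(raw)
                except ValueError:
                    break            # corrupt record: stop replaying here
                self.index[record["key"]] = record["value"]
                good_bytes += len(raw)
        # Cut off the unusable tail so later appends start on a clean boundary.
        os.truncate(self.path, good_bytes)

    def put(self, key, value):
        """An insert or an update is the same operation: append a new record."""
        line = json.dumps({"key": key, "value": value}) + "\n"
        self.log.write(line.encode("utf-8"))
        self.log.flush()
        os.fsync(self.log.fileno())  # ask the OS to push the record to the drive
        self.index[key] = value

    def get(self, key):
        return self.index.get(key)

    def close(self):
        self.log.close()


# Usage: write two versions of a key, simulate a crash mid-write, then reopen.
path = os.path.join(tempfile.mkdtemp(), "toy.log")
store = AppendOnlyStore(path)
store.put("user:1", "alice")
store.put("user:1", "alicia")
store.close()

with open(path, "ab") as f:          # a half-written record, as after a power cut
    f.write(b'{"key": "user:2", "val')

recovered = AppendOnlyStore(path)
latest = recovered.get("user:1")
missing = recovered.get("user:2")
recovered.close()
```

After the simulated crash, `latest` is the last fully written value and `missing` is `None`: every complete record is still intact in the log, so recovery amounts to replaying it and discarding the torn tail. A real engine would also need compaction and on-disk indexes, which this sketch leaves out.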

RethinkDB is also taking a relatively novel approach to its development, at least as far as database storage is concerned. It’s following the “release early, release often” mantra, starting with an early developer pre-alpha that you can download and try out for free (the company says that implementing the software is quite easy because of the way MySQL handles storage engines). However, given that you’re going to be using this to manage your data, it is absolutely vital that you use this for testing purposes only: make sure you have any crucial data stored elsewhere. The company hopes to use developer input over the next few months to improve the product up until its release.
RethinkDB plans to have its commercial product out the door in the next six months, with an enterprise-level pricing structure that charges on a per-CPU basis (with support included).

Wow, that seriously looks amazing. Can’t wait for hosts to start offering SSD systems, then this will really be great to use.
Softlayer.com already offers a 32 GB Intel SSD for US$50/month. I think it’s not the most cost-effective solution yet.
At first glance at least, this seems genuinely impressive.
the db vs. filesystem wars continue…
Neat idea, but these academics are about to get a brutal lesson in the database sales industry.
Looks great guys. I wonder if some of the next-gen managed service providers could use your offering as a way to differentiate their service; then you would be in a sweet spot. That would of course be on top of some of the other revenue potential for this product.
Anyways, seems amazing guys, keep up the good work.
can’t trust those russians with your sensitive database schemas… jk, good job executing slavka.
Kincaid: It looks like we are equally knowledgeable on the raw technology here (i.e. not very), but can you find out how its performance compares to InnoDB? My curiosity led me to Wikipedia (to figure out wtf MyISAM was exactly) where the merits of RethinkDB sound comparable to InnoDB.
I think comparable GPL’d software projects should always be compared when writing about a new SW tech company (esp. when they’re charging “per/CPU” FFS).
Straight proprietary vendors should always be included as well, imo; and, for the most part they are in Crunchbase (but not always).
Kanner: Please fill *us* in here if you know
coldbrew: We tested against InnoDB a while back, but haven’t re-run the performance numbers recently. We’ve been working over the past two-ish weeks on features required by WordPress so we could eat our own dogfood, and our next goal is to profile it and work on performance again. We’ll have solid graphs up soon.
Walsh: I appreciate the info. I was just trying to get a handle on this specific niche and what the landscape was like. If you detected any ribbing, it was directed at Kincaid for seemingly not asking any questions or giving comparables.
I would argue that your business model is practical, but boring, not to mention the back-end folks I know prefer Postgres, but it looks like the target market is blogs. But the majority of blogs are on shared hosts, so the market is seemingly small... As you can see, I’m quite confused about the “why should I care” aspects. When priced /CPU you are asking for your potential customers to say, “Show me the ROI?”
No ribbing detected. A great and valid post, thanks.
We’re still playing with the business model for now. It seems that one possibility is in real-time search.
All of our interesting work is MySQL API-independent, so a Postgres port is not out of the question. We’ve also been entertaining the idea of porting to SQLite, as many embedded devices use that, and have SSDs already.
Our main goal is to undo the unfortunate workarounds necessary for rotational drives, which are unnecessary on solid state, thereby becoming “The” database engine for SSDs. This is Why You Should Care. In terms of pricing, we are a long way from production-ready, so I wouldn’t worry about that quite yet. For the moment, you can download and help test our engine for free.
Interesting… and yes, backend folks prefer PostgreSQL… second, if one wants speed they go for Vertica. If you draw up your pricing model, make sure you offer more bang for the buck…
This is good, but I think they are a little early. Timing is everything and the cost of SSD drives is still too high to make the switch right now.
I say too early because it will be hard to sustain their business until the market is truly ready for their product. They may also run into some open source walls that may hinder their sales.
“some open source walls”: what exactly does this mean?
Is there a plot comparing the new storage with InnoDB?
Interesting idea. Look forward to seeing how it shakes out.
You had me at MySQL, but I stayed for SSD.
On its face, RethinkDB may have a winner, depending on which side the db vs. filesystem argument favors.
This is really cool to see, especially from a Y Combinator startup. Maybe it’s just my perception based on what gets covered in the media, but YC companies generally look more like trendy internet companies rather than real technology companies who could be game-changers.
Good luck to these guys!
MyISAM has always been the weakest storage engine for MySQL. I’d like to see an InnoDB graph. No one I know uses MyISAM with MySQL. Everyone is using InnoDB.
What are they hiding?
To my knowledge, MyISAM is actually faster than InnoDB for selects.
We’ll show good performance numbers after we profile a bit; as stated above, we’ve been working on features as of late, and have let our performance slip a bit. As we know we can get performance back to competitive levels, we’re not trying to hide anything, we just don’t see the point in showing people the graphs until we’re satisfied that they’re where the analysis says they should be.
If you want to see what we’ve got anyway, feel free to email us and we’ll show you the current graphs.
No one serious uses MyISAM. Y!, FB, G, etc all use InnoDB for very good reasons. Like they want the data to still be there when they check back later.
For raw speed, MyISAM is way faster than InnoDB. Most people you know use InnoDB for its data integrity guarantees, not for its speed. The more interesting test would be to compare RethinkDB on SSD versus MyISAM on a standard disk (i.e. not SSD).
We have those graphs too, and we blow them even further out of the water. It didn’t seem fair to handicap them with a rotational drive for the public graph.
This will be very interesting to play with and watch. I hope they follow the MySQL model and release a public source tree on Launchpad, Google Code, or somewhere similar.
++
Yes, though what happens if someone comes up with a better NV tech that gets around some of the performance-degradation-over-time problems?
Good idea. Will have to wait and see how it scales and the support from hosting companies for SSD storage.
Now this is HARD. At least it is a real tech company. Not some lame twitter startup that was made by some liberal arts script kiddy in a weekend.
Good luck to them for doing something difficult and useful. Hope they do well!
AFAIK, Bigtable and the projects it inspired, like Hypertable, HBase and Cassandra, all use some variation of a log-structured storage scheme (esp. the log-structured merge tree) where commit logs are append-only and index/data are immutable.
What’s the average size of elements for the graph? I had to make sure the time is not in milliseconds, as 100+ seconds to insert 1M records is not that impressive. Hypertable can do 1M+ inserts _per second_ on an 8 node cluster with commodity hardware (3×7200RPM drives) with data replicated 3-ways, checksummed and compressed.
Don’t know much about Bigtable or its derivatives. If you can send me something you recommend reading about it, that would be cool.
The elements inserted for that graph were rows of three integers each. I don’t doubt that Hypertable can beat us at this point, especially on a cluster, as we haven’t looked at clustering yet (right now we’re testing on one Shuttle box with a low-end SSD; anyone want to give us some hardware?).
That graph sure looks convincing to me! Can’t wait to try these SSD servers... are they currently available?
Guys, here’s the fixes for free: http://peter.stardoll.com/
If only they had done it with Postgres. They lost me with MySQL…
I don’t think you’ve provided enough details about the graph (and you haven’t mentioned the source, and I can’t find it anywhere on http://www.rethinkdb.com), therefore it’s pretty meaningless.
What was the structure of the table?
Were there any indexes on the table? If so, on what?
What is your MySQL configuration?
Wow this is great! One of the few startups I’ve seen lately that actually looks like it’s doing something completely new and useful. I understand the way platter drives work to access data, but I never really considered that software would have been written to tailor to that process. Very, very interesting stuff.
I took one DB company public and am now running ScaleDB (a MySQL storage engine company focused on clustering). So I have a little background on the topic.
1. According to MySQL head of engineering it takes 5 years to build a storage engine (tuning, etc.), in my experience he is probably right. Raise more money.
2. The storage engine space is filled with dead bodies. About 100 efforts started, only about 6 real storage engines now.
3. SSD has the issue of fast clean write, slow dirty write, hence the append model. That chews up expensive SSD space. SSD controllers are getting better about housecleaning to overcome these issues and all DBs will inherit that without too much optimization.
4. Not open source? Not an easy path, especially when InnoDB is the inhouse choice of MySQL (once acquisition is complete) and they have been tuned for 13 years.
5. DB is a tough market on the sales/marketing side. The old “build a better mousetrap…” is BS. We focus on shared-disk clusters (like Oracle RAC capability) because the standalone storage engine is a dead end. Even if you are much better than InnoDB (doubtful) people won’t pay for it.
6. Beware the legal issues with GPL (MySQL) if yours is commercial software.
7. Suggestion: go niche, maybe devices.
From the beginning of computing, computers have used a memory pyramid — registers, cache, main memory, virtual memory, and disk. What really matters is the speed at the top, the speed and size of the bottom, and the cleverness of the intermediate layers.
SSDs are expensive and over moderate to long periods of time, unstable. This makes them unsuitable for long term, persistent storage. Yes, they can be backed by rotating memory, but then they become just another layer in the memory hierarchy, not a technology that effects a sea change.
But if SSDs are not suitable for long-term persistent storage, a better intermediate layer is more RAM. It is much faster and more flexible than SSDs. If the bottom layer of the memory hierarchy is going to be rotating platters, virtually nothing is gained by SSDs.
SSDs are certainly useful in low-power and mobile devices. And they make for a dandy database demo. But for serious database work, I don’t think they have much to offer.
The key to future database development is to abandon the idea of database as a disk file, and adopt the architecture of “incomplete” in-memory databases using the disk for persistence and backfill.
Also note that a “commit” to another node on a Gigabit Ethernet is probably faster than a write to an SSD. Sun got it right when they invented the slogan “the network is the computer.”
Think RAM, cores, and network. Relegate disks to housekeeping.
SSDs are unstable because repeated erases wear out blocks. This is a problem mitigated by the FTL, but eventually, applications (that is, databases and filesystems) need to be re-written to prevent this kind of behavior. Our append-only approach does exactly that, so RethinkDB actually extends the life of your SSD.
The problem with using RAM as a storage layer is essentially volatility. SSDs can be used for their RAM-like asymptotics, but they take away the need to worry about durability (somewhat).
Isn’t this undermined by filesystems written for SSD that MyISAM/InnoDB can run on top of? Seems a bit of a waste of energy to be so focused, especially with all the newer MySQL forks.
Running a B-tree-based database on flash doesn’t have the effect you seem to expect. Michael Bender’s talk from Tokutek about their experiments with InnoDB on the new FusionIO and Intel drives highlights this (http://blogs.to...cient_databases), as do our own experiments running MyISAM on solid state.
I wonder if there are any estimates on the lifetime of those SSDs. As far as I know, an SSD starts losing performance over time once it has been completely filled up. Every update is really a delete and an insert, where the delete is just a change of the record’s status to deleted. The SSD uses free space until it no longer can, and only after that does it start cleaning out those deleted records. Thus, even though initial performance is good, it may change significantly over time.
Great to see another innovative storage engine for MySQL. Sounds like this could be very interesting.
–Zack
Pretty slick. For me the lock-free concurrency is the biggie. I have some big tables with lots of inserts and table locking is the speed bottleneck.
Wish MySQL would build this in. I think that’s where most would get the biggest performance boost.
Leif,
You guys have certainly gotten some folks’ attention (good and bad). The ones who don’t like it obviously don’t know what they are talking about.
Call us this weekend. What do you want for your birthday?
Love,
Dad