May 15, 2008

Tungsten Scale-Out Stack Presentation from MySQL Conference

There have been a number of requests for copies of the slides to the Tungsten Scale-Out Stack talk I gave at the MySQL Conference in April. Here they are courtesy of the nice folks at O'Reilly who organized the conference.

Tungsten is our codename for a set of technologies to raise database performance and availability using scale-out. In the database world scale-out is a term of art that means spreading data across servers on multiple systems. With data in multiple places you are less subject to failures--when one copy crashes you just use the others. Similarly, if your application runs a lot of queries, you can spread them over different machines, which makes for faster and more stable response times.

So database scale-out sounds great (and is too), but getting it to work properly is harder than you would think. Along with practical issues like management, there are theoretical barriers. Let's say you are creating a product catalog service using database replicas on different hosts. Applications connect to any replica to get information. Your manager, a guy with pointed hair, tells you to make sure of the following:

1. The catalog service is always available.
2. The service keeps working even if you get a network partition between hosts.
3. The copies are always consistent (e.g., you can go to any copy and get the same data).

Here's an ugly surprise. It turns out your data service can only have two of the three properties at any given time, a result that was proven only recently and is now called the CAP Principle. If you want to be available and handle network partitions, you must accept that data will sometimes be inconsistent. Your manager is going to be very disappointed.

That's where we get back to Tungsten and the Scale-Out Stack. We realized a while back that you can't think in terms of a single product or even family of products to solve scale-out in a general way. It's better to design a flexible set of technologies with different strengths and weaknesses that users choose based on what's important to them. If you need to cluster over a WAN, use master/slave replication. If you don't want master failures, use synchronous replication in middleware.

Read the slides to learn more about the thinking. Database scale-out is a fascinating problem and we are looking forward to making it much easier to handle. Please stay tuned! I'll be writing more about this in the weeks and months to come.

No comments: