Wednesday, July 2, 2008

What's Your Favorite Database Replication Feature?

Replication is one of the most flexible technologies available for databases. We are implementing a new open-source, database-neutral replication product that works with MySQL, Oracle, and PostgreSQL. Naturally we've done a lot of thinking about the feature set. It's tough to pick any single feature as the most important, but one that really stands out is optional statement replication. Here's why.

Database replication products tend to replicate row changes and DDL. However, Mark Callaghan has a great example of why you want to replicate statements as well--it enables Maatkit distributed consistency checking to work. If you dissect the mk-table-checksum --replicate command you will see that it uses a nice trick. The SQL queries generate checksums into the master table and then replicate as statements rather than row updates out to slaves. That way the slaves recompute the checksum locally at the same point in the overall transaction history. Very elegant!

Replicated consistency checks are a wonderful feature for large systems that can't afford to stop in order to compare tables between servers. However, you cannot use it if your database cannot replicate statements. As Mark points out, not even all MySQL engines do this. The proposed replication additions for PostgreSQL won't support it either.

Optional statement replication is really the best kind of feature: it is useful on its own, but also enables features like consistency checking and other nice administrative tricks. We're going to put a "worm-hole" in our replication engine that allows applications to invoke statement replication at the SQL level. Can you guess how we are going to do it? If not, you'll have to wait until we release. :)

So what's your favorite database replication feature?

5 comments:

Anonymous said...

I enjoy the "occasionally works" feature.

/troll

Anonymous said...

"We are implementing a new open-source, database-neutral replication product that works with MySQL, Oracle, and PostgreSQL."

Really? Does it do between Oracle -> PostgreSQL?

What is this software... I'm very interested!

Robert Hodges said...

Keep watching www.continuent.org. We are shooting for a release on 8 August. You can drop me an email at robert.hodges (at) continuent.com to find out more beforehand.

The heterogeneous Oracle->PG case is pretty interesting but the answer is "not quite yet." We may try to put in hooks in the form of call-outs that allow open source users to transform SQL, which would allow cross-database transfer. The design is in Java and is pretty modular so I think this will be reasonable to do.

JanL said...

Offline / distributed replication a lá Notes / CouchDB is pretty neat :)

Robert Hodges said...

CouchDB is very interesting, as the data model allows updates to run in any order. That's the key property required for lazy replication. You can do this in SQL, too, but you have to define the schema very carefully. You also need bi-directional replication, which is another on my favorite features list.

Scaling Databases Using Commodity Hardware and Shared-Nothing Design