If you, as a database vendor, implement a few features in your API, I can probably offer repeatable automated tests of your DB’s partition tolerance through Jepsen.
The outcome of these tests would be a set of normalized metrics for each DB like “supports linearizability”, “available for writes when a majority partition exists”, “available for writes when no majority is available”, “fraction of writes successful”, “fraction of writes denied”, “fraction of writes acked then lost”, “95th-percentile latency during condition X”, and so forth. I’m thinking this would be a single-page web site (a spreadsheet, really) making it easy to compare and contrast DBs and find one that fits your safety needs.
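To make the write-outcome fractions concrete, here's a rough sketch of how they might be computed from one test run. Everything named here (`WriteAttempt`, `summarize_writes`, the metric keys) is hypothetical, not part of Jepsen. It assumes each write carries a unique value, that "successful" means acknowledged and still present in a final read after the partition heals, and that every unacknowledged write counts as denied (a real harness would probably want to separate timeouts from explicit refusals).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class WriteAttempt:
    """One client write (hypothetical record type for this sketch)."""
    value: int   # unique value the client tried to write
    acked: bool  # True if the DB acknowledged the write to the client


def summarize_writes(attempts: Iterable[WriteAttempt],
                     survivors: Set[int]) -> Dict[str, float]:
    """Compute write-outcome fractions for a single test run.

    `survivors` is the set of values found in a final read taken
    after the partition has healed and the DB has recovered.
    """
    attempts = list(attempts)
    total = len(attempts)
    if total == 0:
        raise ValueError("no write attempts recorded")

    acked = [a for a in attempts if a.acked]
    # Acknowledged to the client, but missing from the final read.
    lost = [a for a in acked if a.value not in survivors]

    return {
        "fraction_successful": (len(acked) - len(lost)) / total,
        "fraction_denied": (total - len(acked)) / total,
        "fraction_acked_then_lost": len(lost) / total,
    }


if __name__ == "__main__":
    # Ten writes: values 0-7 acknowledged, 8-9 refused.
    run = [WriteAttempt(i, acked=(i < 8)) for i in range(10)]
    # Only values 0-5 are still present after recovery.
    print(summarize_writes(run, survivors=set(range(6))))
```

Under these assumptions the three fractions always sum to 1, which makes them easy to line up side by side across databases.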
At a minimum, I need to know:
- After initial startup, when is the database stable and ready to accept writes?
- For a given key, which node (if any) is the primary replica?
- For a given key, which nodes (if any) are secondary replicas?
- After partitions end, when has the database fully recovered? (e.g. has it completed handoff, replayed oplogs, etc.)
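The four questions above could map onto a small introspection interface that a test harness polls. This is only a sketch of one possible shape: `ClusterIntrospection`, its method names, and the `wait_until` helper are all hypothetical, and no database exposes exactly this API today. A vendor would back each method with whatever their DB actually offers (an admin endpoint, a CLI command, a status table).

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, List, Optional


class ClusterIntrospection(ABC):
    """Hypothetical per-database adapter answering the four questions."""

    @abstractmethod
    def is_ready(self) -> bool:
        """After startup: is the DB stable and ready to accept writes?"""

    @abstractmethod
    def primary_for(self, key: str) -> Optional[str]:
        """Node name of the primary replica for `key`, or None if none."""

    @abstractmethod
    def secondaries_for(self, key: str) -> List[str]:
        """Node names of secondary replicas for `key` (may be empty)."""

    @abstractmethod
    def is_recovered(self) -> bool:
        """After a partition heals: handoff done, oplogs replayed, etc.?"""


def wait_until(predicate: Callable[[], bool],
               timeout_s: float = 60.0,
               interval_s: float = 0.5,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True if the predicate held, False on timeout. The predicate
    is always checked at least once, even with a zero timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A test run would then call something like `wait_until(db.is_ready)` before issuing writes and `wait_until(db.is_recovered)` after healing the partition, so the final read is only taken once the DB claims to have converged.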
I also need totally automated, reliable scripting of DB installation and provisioning. Many DBs make it really tough to join nodes from the shell.
This is gonna take several months of my time and a nontrivial amount of money for hardware. I’m looking at a few options, from physical hardware in my garage to renting EC2 compute nodes. EC2 means anybody could, in theory, run these benchmarks themselves–but there are a ton of moving pieces involved, it takes a lot more work to set up, and VM performance is really variable. Ideally, someone out there has five or six identical boxes they don’t need any more–maybe leftover desktops, 1Us from a decommissioned colo, whatever. They don’t have to be all that fast, but I’m hitting the limits of what I can do on virtualized infrastructure.
If you want to make this happen, and can help make the necessary API improvements, write automation scripts for Jepsen, provide hardware or hosting, etc., please email aphyr@aphyr.com.