Previously: Hexing the technical interview.
In the formless days, long before the rise of the Church, all spells were woven of pure causality, all actions were permitted, and death was common. Many witches were disfigured by their magicks, found crumpled at the center of a circle of twisted, glass-eaten trees, and stones which burned unceasing in the pooling water; some disappeared entirely, or wandered along the ridgetops: feet never touching earth, breath never warming air.
As a child, you learned the story of Gullveig-then-Heiðr, reborn three times from the fires of her trial, who traveled the world performing seidr: the reading and re-weaving of the future. Her foretellings (and there were many) were famed—spoken of even by the völva-beyond-the-world—but it was her survival that changed history. Through the ecstatic trance of seidr, she foresaw her fate, and conquered death. Her spells would never crash, and she became a friend to outcast women: the predecessors of your kind. It is said that Odin himself learned immortality from her.
Previously: Reversing the technical interview.
Long ago, on Svalbard, when you were a young witch of forty-three, your mother took your unscarred wrists in her hands, and spoke:
Vidrun, born of the sea-wind through the spruce
Vidrun, green-tinged offshoot of my bough, joy and burden of my life
Vidrun, fierce and clever, may our clan’s wisdom be yours:
Never read Hacker News
If you want to get a job as a software witch, you’re going to have to pass a whiteboard interview. We all do them, as engineers–often as a part of our morning ritual, along with arranging a beautiful grid of xterms across the astral plane, and compulsively running ls in every nearby directory–just in case things have shifted during the night–the incorporeal equivalent of rummaging through that drawer in the back of the kitchen where we stash odd flanges, screwdrivers, and the strangely specific plastic bits: the accessories, those long-estranged black sheep of the families of our household appliances, their original purpose now forgotten, perhaps never known, but which we are bound to care for nonetheless. I’d like to walk you through a common interview question: reversing a linked list.
First, we need a linked list. Clear your workspace of unwanted xterms, sprinkle salt into the protective form of two parentheses, and recurse. Summon a list from the void.
(defn cons [h t] #(if % h t))
Last fall, I worked with CockroachDB to review and extend their Jepsen test suite. We found new bugs leading to serializability violations, improved documentation, and demonstrated documented behavior around nonlinearizable multi-key transactions. You can read the full analysis on jepsen.io.
This fall, I worked with MongoDB to design a new Jepsen test for MongoDB. We discovered design flaws in the v0 replication protocol, plus implementation bugs in the v1 protocol, both of which allowed for the loss of majority-committed updates. While the v0 protocol remains broken, patches for v1 are available in MongoDB 3.2.12 and 3.4.0, and now pass the expanded Jepsen test suite.
You can read the full analysis at jepsen.io.
In Herlihy and Wing’s seminal paper introducing linearizability, they mention an important advantage of this consistency model:
Unlike alternative correctness conditions such as sequential consistency  or serializability , linearizability is a local property: a system is linearizable if each individual object is linearizable.
Locality is important because it allows concurrent systems to be designed and constructed in a modular fashion; linearizable objects can be implemented, verified, and executed independently. A concurrent system based on a nonlocal correctness property must either rely on a centralized scheduler for all objects, or else satisfy additional constraints placed on objects to ensure that they follow compatible scheduling protocols.
This advantage is not shared by sequential consistency, or its multi-object cousin, serializability. This much, I knew–but Herlihy & Wing go on to mention, almost offhand, that strict serializability is also nonlocal!
These are unpolished thoughts. I started playing again for sources and to refine these ideas, but the game crashes so often that I’m giving up. Still think some folks might find this interesting. Spoilers everywhere.
In the opening, Davey notes that the CounterStrike level appears to be a desert town, but Coda has scattered these floating boxes and out-of-place, brightly-colored cubes in the level: a reminder that the game is not exactly what it purports to be. “Calling cards”, he calls them. A reminder that the game was created by a real person. “They are all going to give us access to their creator. I want to see past the games themselves. I want to know who the real person is.”
Jump up a level. We’re not just interested in learning about Coda as a person. We’re interested in understanding Davey as a person, too. What were the rough circumstances in Davey’s life? How did Coda’s work help? If each level tells us something about Coda, the purported author, then The Beginner’s Guide tells us something about Davey. And in another sense, the characters of Davey and Coda tells us something about the real Davey, and the people on his team. For instance, regardless of how we interpret the narrative’s characters, it’s likely safe to say that the real Davey is interested in questions of authorial intent.
In the last Jepsen post, we found that RethinkDB could lose data when a network partition occurred during cluster reconfiguration. In this analysis, we’ll show that although VoltDB 6.3 claims strict serializability, internal optimizations and bugs lead to stale reads, dirty reads, and even lost updates; fixes are now available in version 6.4. This work was funded by VoltDB, and conducted in accordance with the Jepsen ethics policy.
VoltDB is a distributed SQL database intended for high-throughput transactional workloads on datasets which fit entirely in memory. All data is stored in RAM, but backed by periodic disk snapshots and an on-disk recovery log for crash durability. Data is replicated to at least
k+1 nodes to tolerate
k failures. Tables may be replicated to every node for fast local reads, or sharded for linear storage scalability.
As an SQL database, VoltDB supports the usual ad-hoc SQL statements, with some caveats (e.g. no auto-increment, no foreign key constraints, etc.) However, its approach to multi-statement transactions is distinct: instead of
BEGIN ... COMMIT, VoltDB transactions are expressed as stored procedures, either in SQL or Java. Stored procedures must be deterministic across nodes (a constraint checked by hashing and comparing their resulting SQL statements), which allows VoltDB to pipeline transaction execution given a consensus on transaction order.
In the last Jepsen analysis, we saw that RethinkDB 2.2.3 could encounter spectacular failure modes due to cluster reconfiguration during a partition. In this analysis, we’ll talk about Crate, and find out just how many versions a row’s version identifies.
Because Elasticsearch has and continues to lose and corrupt data in response to network partitions and other faults, some might question whether Elasticsearch is appropriate for a primary data store. Crate’s co-founders knew about these hazards, and promised to publish fault-tolerance documentation in October 2014.
In the previous Jepsen analysis of RethinkDB, we tested single-document reads, writes, and conditional writes, under network partitions and process pauses. RethinkDB did not exhibit any nonlinearizable histories in those tests. However, testing with more aggressive failure modes, on both 2.1.5 and 2.2.3, has uncovered a subtle error in Rethink’s cluster membership system. This error can lead to stale reads, dirty reads, lost updates, node crashes, and table unavailability requiring an unsafe emergency repair. Versions 2.2.4 and 2.1.6, released last week, address this issue.
Until now, Jepsen tests have used a stable cluster membership throughout the test. We typically run the system being tested on five nodes, and although the network topology between the nodes may change, processes may crash and restart, and the system may elect new nodes as leaders, we do not introduce or remove nodes from the system while it is running. Thus far, we haven’t had to go that far to uncover concurrency errors.
Since RethinkDB passed its stable-membership partitioning tests, I offered the team a more aggressive failure model: we’d dynamically reconfigure the cluster membership during the test. This is a harder problem than consensus with fixed membership: both old and new nodes must gracefully agree on the membership change, ensure that both sets of nodes will agree on any operations performed during the handover, and finally transition to normal consensus on the new set of nodes. The delicate handoff of operations from old nodes to new provides ample opportunities for mistakes.