Serializability, linearizability, and locality

2016-08-15

In Herlihy and Wing’s seminal paper introducing linearizability, they mention an important advantage of this consistency model:

Unlike alternative correctness conditions such as sequential consistency [31] or serializability [40], linearizability is a local property: a system is linearizable if each individual object is linearizable.

Locality is important because it allows concurrent systems to be designed and constructed in a modular fashion; linearizable objects can be implemented, verified, and executed independently. A concurrent system based on a nonlocal correctness property must either rely on a centralized scheduler for all objects, or else satisfy additional constraints placed on objects to ensure that they follow compatible scheduling protocols.

This advantage is not shared by sequential consistency, or its multi-object cousin, serializability. This much, I knew–but Herlihy & Wing go on to mention, almost offhand, that strict serializability is also nonlocal!

Recall that strict serializability is essentially serializability plus linearizability’s real-time constraint: transactions cannot be arbitrarily re-ordered, but must appear to take place atomically at some time between their invocation and completion. When we add real-time constraints to sequential consistency, we get linearizability: a local property. Why can’t we add real-time constraints to serializability and obtain locality? Why don’t real-time multi-object transactions compose?

In the time-honored fashion of space-constrained academics, Herlihy and Wing present a beautifully-constructed history, affectionately named H₈, as obvious evidence that sequential, serializable, and strict serializability all fail to provide locality. I’ve been staring at this history all morning, and would like to share it with you.

Subhistories

Consider a FIFO queue x, in which objects are dequeued (deq) in the same order they are enqueued (enq). Two processes, A and B, enqueue and dequeue numbers to and from this queue. Each operation completes immediately, before the next operation begins.

process  f     queue  value
B        enq   x      2
A        enq   x      1
B        deq   x      1

This history is sequential: we can reorder it to (A enq x 1) (B enq x 2) (B deq x 1), which is a legal execution for a queue, without changing the order of any single process’ operations. The same order proves this history is also serializable, if we take each operation as a separate transaction. It is not, however, strict serializable: every operation completed immediately, so we may not reorder them.

What if each process’ operations took place in a single transaction: [(A enq x 1)], and [(B enq x 2) (B deq x 1)]? These transactions overlap in time, which means a strict serializable system is free to order them either way. [A B] is legal, so this system is strict serializable (and therefore serializable as well).

By swapping A with B and 1 with 2, we can construct a second history, on a different queue y, with the same properties:

process  f     queue  value
A        enq   y      1
B        enq   y      2
A        deq   y      2

So: both of these histories are independently sequential, serializable, and strict serializable, if we choose our transaction boundaries carefully. Now, let us consider their composition.

Combined history

We can interleave the two histories to produce a single history, in which each key is independently sequential, serializable, and strict serializable (again, given our particular choice of transactions).

process  f     queue  value
A        enq   y      1
B        enq   x      2
A        enq   x      1
B        enq   y      2
A        deq   y      2
B        deq   x      1

The combined history is no longer sequential. In order for A to (deq y 2), B must (enq y 2) before A begins–since A’s first action is to (enq y 1). Therefore (B enq x 2) also precedes A. Therefore x must begin with 2, which prevents B from dequeuing 1: a contradiction.

If we take each operation as a separate transaction, we can easily construct a serializable history: just move the conflicting enqueues after the dequeues. However, this history is not strictly serializable, because immediate returns prevent any reordering.

Curiously, our multi-op transactions from the subhistories satisfy strict serializability, even in the composite history: we can move the single-enqueue transactions to the beginning without violating their real-time order.

(A enq x 1)
(B enq y 2)
(A enq y 1) (A deq y 2)
(B enq x 2) (B deq x 1)

However, the system of multiple objects allows us to construct new transactions. For instance, each process could perform a single transaction:

(A enq y 1) (A enq x 1) (A deq y 2)
(B enq x 2) (B enq y 2) (B deq x 1)

There is no way to order these two transactions: A can’t run first, because it needs to dequeue 2, which comes from B–and vice-versa, for B. These transactions do not satisfy serializability.

Interpretation

We often speak of locality as a property of subhistories for a particular object x: “H|x is strictly serializable, but H is not”. This is a strange thing to say indeed, because the transactions in H may not meaningfully exist in H|x. What does it mean to run [(A enq y 1) (A enq x 1)] on x alone? If we restrict ourselves to those transactions that do apply to a single object, we find that those transactions still serialize in the full history.

So in a sense, locality is about the scope of legal operations. If we take single enqueue and dequeue operations over two queues x and y, the space of operations on the composite system of x and y is just the union of operations on x and those on y. Linearizability can also encompass transactions, so long as they are restricted to a single system at a time. Our single-key, multi-operation transactions still satisfied strict serializability even in the composite system. However, the space of transactions on a composite system is more than the union of transactions on each system independently. It’s their product.

We can view strict serializability as linearizability plus the multi-object transactions of serializability. But in another sense, linearizability is strict serializability with the constraint that transactions are constrained to act on a single object, because that restriction provides locality. That object could be a single register or queue. It could be a set or map. It could be a table in a database, or an entire database. The scope of an object determines what operations we can perform when we glue two systems together: every key in a key-value store may be independently linearizable, but we cannot perform a read of two keys together in a linearizable fashion.

From an implementer’s perspective, locality is desirable: it means we can have an independent coordinator per system and still have all the systems comprise a linearizable whole. But from a users perspective, I have a hunch that locality is… almost tautological: it’s just the scope of objects you can use together. Just like you can’t glue two strict serializable systems together, you can’t set two linearizable systems side by side and use them together in a linearizable way. As if designing a linearizable system from scratch, we have to sit down, look at our transactions carefully, and prove that our algorithm prevents nonlinearizable histories. Linearizable systems may be “compositional”, but composing them remains nontrivial.

At least, that’s my read on things right now. I’m hoping someone with a better grasp of these papers can put me straight.

Pigeonhunter on 2016-08-30

Whoa there! The first and second sub-histories aren’t sequential. They can’t be reordered as described. How does “B” deque “1” when the value “2” is at the head of the queue?

Aphyr on 2016-08-31

The reordering I state above preserves process order, because all of process A’s operations happen in order, and all of process B’s operations happen in order. That reordering is legal because we enqueue 1, enqueue 2, then dequeue 1, which is consistent with FIFO queue semantics. Because there exists a legal reordering that preserves process order, the history is sequentially consistent.

Pigeonhunter on 2016-08-31

Look closely at the first sub-history, and it’s so called legal reordering. (B enq x 2) (A enq x 1) (B deq x 1) == lifo queue (stack) semantics (A enq x 1) (B enq x 2) (B deq x 1) == fifo queue semantics

If we apply fifo semantics to both we get. (B enq x 2) (A enq x 1) (B deq x 2), and (A enq x 1) (B enq x 2) (B deq x 1)

Still sequentially consistent?

Aphyr on 2016-09-01

I’m not sure how else to explain this–it follows directly from the definition of sequential consistency. See Lamport, 1979.

Pigeonhunter on 2016-09-02

Either I’m missing something fundamental, or I’m not using my words effectively. So, I’ll try again.

I understand that history to be considered sequentially consistent, the ordering of the operations within the individual processes must not be violated by the interleaving. That is true in the example.

The second requirement is that the system must obey FIFO semantics, and this is where i’m confused. Yes, the “reordering” is valid, but the original history is not. For (B deq x 1), the value “1” must be the first element in the queue, but “2” is enqueued first. This invalidity is maintained when the second history is constructed by “swapping A with B and 1 with 2”.

Substituting the valid (B deq x 2) into the original subhistory and turning the crank yields:

process  f     queue  value
A        enq   y      1
B        enq   x      2
A        enq   x      1
B        enq   y      2
A        deq   y      1
B        deq   x      2

Aphyr on 2016-09-02

Again, sequential consistency does not require that process A’s operations fall in any particular order relative to process B’s operations. Please read the Lamport paper. Or the linearizability paper. Or use Wikipedia, or Google “sequential consistency”, or see https://aphyr.com/posts/313-strong-consistency-models. If you still believe that Herlihy and Wing are incorrect, I encourage you to submit a short monograph to the ACM, who would no doubt be interested in publishing a correction.

Matthew Sackman on 2016-09-02

@Pigeonhunter My understanding is as follows:

After the two enqs, there is no known head of the queue: the enqs do not observe the state of the queue so there is no requirement to evaluate which of the enqs actually happened first (this also requires that each enq comes from a different agent, which they do in this case). What really isn’t clear is how this history has been assembled: it’s clearly more “strict” than just a concatenation of the different local histories of the agents (if it were that loose, then you could have the deq appearing before the corresponding enq), but it’s also not the exact history of the queue in question, should the queue be some sort of single-threaded thing (well, it could be if the queue were implemented in some perverse way which deliberately only adhered to logically required dependencies…). So in general, I much prefer to draw dependency graphs for this sort of thing, and not just list events in some order.

Once a deq happens, we can determine which enq must have ‘happened first’ from the perspective of the queue.

To draw the graphs (which I find very useful), all we need to look at is the dependencies between the different operations: actions by the same agent must happen in the order they appear in the local history of the agent, and we also have that a deq revealing some v must ‘happen-after’ an enq of that v in the global history. I believe those are the only dependencies we need to worry about here. So if you construct a graph with those dependencies as edges between the events then any traversal through the graph which starts from any of the roots and which doesn’t emit a node until all the node’s ‘happens-before’ nodes have been emitted, is a valid global history.

FWIW, I could not understand the problem with the “combined history” example until I drew out such a graph - drawing it out reveals that the graph contains a cycle, which is illegal in these history graphs.

Greg on 2016-10-24

@Pigeonhunter I think I see what’s going on, potentially; still working my way through it.

The very first presented history is not in a legal order. It’s not clear to me why that is or from what frame of reference the history is. Perhaps it’s a logging system that has maintained action order at the process level but doesn’t have any way of knowing the exact order between processes. Certainly it’s not the history according to the queue? Anyway, Kyle made a legal history by reordering the processes actions in relation to each other while keeping actions within each process in the same order.

Aphyr on 2016-10-25

Greg, the history is exactly as presented–no tricks here. Each operation completes before the next takes place. Histories are only legal or illegal in the context of a particular consistency model; the remainder of the argument demonstrates under what models that history can be considered legal.

Jingguo Yao on 2016-11-22

@Pigeonhunter, @Greg The first history is sequential consistency. It adheres to the definition of sequential consistency given in Section 3.4 of The Art of Multiprocessor Programming, Revised Reprint 1st Edition. And the history is very similar to the history given in Figure 3.7 on page 53. The first author of https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf is also the first author of the book.

msteffen on 2017-11-07

I apologize if this is a dumb question, and it’s only tangentially related to the post at hand, but it’s an issue I’ve been stuck on in the original paper myself. Herlihy and Wing define a sequential history as one satisfying two conditions:

First event of H is an invocation
Each invocation, except possibly the last, is immediately followed by a matching response. Each response is immediately followed by a matching invocation.

Since an invocation and response match iff they have the same object and process, doesn’t “each response is immediately followed by a matching invocation”

imply that all events in a sequential history share the same object and process?
prohibit (A enq x 1) (B enq x 2) (B deq x 1), since the response to (A enq x 1) does not match the following invocation (B enq x 2)?

Aphyr on 2017-11-08

Ah, yes, this is a tad confusing. Section 2.1 defines the formalism for linearizability, and as a part of that formalism, they define a sequential history in those terms. This doesn’t mean they share the same object and process, it just means there’s no concurrency.

This is a different sense of “sequential” than Lamport’s sequential consistency, discussed in 3.3. Sequential consistency, in that sense, means that the order of operations by any particular process is preserved.

phylips on 2018-03-29

I read Lamport’s classic paper about sequential consistency “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs” and have a question

Consider this program：

init state: a =0 b= 0 c=1

process 1: a = b + c

process 2: b = a + c

if they executed in this sequential order: a = b + c b = a + c we will get a = 1, b =2

if in another order b = a + c a = b + c we will get a = 2, b =1

both of them, at least one of a,b will be 2 But according to the paper, even R1 & R2 are all meet,

we may get the result: a = 1 b =1, conflicts with above sequential result

the detail execution as follows:

1.process1 fetch b, get 0 2.process1 fetch c, get 1 3.process2 fetch a, get 0 4.process2 fetch c, get 1 5.process 1 compute a = b+c = 1 6.process 2 comput b = a+c =1 7.process 1 store a=1 8.process 2 store b = 1

in this execution, both process 1 and process2 1).issues memory request in the order specified by its program 2).the memory request are serviced in FIFO mode

but we get: a = 1, b = 1

Do you know how understand this example？

Kevin on 2018-06-27

@msteffen: I had the same question, and did some digging.

In the initial footnote of Linearizability, they reference [an earlier version] of the paper which defines sequential history as follows:

A history H is sequential if: 
  1. The first event of H is an invocation.
  2. Each invocation, except possibly the last, is immediately followed by a matching response. 
  3. Each response, except possibly the last, is immediately followed by an invocation.

Note that (3) does NOT require the response/request to match.

Aphyr on 2018-06-28

Note that this definition of “sequential history” is essentially defining a non-concurrent history, which is a very different thing from a “sequentially consistent history”! It’s all very confusing, isn’t it? :-O

Serializability, linearizability, and locality

Subhistories

Combined history

Interpretation

Post a Comment