CAP Theorem
- Consistency → Every read receives the most recent write or an error
- Availability → Every request receives a (non-error) response, without the guarantee that it contains the most recent write
- Partition tolerance → The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
Consistency Guarantees
Most replicated databases provide at least eventual consistency, which means that if you stop writing to the database and wait for some unspecified length of time, then eventually all read requests will return the same value. The inconsistency is temporary. It’s also called convergence. However, this is a very weak guarantee—it doesn’t say anything about when the replicas will converge. Until the time of convergence, reads could return anything or nothing.
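The temporary inconsistency can be sketched in a few lines. This is a toy in-process model (not a real database): a leader with one asynchronously updated follower, where reads against the follower are stale until the replication log is applied.

```python
# Toy model of eventual consistency: a leader replicates to a follower
# asynchronously, so reads can disagree until the pending log is applied.
leader = {"x": 0}
follower = {"x": 0}
pending = []  # replication log entries not yet applied to the follower

def write(key, value):
    leader[key] = value
    pending.append((key, value))  # replicated asynchronously, later

def replicate():
    while pending:
        key, value = pending.pop(0)
        follower[key] = value

write("x", 1)
assert leader["x"] == 1
assert follower["x"] == 0   # stale read: replicas have not converged yet
replicate()                 # after "some unspecified length of time"...
assert follower["x"] == 1   # ...all reads return the same value
```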
Systems with stronger guarantees are appealing because they are easier to use correctly. But they may have worse performance or be less fault-tolerant than systems with weaker guarantees.
Distributed consistency is similar in spirit to transaction isolation, but there are some differences: transaction isolation is primarily about avoiding race conditions due to concurrently executing transactions, whereas distributed consistency is mostly about coordinating the state of replicas in the face of delays and faults.
Linearizability
Linearizability (aka atomic consistency, strong consistency, immediate consistency, or external consistency) → make a system appear as if there were only one copy of the data, and all operations on it are atomic.
In a linearizable system, as soon as one client successfully completes a write, all clients reading from the database must be able to see the value just written. Maintaining the illusion of a single copy of the data means guaranteeing that the value read is the most recent, up-to-date value, and doesn’t come from a stale cache or replica.
What makes a system linearizable?
In a linearizable system, once a new value has been written or read, all subsequent reads see the value that was written, until it is overwritten again.

In this example, we imagine that there must be some point in time (between the start and end of the write operation) at which the value of x atomically flips from 0 to 1. Thus, if one client’s read returns the new value 1, all subsequent reads must also return the new value, even if the write operation has not yet completed. Client A is the first to read the new value, 1. Just after A’s read returns, B begins a new read. Since B’s read occurs strictly after A’s read, it must also return 1, even though the write by C is still ongoing.
This intuition can be made testable: it is possible (though computationally expensive) to check whether a system’s behavior is linearizable by recording the timings of all requests and responses, and checking whether they can be arranged into a valid sequential order.
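That check can be implemented by brute force for a single register. The sketch below tries every sequential order of a recorded history and accepts one that respects both real-time precedence and ordinary register semantics; it encodes the scenario above, where x flips from 0 to 1 during C's write.

```python
# Brute-force linearizability checker for one register (illustrative only:
# factorial cost, so it works only for tiny histories).
from itertools import permutations

def is_linearizable(history, initial=0):
    """history: list of (start, end, kind, value), kind is 'r' or 'w'."""
    for order in permutations(history):
        # Real-time constraint: if op b completed before op a began,
        # b must not appear after a in the sequential order.
        respects_time = all(not (b[1] < a[0])
                            for i, a in enumerate(order)
                            for b in order[i + 1:])
        if not respects_time:
            continue
        # Register semantics: every read returns the latest written value.
        value = initial
        for _, _, kind, v in order:
            if kind == "w":
                value = v
            elif v != value:
                break
        else:
            return True  # found a valid sequential order
    return False

# C writes 1 during [1, 5]; A reads 1 at [2, 3]; B reads at [3.5, 4].
# B begins after A completes, so B must also see 1.
good = [(1, 5, "w", 1), (2, 3, "r", 1), (3.5, 4, "r", 1)]
bad  = [(1, 5, "w", 1), (2, 3, "r", 1), (3.5, 4, "r", 0)]
assert is_linearizable(good)
assert not is_linearizable(bad)   # B reading the old value is forbidden
```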
Linearizability Vs Serializability
- Serializability → an isolation property of transaction, where every transaction may read and write multiple objects (rows, documents, records). It guarantees that transactions behave the same as if they had executed in some serial order. It is okay for that serial order to be different from the order in which transactions were actually run.
- Linearizability → a recency guarantee on reads and writes of a register (an individual object). It doesn’t group operations together into transactions, so it does not prevent problems such as write skew.
A database that provides both serializability and linearizability is said to provide strict serializability.
Implementations of serializability based on 2PL or actual serial execution are typically linearizable.
Serializable snapshot isolation, however, is not linearizable: it reads from a consistent snapshot, which by design does not include writes committed after the snapshot was taken, so reads from the snapshot can be stale.
Relying on linearizability
When is linearizability useful?
- Locking and leader election
- Constraints and uniqueness guarantees → e.g., enforcing unique usernames, or generating auto-incrementing primary keys
- Cross-channel timing dependencies
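The first two uses reduce to the same primitive: an atomic compare-and-set on a single agreed-upon register. The sketch below models that register in one process with a mutex (a real deployment would use a coordination service such as ZooKeeper or etcd instead); if the register is linearizable, at most one node can win a leader election.

```python
# Sketch: locks and leader election as compare-and-set on a linearizable
# register. The threading.Lock stands in for the atomicity a coordination
# service would provide across machines.
import threading

class CasRegister:
    """A register exposing an atomic compare-and-set (single process)."""
    def __init__(self, value=None):
        self._value = value
        self._mutex = threading.Lock()

    def compare_and_set(self, expected, new):
        with self._mutex:          # the whole read-check-write is atomic
            if self._value == expected:
                self._value = new
                return True
            return False

leader = CasRegister()
won_a = leader.compare_and_set(None, "node-a")  # first CAS succeeds
won_b = leader.compare_and_set(None, "node-b")  # second sees node-a, fails
assert won_a and not won_b   # exactly one leader is elected
```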
One example of a cross-channel timing dependency: an image-resizing pipeline in which a full-size image is written to a storage system, and a resize task is then enqueued on a message queue. Ideally, the resize task runs only after the full-size image has been stored. However, if the storage write is slow, the queue may deliver the task first, and the resizer will try to fetch a full-size image that does not yet exist (or fetch a stale version).
Implementation
The most common approach to making a system fault-tolerant is to use replication.
Single-leader replication (potentially linearizable)
Using the leader for reads relies on the assumption that you know for sure who the leader is. It is quite possible for a node to think that it is the leader, when in fact it is not—and if the delusional leader continues to serve requests, it is likely to violate linearizability. With asynchronous replication, failover may even lose committed writes, which violates both durability and linearizability.
Multi-leader replication (not linearizable)
Systems with multi-leader replication are generally not linearizable, because they concurrently process writes on multiple nodes and asynchronously replicate them to other nodes. For this reason, they can produce conflicting writes that require resolution. Such conflicts are an artifact of the lack of a single copy of the data.
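The lack of a single copy can be shown in miniature. This toy model (two in-process dicts standing in for two datacenter leaders) accepts concurrent writes on both leaders; until conflict resolution runs, reads disagree depending on which leader they hit, so no single sequential order exists.

```python
# Toy model of multi-leader replication: two leaders accept writes
# concurrently, producing a conflict that some resolution policy must fix.
leader_a = {"x": 0}   # datacenter A
leader_b = {"x": 0}   # datacenter B

leader_a["x"] = 1     # client 1 writes via datacenter A
leader_b["x"] = 2     # client 2 writes via datacenter B at the same time

# Before replication and conflict resolution, the "one copy" illusion is
# broken: the value you read depends on the leader you ask.
assert leader_a["x"] != leader_b["x"]
```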
Leaderless replication (probably not linearizable)
For systems with leaderless replication, people sometimes claim that you can obtain "strong consistency" by requiring quorum reads and writes (w + r > n).
"Last write wins" conflict resolution methods based on time-of-day clocks are almost certainly nonlinearizable, because clock timestamps cannot be guaranteed to be consistent with actual event ordering due to clock skew. Sloppy quorums also ruin any chance of linearizability. Even with strict quorums, nonlinearizable behavior is possible.
Why can’t quorums provide linearizability?
In the figure, n = 3, w = 3, r = 2, which satisfies the quorum requirement (w + r > n). However, the execution is nevertheless not linearizable: B’s request begins after A’s request completes, but B returns the old value while A returned the new one.
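The race can be reproduced with a toy simulation (illustrative only, not real quorum-replication code). The writer sends x = 1 to all three replicas, but so far only the message to replica 2 has arrived; reader A's quorum happens to include replica 2, while the later reader B's quorum does not.

```python
# Toy quorum simulation of the figure: n = 3, w = 3, r = 2.
# Each replica stores (value, version); quorum reads return the value
# with the highest version among the r replicas contacted.
replicas = [{"x": (0, 0)} for _ in range(3)]

def deliver(i, key, value, version):
    replicas[i][key] = (value, version)   # one replication message arrives

def quorum_read(key, targets):
    return max((replicas[i][key] for i in targets),
               key=lambda pair: pair[1])[0]

# Writer sends x = 1 (version 1) to all three replicas, but only the
# message to replica index 2 has been delivered so far.
deliver(2, "x", 1, 1)
a = quorum_read("x", targets=[1, 2])   # A: versions (0, 1) -> new value
b = quorum_read("x", targets=[0, 1])   # B begins after A completes...
assert a == 1
assert b == 0   # ...yet returns the old value: not linearizable
```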
Consensus algorithms (linearizable)
Some consensus algorithms bear a resemblance to single-leader replication. However, consensus protocols contain measures to prevent split brain and stale replicas. Thanks to these details, consensus algorithms can implement linearizable storage safely.
Cost of linearizability

A network interruption forcing a choice between linearizability and availability.
Imagine that we use single-leader replication, and a network partition separates the leader from some of the followers.
Because all writes go through the leader, if we continue serving clients during the partition, clients connected to the isolated followers may read out-of-date values, which violates linearizability.
On the other hand, if we make those reads wait until the network is repaired, we lose availability.
“Either Consistent or Available when Partitioned”
CAP is sometimes presented as Consistency, Availability, Partition tolerance: pick 2 out of 3. This framing is misleading: network partitions are a kind of fault, so they will happen whether you like it or not; partition tolerance is not really something you can trade away.
At times when the network is working correctly, a system can provide both consistency (linearizability) and total availability. When a network fault occurs, you have to choose between either linearizability or total availability.
