Multi-Version Controlled Lock Violations for Snapshot Isolation
Introduction
In the world of concurrency control there are two approaches: pessimistic and optimistic concurrency control. Optimistic concurrency control assumes there are no conflicts and performs validation at the end of the transaction, while pessimistic concurrency control assumes there will be conflicts and detects those conflicts at the time of occurrence.
A benefit of detecting conflicts at the time of occurrence is that transactions have flexibility in choosing what to do. The two well-known options are waiting and aborting. There is also a third, less discussed option: continuing execution. The transaction can speculate that the conflicting transaction will succeed and proceed as if it had committed.
The general approach to implementing pessimistic concurrency control is through locks. If one is unable to acquire the requested lock then there is a conflict. If the lock is granted, there are no conflicts and there will be no future conflicts. In the world of pessimistic concurrency control and speculative execution, there is the established approach of early lock release, which I will discuss, as well as the more recent idea of controlled lock violations. I propose how one might extend traditional multi-version pessimistic snapshot isolation to support controlled lock violations.
Multi-Version Two-Phase Locking Snapshot Isolation
This section is a recap of well-established ideas for implementing snapshot isolation with pessimistic multi-version concurrency control. For the purposes of this post, multi-versioning is implemented as a version list of values that constitute a row. Additionally, it is assumed there is some sort of deadlock prevention/detection component (e.g. locks implemented as wait-die/wound-wait).
Combining multi-versioning with pessimistic concurrency control is very effective for snapshot isolation. Read transactions do not block read-write transactions. Reads work simply by reading the latest value with `CommitTs <= StartTs`. No concurrency control is needed. Only read-write transactions need concurrency control.1
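As a minimal sketch of that visibility rule (field and function names here are illustrative, not from any particular system), a snapshot read just scans the version list, newest first, for the latest version with `CommitTs <= StartTs`:

```python
# Sketch: lock-free snapshot reads over a version list, newest first.
# Field and function names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: str
    commit_ts: Optional[int]  # None while the value is still uncommitted

def snapshot_read(version_list: list, start_ts: int) -> Optional[str]:
    """Return the newest committed value visible at start_ts.

    No locks are taken: uncommitted versions (commit_ts is None) and
    versions committed after start_ts are simply skipped.
    """
    for v in version_list:  # ordered newest first
        if v.commit_ts is not None and v.commit_ts <= start_ts:
            return v.value
    return None

# A row with an uncommitted head and two committed versions.
row = [Version("v3", None), Version("v2", 20), Version("v1", 10)]
```

A reader with `StartTs = 15` sees `v1`; a reader with `StartTs = 25` sees `v2`; neither sees the uncommitted `v3`.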
For snapshot isolation, we need to detect and prevent write-write conflicts. A write-write conflict occurs when another transaction, `Txn B`, writes to the same row as `Txn A`, such that `StartTs_A < CommitTs_B < CommitTs_A`. In plain English, `Txn B` inserted a row that was not visible to `Txn A`, so when `Txn A` tries to write to the row it will have a write-write conflict. To implement this with pessimistic concurrency control, we only need exclusive locks. Once `Txn A` acquires an exclusive lock on the row, it is guaranteed that no other transaction will commit a value with a commit timestamp less than `Txn A`'s.
A read-write transaction performs its reads as of the transaction's start timestamp. As with read transactions, visibility is based on `CommitTs <= StartTs`. Writes are performed as of the transaction's commit timestamp. There is a detection phase and then a prevention phase for write-write conflicts. First, the transaction determines whether any of its writes conflict with already-committed writes. It asks: is there a committed value with `CommitTs > StartTs`?
In Figure 1, `Txn B` encountered a value with `CommitTs > StartTs`, so it fails with a write-write conflict.
Now, let's look at the happy case where `Txn A` sees no write-write conflicts. It acquires the row lock and inserts its uncommitted value into the version list. Because the value does not have a `CommitTs`, no read or read-write transactions have visibility of the uncommitted data (except for `Txn A` itself).
Then `Txn B` comes along and wants to perform a write. It can't, because `Txn A` owns the lock, so it waits. This is not a write-write conflict because the value is still uncommitted.
When `Txn A` commits, all of the waiters now have a write-write conflict, so they abort: `CommitTs > StartTs` for all waiters.
If `Txn A` aborted instead, the first waiter, `Txn B`, would now own the row lock and could insert its value into the version list and continue making progress.
Using PCC with multi-versioning eliminates the need for transaction-private buffers for writes. Uncommitted values can be written in place in the version list.
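The detection and prevention phases above can be sketched as a single write attempt against the version list (names and the return-value convention are illustrative; a real implementation would block on the lock rather than return "wait"):

```python
# Sketch: the write path under strict 2PL + MVCC. Detection aborts on a
# committed version newer than the snapshot; prevention makes later
# writers wait while the head of the list is uncommitted.
from dataclasses import dataclass
from typing import Optional

class WriteWriteConflict(Exception):
    pass

@dataclass
class Version:
    value: str
    commit_ts: Optional[int]     # None while uncommitted
    owner: Optional[str] = None  # txn holding the lock, if uncommitted

def try_write(version_list: list, txn_id: str, start_ts: int, value: str) -> str:
    head = version_list[0]
    # Detection: a committed value our snapshot never saw.
    if head.commit_ts is not None and head.commit_ts > start_ts:
        raise WriteWriteConflict(txn_id)
    # Prevention: an uncommitted head means another txn holds the lock.
    if head.commit_ts is None and head.owner != txn_id:
        return "wait"
    # Lock granted: insert the uncommitted value in place, no private buffer.
    version_list.insert(0, Version(value, None, owner=txn_id))
    return "locked"
```

This reproduces the scenario in the figures: `Txn A` writes and gets the lock, `Txn B` waits, and once `A` commits (giving the head a `CommitTs` greater than `B`'s `StartTs`), `B`'s retry fails with a write-write conflict.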
Arvola Chan, Stephen Fox, Wen-Te K. Lin, Anil Nori, and Daniel R. Ries. 1982. The implementation of an integrated concurrency control and recovery scheme. In Proceedings of the 1982 ACM SIGMOD international conference on Management of data (SIGMOD '82). Association for Computing Machinery, New York, NY, USA, 184–191. https://doi.org/10.1145/582353.582386
Strict Two-Phase Locking
To prevent dirty reads and cascading aborts with a simple implementation, strict two-phase locking can be used. Locks are acquired throughout the transaction, the growing phase. Locks are only released after the transaction is durable. The next section goes into the implications of releasing locks before the transaction is durable.
Early Lock Release
As one can surmise, it is beneficial to hold these locks for the shortest time period possible. The longer a lock is held, the longer conflicting transactions are blocked, reducing concurrency. At the cost of more complexity, we can optimize how we hold locks after the growing phase to enable higher concurrency without dirty reads. Cascading aborts become possible, albeit unlikely.
Because there is still a growing phase, the earliest we can release locks and still stay consistent with 2PL is after the commit request, when no more locks will be acquired. This means locks can be released during the hardening stage. The hardening stage can be the dominant part of a transaction in OLTP workloads, so enabling transaction progress during it can greatly increase performance under conflicts.
In Figure 6 below, the blue line is the duration of a single-node transaction. In strict 2PL the locks are held for the entire duration. With early lock release we hold them only for the green line, as described above.

Releasing locks early can result in cascading aborts. However, in single-node transactions we are doing the early lock release during the hardening stage. Generally, the only way a transaction aborts in this stage is if the database crashes. Therefore, cascading aborts do not matter: waiting is strictly worse than speculatively proceeding, because if the database crashes the waiters would also abort.
A traditional early lock release implementation releases locks after determining its commit LSN. Assuming LSNs are strictly increasing, this creates a barrier: all transactions acquiring the lock after the release will have a later commit LSN. Thus it satisfies the following:
- B can only commit after A commits.
However, this does not hold for read-write transactions that performed only reads, and thus do not commit anything and never acquire a commit LSN. Imagine `Txn A` releasing its locks, `Txn B` returning those rows, and then the database crashing before `Txn A` is durable. That is a dirty read. This is what MySQL InnoDB implemented for early lock release. They released locks before the fsync, and therefore cautioned about dirty reads: B can return some of A's data, and then there is a crash before A commits.2 However, InnoDB reverted early lock release because, even with that known limitation, it did not work with their crash recovery protocol.3
Therefore we need a new rule:
- B can operate on A's data inside the DBMS, but it must not return any of A's data to the client until A commits.
This is the rule that prevents dirty reads. The check can be implemented as a tag in the lock that stores the commit LSN of the latest release. A read-write transaction that only performed reads waits for that commit LSN to become durable.3
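A minimal sketch of that tag check, assuming a single durable-LSN watermark advanced by the log flusher (all names here are illustrative):

```python
# Sketch: each lock carries the commit LSN of the transaction that last
# released it early. A read-write transaction that only read may reply
# to the client only once every such LSN is durable.
from dataclasses import dataclass

@dataclass
class Lock:
    release_lsn: int = 0  # commit LSN tagged at early release

def can_reply(violated_locks: list, durable_lsn: int) -> bool:
    """True once every early-released lock we read under is durable."""
    return all(lock.release_lsn <= durable_lsn for lock in violated_locks)
```

For example, if a reader saw data under a lock tagged with commit LSN 42, it must hold its reply until the durable LSN reaches 42; until then, a crash could still make that read dirty.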
However, the high watermark is not efficient if the log supports out-of-order commits. Imagine the following scenario: `Txn B` violated one of `Txn A`'s locks. `Txn A` is committed, but earlier in the log is a large `Txn Z` that is still committing. `Txn B` can't commit because it needs to wait for the high water mark to reach `Txn A`.
Txn Z (slow, committing) | Txn A (committed) | Txn B (waiting for HWM)
With a high watermark implementation, it may be faster for `Txn A` to not allow the lock violation, so that `Txn B` could immediately acquire the lock and commit. For correctness, this also assumes the out-of-order log will crash the database if one of the commits fails. Otherwise the high watermark does not mean all transactions below it are committed.
Additionally, this simple implementation does not support efficient two-phase commit. In two-phase commit there is the prepare phase and then the commit phase. Using the high watermark only allows violations during the hardening of the commit phase. We are holding the locks much longer than necessary (the green line instead of the red line).

Therefore, we want a third rule to handle 2PC.
- B must abort if A aborts.
We are assuming here that 2PC aborts are rare, because they can cause cascading aborts on failure. Unlike in the single-node transaction, where waiters would also have been aborted (because of the crash), waiters on a 2PC abort would have still made progress. In the happy case of 2PC succeeding, though, a 2PC transaction holds locks for the same amount of time as a 1PC transaction. There is still the overhead of a longer time to commit, but conflicting transactions are able to make progress in parallel.
If 2PC aborts occur in periods of high frequency, the DBMS could dynamically switch from early lock release during the prepare phase to early lock release during the commit hardening phase, the green line. This would reduce concurrency but eliminate cascading aborts due to 2PC aborts.
Kimura, H., Graefe, G., & Kuno, H.A. (2012). Efficient Locking Techniques for Databases on Modern Hardware. ADMS@VLDB.
Controlled Lock Violation
Controlled lock violation is a slight reframing of early lock release. Instead of releasing the lock, the lock is still held but allows violations. One of the key benefits of controlled lock violation is that there is full visibility into the types of locks being violated. The paper focuses on the benefits this information has when using different lock types, hierarchical locking, and intention locks. There is a heavy focus on B-trees, and the paper's implementation is a boolean allow-violation flag on a non-versioned B-tree.5
For our purposes, we are exploring how one can use the controlled lock violation ideas in the narrow context of multi-versioning and snapshot isolation. The benefits of having lock type information are unused because we only have a single lock type, exclusive locks.
An interesting observation is that multi-versioned snapshot isolation, as described in this post, does not really have the ability to do early lock release in the first place. The locks are embedded in the version list (through uncommitted values) and there is only a single lock type, so it is impossible to "release" a lock once a write is performed. The version list keeps the lock history.
Goetz Graefe, Mark Lillibridge, Harumi Kuno, Joseph Tucek, and Alistair Veitch. 2013. Controlled lock violation. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13). Association for Computing Machinery, New York, NY, USA, 85–96. https://doi.org/10.1145/2463676.2465325
Multi-Version Lock Violations
To support controlled lock violations, our version list needs to support the following:
- Multiple uncommitted values that are dependent on each other.
- Timestamps to determine uncommitted value visibility.
To do this, our versioned values include a new field, `ViolationTs`. This is the timestamp at which the locked uncommitted value starts allowing violations. Violations are allowed after the execution phase, when the commit request is sent, which keeps 2PL intact.
Read-only transactions are unchanged from the original design. They still determine visibility using `CommitTs`: the latest value with `CommitTs <= StartTs`. This is what ensures read-only transactions do not read uncommitted data and incur commit dependencies.
Read-write transactions use `ViolationTs` for visibility: the latest value with `ViolationTs <= StartTs`. Write-write conflicts are now defined using `StartTs_A < ViolationTs_B < ViolationTs_A`. This is the reduced time period of potential write-write conflicts that we want. It works because when `ViolationTs_B <= StartTs_A`, `A` has visibility of `B`. Using `ViolationTs` instead of `CommitTs` means `B` is possibly uncommitted, which is what the commit dependencies track. Locks guarantee there is only a single possible future being speculated on, and thus visible.
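These two rules can be sketched directly against the version list (a toy model with illustrative names; commit-dependency bookkeeping is left out):

```python
# Sketch: read-write visibility and write-write conflict checks keyed on
# ViolationTs rather than CommitTs.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: str
    violation_ts: Optional[int]  # None until the commit request is sent
    commit_ts: Optional[int]     # None until durably committed

def rw_visible(version_list: list, start_ts: int) -> Optional[Version]:
    """Newest version with ViolationTs <= StartTs. It may still be
    uncommitted, in which case the caller picks up a commit dependency
    on that version's owner."""
    for v in version_list:  # newest first
        if v.violation_ts is not None and v.violation_ts <= start_ts:
            return v
    return None

def ww_conflict(version_list: list, start_ts: int) -> bool:
    """Write-write conflict iff the head already allows violations at a
    timestamp greater than our snapshot (ViolationTs > StartTs)."""
    head = version_list[0]
    return head.violation_ts is not None and head.violation_ts > start_ts
```

A transaction whose `StartTs` is past the head's `ViolationTs` can read the uncommitted head and chain a new write onto it; one whose `StartTs` is below it has a write-write conflict, exactly as in the strict 2PL case but measured against `ViolationTs`.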
Write Locking
Let's run through some scenarios to see how writes work in practice. The initial state, seen in Figure 8, has two committed values and an uncommitted chain with three values. As stated above, each versioned value has a `ViolationTs` and a `CommitTs` (invalid values if not committed/allowing violations yet). `Txn A` currently owns the exclusive lock on `V2` (it was the transaction that inserted `V3`). Both `Txn H`'s and `Txn J`'s `StartTs` are less than `V3`'s `ViolationTs`, so they can only read `V2`. Therefore, they wait on `V2`'s lock to be released instead. `V5`, which `Txn C` owns, has not started allowing violations yet.
Figure 9 shows the outcome of `Txn A` committing. Similar to strict 2PL snapshot isolation, all the waiters abort with a write-write conflict. But, because we speculatively made progress on the future based on `Txn A`, we are able to continue with the uncommitted list. That is the key component of controlled lock violations. Additionally, `V5` started allowing violations.
Next, GC was performed, removing some of the lower committed versions to make the graphic easier to read. You can see that `Txn B` committed `V4`. `Txn D (StartTs = 7)` acquired the lock on `V5` to insert `V6`. Notice that the read timestamp is less than `V4`'s `CommitTs = 9`. This is why using `ViolationTs` instead of `CommitTs` for visibility decisions is important: the commit timestamp is greater than the timestamps in the uncommitted chain, but our read-write transactions need visibility into the uncommitted chain.
Finally, a demonstration of a cascading abort. `Txn C` aborts, which also causes `Txn D` to abort. Because `Txn K` was waiting on `V4`, which `Txn C` originally owned, it was able to acquire the exclusive lock and insert its own `V5`.