UNDONE: add back tags/series before publication

Introduction

In the world of concurrency control there are two approaches: pessimistic and optimistic concurrency control. Optimistic concurrency control assumes there are no conflicts and performs validation at the end of the transaction. While pessimistic concurrency control assumes there will be conflicts and detects those conflicts at the time of occurrence.

A benefit of detecting conflicts at the time of occurrence is transactions have flexibility in choosing what to do. The two well known options are waiting and aborting. There is also a third less discussed option, continuing execution. The transaction can speculate that the conflicting transaction will succeed and proceed as if it was committed.

The general approach to implementing pessimistic concurrency control is through locks. If one is unable to acquire the requested lock then there is a conflict. If the lock is granted, there are no conflicts and there will be no future conflicts. In the world of pessimistic concurrency control and speculative execution, there is the established approach of early lock release which I will discuss. And the more recent idea of controlled lock violations. I propose how one might extend traditional multi-version pessimistic snapshot isolation to support controlled lock violations.

Multi-Version Two-Phase Locking Snapshot Isolation

This section is a recap of well established ideas for implementing snapshot isolation with with pessimistic multi-version concurrency control. For the purposes of this post, multi-versioning is implemented as a version list of values that constitute a row. Additionally, it is assumed there is some sort of deadlock prevention/detection component (e.g. locks implemented as wait-die/wound-wait).

Combining multi-versioning with pessimistic concurrency control is very effective for snapshot isolation. Read transactions do not block read-write transactions. Reads work simply by doing a read of the latest value with CommitTs <= StartTs. No concurrency control is needed. Only read-write transactions need concurrency control.1

For snapshot isolation, we need to detect and prevent write-write conflicts. Write-write conflicts are when another transaction, Txn B, writes to the same row as Txn A, such that StartTs_A < CommitTs_B < CommitTs_A. In plain english, Txn B inserted a row that was not visible to Txn A, so when Txn A tries to write to the row it will have a write-write conflict. In order to implement this with pessimistic concurrency control, we only need exclusive locks. Once Txn A acquires an exclusive lock of the row, it is guaranteed that no other transaction will commit a value with a commit timestamp less than Txn A.

A read-write transaction will perform their reads as of the transaction's start timestamp. As with read transactions, visibility is based on CommitTs <= StartTs. Writes are performed as of the transaction's commit timestamp. There is a detection phase and then a prevention phase for write-write conflicts. First, the transaction determines if any of its writes conflict with already committed writes. It asks, is there a committed value with CommitTs > StartTs?

In Figure 1, the Txn B encountered a value with CommitTs > StartTs so it fails with a write-write conflict.

Write-Write
Write-Write
Txn A
StartTs: 1
Txn A...
Versioned List
Versioned List
Committed
Committed
V2
CommitTs: 2
V2...
V1
CommitTs: 1
V1...
Text is not SVG - cannot display
Figure 1. Write-Write Conflict.

Now, lets look at the happy case where Txn A sees no write-write conflicts. It acquires the row lock and inserts its uncommitted value into the version list. Because it does not have a CommitTs, no reads or read-write transactions have visibility of the uncommitted data (except for Txn A itself).

Versioned List
Versioned List
Committed
Committed
V2
CommitTs: 2
Txn A (StartTs:
2)
V2...
V1
CommitTs: 1
V1...
Uncommitted
Uncommitted
V3
V3
Text is not SVG - cannot display
Figure 2. Row lock acquisition.

Then Txn B comes and wants to perform a write. It can't because Txn A owns the lock, so it waits. It is not a write-write conflict because the value is still uncommitted.

Versioned List
Versioned List
Committed
Committed
V2
CommitTs: 2
Txn A (StartTs:
2)
V2...
V1
CommitTs: 1
V1...
Uncommitted
Uncommitted
V3
V3
Waiters
Waiters
Txn B
StartTs: 2
Txn B...
Text is not SVG - cannot display
Figure 3. Row lock waiter.

When Txn A commits, all of the waiters now have a write-write conflict so they abort. CommitTs > StartTs for all waiters.

Versioned List
Versioned List
Committed
Committed
V2
CommitTs: 2
V2...
V1
CommitTs: 1
V1...
V3
CommitTs: 3
V3...
Write-Write Conflict
Write-Write Conflict
Txn B
StartTs: 2
Txn B...
Text is not SVG - cannot display
Figure 4. Transaction commit.

If Txn A aborted instead, the first waiter, Txn B, will now own the row lock and can now insert its value into the version list and continue making progress.

Aborted
Aborted
V3
V3
Versioned List
Versioned List
Committed
Committed
V2
CommitTs: 2
Txn B (StartTs:
2)
V2...
V1
CommitTs: 1
V1...
Uncommitted
Uncommitted
V3
V3
Text is not SVG - cannot display
Figure 5. Transaction abort.

Using PCC with multi-versioning eliminates the need for transaction private buffers for writes. Uncommitted values are able to be written in-place on the versioned list.

1

Arvola Chan, Stephen Fox, Wen-Te K. Lin, Anil Nori, and Daniel R. Ries. 1982. The implementation of an integrated concurrency control and recovery scheme. In Proceedings of the 1982 ACM SIGMOD international conference on Management of data (SIGMOD '82). Association for Computing Machinery, New York, NY, USA, 184–191. https://doi.org/10.1145/582353.582386

Strict Two-Phase Locking

To prevent dirty reads and cascading aborts with a simple implementation, strict two-phase locking can be used. Locks are acquired throughout the transaction, the growing phase. Locks are only released after the transaction is durable. The next section goes into the implications of releasing locks before the transaction is durable.

Early Lock Release

As one can surmise, it is beneficial to hold these locks for the shortest time period possible. The longer a lock is held, the longer conflicting transactions are blocked reducing concurrency. At the cost of more complexity, we can optimize how we hold locks after the growing phase to enable higher concurrency without dirty reads. There can be cascading aborts, albeit unlikely.

Because there is still a growing phase, the earliest we can release and still stay consistent with 2PL is after the commit request, no more locks will be acquired. This means locks can be released during the hardening stage. The hardening stage can be the dominant part of a transaction in OLTP workloads so enabling transaction progress during it can greatly increase performance during conflicts.

In figure 6 below, the blue line is the duration of a single node transaction. In strict 2PL the locks are held for the entire duration. With early lock release we only hold it for the green line as described above.

Figure 6. Single Node Lock Release. 5

Releasing locks early can result in cascading aborts. However, in single node transactions we are doing the early lock release during the hardening stage. Generally, the only way a transaction aborts in this stage is if the database crashes. Therefore, cascading aborts do not matter. Waiting is strictly worse than speculatively proceeding, because if the database crashes the waiters would also abort.

A traditional early lock release implementation releases locks after determining its commit LSN. Assuming LSNs are strictly increasing, this creates a barrier. All transactions, acquiring the lock after the release will have a later commit LSN. Thus it satisfies the following:

  1. B can only commit after A commits.

However, this does not hold for read-only read-write transactions that do not commit anything and thus do not acquire a commit LSN. Imagine Txn A releasing its locks, Txn B returns those rows, and then the database crashes before Txn A is durable. That is a dirty read. This is what MySQL InnoDB implemented for early lock release. They released locks before the fsync. Therefore they cautioned about dirty reads: B can return some of A's data, and then there is a crash before A commits.2 However, InnoDB reverted early lock release because even with that known limitation, it did not work with their crash recovery protocol.3

Therefore we need a new rule:

  1. B can operate on A's data inside the DBMS, but it must not return any of A's data to the client until A commits.

This is the rule that prevents dirty reads. This check can be implemented as a tag in the lock that stores the commit LSN of the latest release. In a read-write transaction that only performed reads, it will wait for the commit LSN to become durable.3

However, the high watermark is not efficient if the log supports out of order commits. Imagine the following scenario. Txn B violated one of Txn A's locks. Txn A is committed, but earlier in the log is a large Txn Z that is still committed. Txn B can't commit because it needs to wait for the high water mark to reach Txn A.

Txn Z (slow, committing) | Txn A (committed) | Txn B (waiting for HWM)

With a high watermark implementation, it may be faster for Txn A to not allow the lock violation so Txn B could immediately acquire the lock and commit. For correctness, this also assumes the out of order log will crash if one of the commits fails. Otherwise high watermark does not mean all transactions below are committed.

Additionally, this simple implementation does not support efficient two-phase commits. In two-phase commit there is the prepare phase and then the commit phase. Using high watermark only allows violations during the hardening of the commit phase. We are holding the much longer than necessary (the green line instead of red line).

Figure 7. 2PC Lock Release. 5

Therefore, we want a third rule to handle 2PC.

  1. B must abort if A aborts.

We are assuming here that 2PC aborts are rare because they can cause cascading aborts on failure. Unlike in the single node transaction, where waiters would also have been aborted (because of crash), waiters on the 2PC abort would still make progress. In the happy case though of 2PC succeeding, a 2PC transaction holds locks for the same amount of time as a 1PC transaction. There is still the overhead of a longer time to commit, but conflicting transactions are able to make progress in parallel.

If 2PC aborts have periods of high occurrence, the DBMS could dynamically switch from early lock release during the prepare phase to early lock release during the commit hardening phase, the green line. This would reduce concurrency but eliminate cascading aborts due to 2PC aborts.

4

Kimura, H., Graefe, G., & Kuno, H.A. (2012). Efficient Locking Techniques for Databases on Modern Hardware. ADMS@VLDB.

Controlled Lock Violation

Controlled lock violation is a slight reframing of early lock release. Instead of releasing the lock, the lock is still held and instead allows violations. One of the key benefits of controlled lock violation there is full visibility into the types of locks being violated. The paper focuses on the benefits this information has when using different lock types, hierarchical locking, and intention locks. There is a heavy focus on B-trees, and the paper's implementation is a boolean allow violation flag on a non-versioned B-tree.5

For our purposes, we are exploring how one can use the controlled lock violation ideas in the narrow context of multi-versioning and snapshot isolation. The benefits of having lock type information are unused because we only have a single lock type, exclusive locks.

An interesting observation is multi-versioned snapshot isolation as I talk about in this post does not really have the ability to do early lock release in the first place. The locks are embedded in the version list (through uncommitted values) and there is only a single type of lock, so it is impossible to "release" if a write is performed. The version list keeps the lock history.

5

Goetz Graefe, Mark Lillibridge, Harumi Kuno, Joseph Tucek, and Alistair Veitch. 2013. Controlled lock violation. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13). Association for Computing Machinery, New York, NY, USA, 85–96. https://doi.org/10.1145/2463676.2465325

Multi-Version Lock Violations

To support controlled lock violations, our version list needs to support the following:

  • Multiple uncommitted values that are dependent on each other.
  • Timestamps to determine uncommitted value visibility.

To do this, our versioned values include a new field, ViolationTs. This is the timestamp at which the locked uncommitted value allows violations. Violations are allowed after the execution phase when the commit request is sent, which keeps 2PL intact.

Read-only transactions are unchanged from the original design. They still determine visibility using CommitTs: the latest value with CommitTs <= StartTs. This is what ensures read-only transactions do not read uncommitted data and incur commit dependencies.

Read-write transactions use ViolationTs for visibility: the latest value with ViolationTs <= StartTs. Write-write conflicts are now defined using StartTs_A < ViolationTs_B < ViolationTs_A. This is the reduced time period of potential write-write conflicts that we want. It works because when ViolationTs_B <= StartTs_A, A has visibility of B. Using ViolationTs instead of CommitTs means B is possibly uncommitted, which is what the commit dependencies track. Locks guarantee there is only a single possible future that is being speculated and thus is visible.

Write Locking

Lets run through some scenarios to see how writes work in practice. The initial state, seen in figure 8, has two committed values, and an uncommitted chain with three values. As stated above, each versioned value has a ViolationTs and a CommitTs (invalid values if not committed/allowing violations yet). Txn A currently owns the exclusive lock on V2 (it was the transaction that inserted V3). Txn H StartTs < V3 ViolationTs and Txn J StartTs < V3 ViolationTs so they can only read V2. Therefore, they wait on V2 to be released instead. V5 which owns Txn C has not been released yet.

Waiters
Waiters
Txn H
StartTs: 3
Txn H...
Txn J
StartTs: 4
Txn J...
Versioned List
Versioned List
Committed
Committed
V2
CommitTs: 3
ViolationTs: 2
Txn A (StartTs:
3)
V2...
V1
CommitTs: 2
ViolationTs: 1
V1...
Uncommitted
Uncommitted
V4
ViolationTs: 6
Txn C, StartTs:
6)
V4...
V3
ViolationTs: 5
Txn B (StartTs:
5)
V3...
V5
V5
Waiters
Waiters
Txn K
StartTs: 6
Txn K...
Text is not SVG - cannot display
Figure 8. Initial state with controlled lock violation.

Figure 9 shows the outcome of Txn A committing. Similar to strict 2PL snapshot isolation, all the waiters abort with a write-write conflict. But, because we speculatively made progress on the future based on Txn A, we are able to continue with the uncommitted list. That is the key component of controlled lock violations. Additionally, V5 started allowing violations.

Write-Write Conflict
Write-Write Conflict
Txn J
StartTs: 2
Txn J...
Txn J
StartTs: 3
Txn J...
Versioned List
Versioned List
Committed
Committed
V2
CommitTs: 3
ViolationTs: 2
V2...
V1
CommitTs: 2
ViolationTs: 1
V1...
V3
CommitTs: 8
ViolationTs: 5
Txn B (StartTs:
5)
V3...
Uncommitted
Uncommitted
V4
ViolationTs: 6
Txn C (StartTs:
6)
V4...
V5
ViolationTs: 7
V5...
Waiters
Waiters
Txn K
StartTs: 6
Txn K...
Text is not SVG - cannot display
Figure 9. Transaction commit.

Next up GC was performed removing some of the lower committed versions to make the graphic easier for the reader. You can see that Txn B committed V4. Txn D (StartTs = 7) acquired the lock on V5 to insert V6. Notice that the read timestamp is less than V4 CommitTs=9. This is why using ViolationTs instead of CommitTs for visibility choices is important. The commit timestamp is greater than the timestamps in the uncommitted chain. But our read-write transactions need to have visibility into the uncommitted chain.

Versioned List
Versioned List
Committed
Committed
V3
CommitTs: 8
ViolationTs: 5
V3...
V4
CommitTs: 9
ViolationTs: 6
Txn C (StartTs:
6)
V4...
Uncommitted
Uncommitted
V5
ViolationTs: 7
Txn D (StartTs:
7)
V5...
V6
V6
Waiters
Waiters
Txn K
StartTs: 6
Txn K...
Text is not SVG - cannot display
Figure 10. Transaction commit.

Finally a demonstration of a cascading abort. Txn C aborts which also causes Txn D to abort. Because Txn K was waiting on V4 which Txn C originally owned, it was able to acquire the exclusive lock and insert its own V5.