Paper tinkering

2024-02-19 13:29:16 -08:00
parent deb85f5645
commit 939b791e01
3 changed files with 22 additions and 19 deletions

@@ -77,15 +77,6 @@ keywords = {data structures, searching, trees}
bibsource = {dblp computer science bibliography, https://dblp.org}
}
-@book{10.5555/17299,
-author = {Bernstein, Philip A and Hadzilacos, Vassos and Goodman, Nathan},
-title = {Concurrency control and recovery in database systems},
-year = {1986},
-isbn = {0201107155},
-publisher = {Addison-Wesley Longman Publishing Co., Inc.},
-address = {USA}
-}
@book{cormen2022introduction,
title={Introduction to algorithms},
author={Cormen, Thomas H and Leiserson, Charles E and Rivest, Ronald L and Stein, Clifford},
@@ -153,3 +144,14 @@ address = {USA}
pages={521--534},
year={2018}
}
+@article{kung1981optimistic,
+title={On optimistic methods for concurrency control},
+author={Kung, Hsiang-Tsung and Robinson, John T},
+journal={ACM Transactions on Database Systems (TODS)},
+volume={6},
+number={2},
+pages={213--226},
+year={1981},
+publisher={ACM New York, NY, USA}
+}

@@ -16,8 +16,8 @@
\section{Abstract}
-FoundationDB \cite{DBLP:conf/sigmod/ZhouXSNMTABSLRD21} provides serializability using a specialized data structure called \textit{lastCommit} \footnote{See Algorithm 1 referenced in \cite{DBLP:conf/sigmod/ZhouXSNMTABSLRD21}}.
-This data structure maps key ranges (sets of bitwise-lexicographically-ordered keys denoted by either a singleton key or a half-open interval) to a version represented as a 64-bit integer.
+FoundationDB \cite{DBLP:conf/sigmod/ZhouXSNMTABSLRD21} provides serializability using a specialized data structure called \textit{lastCommit} \footnote{See Algorithm 1 referenced in \cite{DBLP:conf/sigmod/ZhouXSNMTABSLRD21}} to implement optimistic concurrency control \cite{kung1981optimistic}.
+This data structure encodes the write sets for recent transactions as a map from key ranges (represented as bitwise-lexicographically-ordered half-open intervals or singleton keys) to most recent write version, represented as a 64-bit integer.
FoundationDB implements \textit{lastCommit} as a version-augmented probabilistic skip list \cite{10.1145/78973.78977}.
In this paper, we propose an alternative implementation of \textit{lastCommit} as a version-augmented Adaptive Radix Tree (ART) \cite{DBLP:conf/icde/LeisK013}, and evaluate its performance.
@@ -29,23 +29,24 @@ For any ordered data structure we can implement \textit{lastCommit} using a repr
This is a standard technique used throughout FoundationDB.
The problem with applying this to an off-the-shelf ordered data structure is that checking a read range is linear in the number of intersecting physical keys.
-Under a high-enough write load, there can be arbitrarily many point writes unexpired in the MVCC \cite{10.5555/17299} window.
-Scanning through every point write intersecting a large range read would make conflict checking unacceptably slow.
+In order to support higher concurrent write load, we must store write sets from more transactions.
+Scanning through every recent point write intersecting a large range read would make conflict checking unacceptably slow for high-write-throughput workloads.
This suggests we consider augmenting \cite{cormen2022introduction} an ordered data structure to make checking the max version of a range sublinear.
Since finding the maximum of a set of elements is a decomposable search problem \cite{bentley1979decomposable}, we could apply the general technique using \texttt{std::max} as our binary operation, and \texttt{MIN\_INT} as our identity.
-Algorithmically, this describes the implementation of FoundationDB's skip list.
+Algorithmically, this describes FoundationDB's skip list.
We can also consider any other ordered data structure to augment, such as any variant of a balanced binary search tree \cite{adelson1962algorithm,guibas1978dichromatic,seidel1996randomized}, a b-tree \cite{comer1979ubiquitous}, or a radix tree \cite{DBLP:conf/icde/LeisK013,binna2018hot}.
Let's compare the relevant properties of our candidate data structures for insertion/update and read operations.
After insertion, the max version along the search path must reflect the update.
For comparison-based trees, updating max version along the search path cannot be done during top-down search, because \emph{insertion will change the search path}, and we do not know whether or not this is an insert or an update until we complete the top-down search.
We have no choice but to do a second, bottom-up pass to propagate max version changes.
-Furthermore, the usual way of propagating the change will always propagate all the way to the root, since inserts always use the highest-yet version.
-For a radix tree, max version can be updated on the top-down pass, and there's minimal overhead compared to the radix tree un-augmented.
+Furthermore, the change will always propagate all the way to the root, since inserts always use the highest-yet version.
+For a radix tree, insertion does not affect the search path, and so max version can be updated on the top-down pass.
+There's minimal overhead compared to the radix tree un-augmented.
-For ``last less than or equal to'' queries, skip lists have the convenient property that no backtracking is necessary, since the bottommost level is a sorted linked list.
-Binary search trees and radix trees both require backtracking up the search path.
+For ``last less than or equal to'' queries (which comprise the core of our read workload), skip lists have the convenient property that no backtracking is necessary, since the bottommost level is a sorted linked list.
+Binary search trees and radix trees both require backtracking up the search path when an equal element is not found.
It's possible to trade off the backtracking for the increased overhead of maintaining the elements in an auxiliary sorted linked list during insertion.
Our options also have various tradeoffs inherited from their un-augmented versions such as different worst-case and expected bounds on the length of search paths and the number of rotations performed upon insert.
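
The top-down max-version update that the last hunk describes for radix trees can be illustrated with a minimal C++ sketch. This is not FoundationDB's code and not the paper's ART implementation. It makes several assumptions: the names `Node`, `insert`, `max_version_with_prefix`, and `kNoVersion` are hypothetical; the structure is a plain uncompressed byte-wise trie rather than an ART; and only point writes and prefix queries are handled (no range writes, no "last less than or equal to" queries, no expiry of old versions).

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>
#include <memory>
#include <string>

// Identity element for max: "no write recorded in this subtree".
constexpr int64_t kNoVersion = std::numeric_limits<int64_t>::min();

// Hypothetical node of an uncompressed byte-wise trie, augmented with the
// maximum version stored anywhere in its subtree.
struct Node {
    int64_t max_version = kNoVersion;  // max over all keys in this subtree
    int64_t own_version = kNoVersion;  // version of the key ending here, if any
    std::map<unsigned char, std::unique_ptr<Node>> children;
};

// Insert or update a point write. The search path is determined by the key
// bytes alone, so max_version is updated on the way down, with no second
// bottom-up pass.
void insert(Node& root, const std::string& key, int64_t version) {
    Node* n = &root;
    n->max_version = std::max(n->max_version, version);
    for (unsigned char c : key) {
        std::unique_ptr<Node>& child = n->children[c];
        if (!child) child = std::make_unique<Node>();
        n = child.get();
        n->max_version = std::max(n->max_version, version);
    }
    n->own_version = std::max(n->own_version, version);
}

// Max version over all keys that start with `prefix`. Visits at most
// prefix.size() + 1 nodes, independent of how many keys share the prefix.
int64_t max_version_with_prefix(const Node& root, const std::string& prefix) {
    const Node* n = &root;
    for (unsigned char c : prefix) {
        auto it = n->children.find(c);
        if (it == n->children.end()) return kNoVersion;
        n = it->second.get();
    }
    return n->max_version;
}
```

In this sketch a conflict check for "all keys with prefix p" would compare `max_version_with_prefix(root, p)` against the reading transaction's read version. A general half-open range query would have to combine the `max_version` of several subtrees along the two boundary paths, which is omitted here.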