7 Commits

Author SHA1 Message Date
b7e16b31ff Fill out empty subsections
All checks were successful (weaselab/conflict-set/pipeline/head: This commit looks good)
Tests passed: Clang 1096/1096; SIMD fallback 1096/1096; Release [gcc] 1096/1096; Release [gcc,aarch64] 824/824; Coverage 823/823
Compiler issues (Total/New/Outstanding/Fixed): Clang 0/0/0/0; GNU C Compiler (gcc) 0/0/0/0
2024-04-16 12:57:57 -07:00
a324d31518 Second pass at "Checking range reads" 2024-04-16 12:14:04 -07:00
fdb05e0e33 First draft of "Checking range reads" subsection
All checks were successful (weaselab/conflict-set/pipeline/head: This commit looks good)
Tests passed: Clang 1096/1096; SIMD fallback 1096/1096; Release [gcc] 1096/1096; Release [gcc,aarch64] 824/824; Coverage 823/823
Compiler issues (Total/New/Outstanding/Fixed): Clang 0/0/0/0; GNU C Compiler (gcc) 0/0/0/0
2024-04-15 17:32:33 -07:00
7c27d4a972 Always target macos 11.0 2024-04-09 14:08:58 -07:00
738de01cb4 Add getBytes to conflict_set.py
All checks were successful (weaselab/conflict-set/pipeline/head: This commit looks good)
Tests passed: Clang 1096/1096; SIMD fallback 1096/1096; Release [gcc] 1096/1096; Release [gcc,aarch64] 824/824; Coverage 823/823
Compiler issues (Total/New/Outstanding/Fixed): Clang 0/0/0/0; GNU C Compiler (gcc) 0/0/0/0
2024-04-08 17:33:51 -07:00
325cab6a95 Target earliest convenient macos version
All checks were successful (weaselab/conflict-set/pipeline/head: This commit looks good)
Tests passed: Clang 1096/1096; SIMD fallback 1096/1096; Release [gcc] 1096/1096; Release [gcc,aarch64] 824/824; Coverage 823/823
Compiler issues (Total/New/Outstanding/Fixed): Clang 0/0/0/0; GNU C Compiler (gcc) 0/0/0/0
2024-04-08 17:16:01 -07:00
0b2821941a Bump version
All checks were successful (weaselab/conflict-set/pipeline/head: This commit looks good)
Tests passed: Clang 1096/1096; SIMD fallback 1096/1096; Release [gcc] 1096/1096; Release [gcc,aarch64] 824/824; Coverage 823/823
Compiler issues (Total/New/Outstanding/Fixed): Clang 0/0/0/0; GNU C Compiler (gcc) 0/0/0/0
2024-04-08 15:52:06 -07:00
6 changed files with 126 additions and 10 deletions

View File

@@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.18)
project(
conflict-set
VERSION 0.0.3
VERSION 0.0.4
DESCRIPTION
"A data structure for optimistic concurrency control on ranges of bitwise-lexicographically-ordered keys."
HOMEPAGE_URL "https://git.weaselab.dev/weaselab/conflict-set"
@@ -357,6 +357,7 @@ else()
endif()
# macos
set(CMAKE_OSX_DEPLOYMENT_TARGET 11.0)
if(APPLE)
find_program(PANDOC_EXE pandoc)
if(PANDOC_EXE)

View File

@@ -4,4 +4,5 @@ _bzero
_free
_malloc
_memcpy
_memmove
_memmove
dyld_stub_binder

View File

@@ -58,6 +58,9 @@ _lib.ConflictSet_setOldestVersion.argtypes = (ctypes.c_void_p, ctypes.c_int64)
_lib.ConflictSet_destroy.argtypes = (ctypes.c_void_p,)
_lib.ConflictSet_getBytes.argtypes = (ctypes.c_void_p,)
_lib.ConflictSet_getBytes.restype = ctypes.c_int64
class Result(enum.Enum):
    COMMIT = 0
@@ -106,6 +109,9 @@ class ConflictSet:
    def setOldestVersion(self, version: int) -> None:
        _lib.ConflictSet_setOldestVersion(self.p, version)
    def getBytes(self) -> int:
        return _lib.ConflictSet_getBytes(self.p)
    def __enter__(self):
        return self

View File

@@ -155,3 +155,16 @@ keywords = {data structures, searching, trees}
year={1981},
publisher={ACM New York, NY, USA}
}
@article{Lemire_2018,
title={Roaring bitmaps: Implementation of an optimized software library},
volume={48},
ISSN={1097-024X},
url={http://dx.doi.org/10.1002/spe.2560},
DOI={10.1002/spe.2560},
number={4},
journal={Software: Practice and Experience},
publisher={Wiley},
author={Lemire, Daniel and Kaser, Owen and Kurz, Nathan and Deri, Luca and O'Hara, Chris and Saint-Jacques, François and Ssi-Yan-Kai, Gregory},
year={2018},
month=jan, pages={867--895} }

View File

@@ -6,6 +6,7 @@
\usepackage{tikz}
\usepackage{tikzscale}
\usepackage[edges]{forest}
\usepackage{amsmath}
\title{ARTful Conflict Checking for FoundationDB}
\author{Andrew Noyes \thanks{\href{mailto:andrew@weaselab.dev}{andrew@weaselab.dev}}}
@@ -20,7 +21,7 @@
\section*{Abstract}
FoundationDB \cite{DBLP:conf/sigmod/ZhouXSNMTABSLRD21} provides serializability using a specialized data structure called \textit{lastCommit} \footnote{See Algorithm 1 referenced in \cite{DBLP:conf/sigmod/ZhouXSNMTABSLRD21}} to implement optimistic concurrency control \cite{kung1981optimistic}.
FoundationDB \cite{DBLP:conf/sigmod/ZhouXSNMTABSLRD21} provides serializability using a specialized data structure called \textit{lastCommit} \footnote{See Algorithm 1 referenced in \cite{DBLP:conf/sigmod/ZhouXSNMTABSLRD21}.} to implement optimistic concurrency control \cite{kung1981optimistic}.
This data structure encodes the write sets for recent transactions as a map from key ranges (represented as bitwise-lexicographically-ordered half-open intervals) to most recent write versions.
FoundationDB implements \textit{lastCommit} as a version-augmented probabilistic skip list \cite{10.1145/78973.78977}.
In this paper, we propose an alternative implementation of \textit{lastCommit} as a version-augmented Adaptive Radix Tree (ART) \cite{DBLP:conf/icde/LeisK013}, and evaluate its performance.
@@ -68,9 +69,9 @@ See figure \ref{fig:tree} for an example tree after inserting
$\{ANY\} \rightarrow 2$,
$\{ARE\} \rightarrow 3$, and
$\{ART\} \rightarrow 4$.
Each node shows its partial prefix annotated with $(max,point,range)$.
Each node shows its partial prefix annotated with $max$ or $max,point,range$.
\subsection{Checking point reads}
\subsection{Checking point reads} \label{Checking point reads}
The algorithm for checking point reads follows directly from the definitions of the \emph{point} and \emph{range} fields.
Our input is a key $k$ and a read version $r$, and we must report whether or not the write version $v_{k}$ of $k$ is less than or equal to $r$.
@@ -84,10 +85,101 @@ As an optimization, during the search phase for a point read we can inspect the
If the max version among all keys starting with a prefix of $k$ is less than or equal to $r$, then $v_{k} \leq r$.
\subsection{Checking range reads}
\subsection{Adding point writes}
\subsection{Adding range writes}
Checking range reads is more involved. Logically, the idea is to partition the range read so that each subrange in the partition is either a single point or coincides with the set of keys beginning with a prefix (a \emph{prefix range}).
The max version of a single point is its write version, found as described in Section~\ref{Checking point reads}.
The max version of a prefix range is the $max$ of the node associated with the prefix if such a node exists, and the $range$ of the next node with a $range$ field otherwise.
If there is no next node with a $range$ field, then we ignore that subrange in our max version calculation.
The maximum over the max versions of all subranges in this partition is the max version of the whole range, which we compare to $r$.
Let's start with partitioning the range in the case where the beginning of the range is a prefix of the end of the range.
We'll be able to use this as a subroutine in the general case.
Suppose our range is $[a_{0}\dots a_{k}, a_{0}\dots a_{n})$ where $k < n$, and $a_{i} \in [0, 256)$.
The partition starts with the singleton set containing the first key in the range.
\[
\{a_{0}\dots a_{k}\}
\]
or equivalently
\[
[a_{0}\dots a_{k}, a_{0}\dots a_{k} 0)
\]
and continues with one prefix range for each digit $d$ from $0$ up to but not including $a_{k+1}$, namely the keys starting with $a_{0}\dots a_{k} d$.
Recall that the range $[a_{0}\dots a_{k} 0, a_{0}\dots a_{k} 1)$ is equivalent to the set of keys starting with $a_{0}\dots a_{k} 0$.
\begin{align*}
\dots \quad \cup \quad & [a_{0}\dots a_{k} 0, a_{0}\dots a_{k} 1) \quad \cup \\
& [a_{0}\dots a_{k} 1, a_{0}\dots a_{k} 2) \quad \cup \\
& \dots \\
& [a_{0}\dots a_{k} (a_{k+1}-1), a_{0}\dots a_{k+1})
\end{align*}
The remainder of the partition begins with the singleton set
\[
\dots \quad \cup \quad [a_{0}\dots a_{k + 1}, a_{0}\dots a_{k + 1} 0) \quad \cup \quad \dots
\]
and proceeds as above, one level at a time, until it reaches a range ending at $a_{0}\dots a_{n}$.
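As an illustration only, the prefix case above can be sketched in Python. The function name \texttt{partition\_prefix\_case} and the encoding of the pieces as \texttt{("point", key)} or \texttt{("prefix", p)} tuples are assumptions made for this sketch, not part of the implementation described here. Keys are modeled as Python \texttt{bytes}, which compare bitwise-lexicographically.

```python
def partition_prefix_case(begin: bytes, end: bytes) -> list:
    """Partition [begin, end) when begin is a strict prefix of end.

    Each piece is ("point", key) for the single key `key`, or
    ("prefix", p) for the set of all keys that start with p.
    The tuple encoding is an assumption of this sketch.
    """
    if not (end.startswith(begin) and len(begin) < len(end)):
        raise ValueError("begin must be a strict prefix of end")
    parts = []
    # Walk down the search path of `end`, one byte at a time.
    for j in range(len(begin), len(end)):
        stem = end[:j]
        # The key on the path itself is a singleton piece.
        parts.append(("point", stem))
        # Every digit below end[j] contributes a whole prefix range.
        for digit in range(end[j]):
            parts.append(("prefix", stem + bytes([digit])))
    return parts
```

For example, \texttt{partition\_prefix\_case(b"A", b"A\textbackslash x02\textbackslash x01")} yields the point \texttt{A}, the prefix ranges \texttt{A 00} and \texttt{A 01}, the point \texttt{A 02}, and the prefix range \texttt{A 02 00}.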
Let's now consider a range where begin is not a prefix of end.
\[
[a_{0}\dots a_{m}, b_{0}\dots b_{n})
\]
Let $i$ be the lowest index such that $a_{i} \neq b_{i}$.
For brevity we will elide the common prefix up until $i$ in the following discussion.
We'll start with partitioning this range coarsely:
\begin{align*}
& [a_{i}\dots a_{m}, a_{i} + 1) \quad \cup \\
& [a_{i} + 1, a_{i} + 2) \quad \cup \\
& \dots \\
& [b_{i} - 1, b_{i}) \quad \cup \\
& [b_{i}, b_{i}\dots b_{n})
\end{align*}
The last range has a begin that is a prefix of its end, so we partition it as before (if $n = i$ it is empty and contributes nothing).
The inner ranges are already prefix ranges.
This leaves only $[a_{i}\dots a_{m}, a_{i} + 1)$.
If $m = i$, then this range is adjacent to the first inner range above, and we're done.
Otherwise we'll partition this into
\begin{align*}
& [a_{i}\dots a_{m}, a_{i}\dots (a_{m} + 1)) \quad \cup \\
& [a_{i}\dots (a_{m} + 1), a_{i}\dots (a_{m} + 2)) \quad \cup \\
& \dots \\
& [a_{i}\dots 254, a_{i}\dots 255) \quad \cup \\
& [a_{i}\dots 255, a_{i}\dots (a_{m-1} + 1) )
\end{align*}
and repeat starting at \footnote{This doesn't explicitly describe how to handle the case where $a_{m-1} = 255$. In this case we would skip to the largest $j < m$ such that $a_{j} \neq 255$. We know $j \geq i$ since if $a_{i} = 255$ then the range is inverted.}
\[
\dots \quad \cup \quad [a_{i}\dots (a_{m-1} + 1), a_{i}\dots (a_{m-1} + 2))
\]
until we end at $a_{i} + 1$, adjacent to the first inner range.
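The complete logical partition can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the optimized implementation: the names \texttt{partition\_range} and \texttt{\_prefix\_case} and the \texttt{("point", key)} / \texttt{("prefix", p)} encoding are hypothetical, and keys are modeled as \texttt{bytes}. The case $a_{j} = 255$ from the footnote needs no special handling here because the corresponding digit loop is simply empty.

```python
def _prefix_case(begin: bytes, end: bytes, parts: list) -> None:
    """Append the partition of [begin, end), where begin is a strict prefix of end."""
    for j in range(len(begin), len(end)):
        stem = end[:j]
        parts.append(("point", stem))
        for digit in range(end[j]):
            parts.append(("prefix", stem + bytes([digit])))


def partition_range(begin: bytes, end: bytes) -> list:
    """Partition [begin, end) into single points and prefix ranges, in key order."""
    if not begin < end:
        raise ValueError("range must be non-empty")
    parts = []
    if end.startswith(begin):
        _prefix_case(begin, end, parts)
        return parts
    # i is the lowest index at which begin and end differ.
    i = next(j for j, (x, y) in enumerate(zip(begin, end)) if x != y)
    common = begin[:i]
    # Left edge [a_i..a_m, a_i + 1): the prefix range of begin itself, then,
    # climbing back up the search path, every larger sibling at each level.
    parts.append(("prefix", begin))
    for j in range(len(begin) - 1, i, -1):
        stem = begin[:j]
        for digit in range(begin[j] + 1, 256):
            parts.append(("prefix", stem + bytes([digit])))
    # Inner ranges [a_i + 1, a_i + 2), ..., [b_i - 1, b_i) are prefix ranges.
    for digit in range(begin[i] + 1, end[i]):
        parts.append(("prefix", common + bytes([digit])))
    # Right edge [b_i, b_i..b_n): empty when n == i, else the prefix case.
    right_begin = end[: i + 1]
    if right_begin != end:
        _prefix_case(right_begin, end, parts)
    return parts
```

For instance, \texttt{partition\_range(b"AB", b"AD")} returns just the two prefix ranges \texttt{AB} and \texttt{AC}.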
A few notes on implementation:
\begin{itemize}
\item{For clarity, the above algorithm decouples the logical partitioning from the physical structure of the tree. An optimized implementation would merge adjacent prefix ranges that don't correspond to nodes in the tree as it scans, so that it only calculates the version of such merged ranges once. Additionally, our implementation stores an index of which child pointers are valid as a bitset for Node48 and Node256 to speed up this scan using techniques inspired by \cite{Lemire_2018}.}
\item{In order to avoid many costly pointer indirections, we can store the max version not in each node itself but next to each node's parent pointer. Without this, the range read performance is not competitive with the skip list.}
\item{An optimized implementation would construct the partition of $[a_{i}\dots a_{m}, a_{i} + 1)$ in reverse order, as it descends along the search path to $a_{i}\dots a_{m}$.}
\item{An optimized implementation would search for the common prefix first, and return early if any prefix of the common prefix has a $max \leq r$.}
\end{itemize}
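To make the bitset idea from the first note concrete, here is a small Python sketch of scanning only the valid children of a node within a digit interval. It assumes the index is a 256-bit integer with bit $d$ set when child $d$ exists; the function name \texttt{set\_bits\_in\_range} is hypothetical. A native implementation would presumably do the same with count-trailing-zeros instructions over 64-bit words rather than Python integers.

```python
def set_bits_in_range(bitset: int, lo: int, hi: int):
    """Yield the indices of set bits in [lo, hi), in increasing order."""
    # Keep only bits lo..hi-1.
    window = bitset & ((1 << hi) - 1) & ~((1 << lo) - 1)
    while window:
        lowest = window & -window       # isolate the lowest set bit
        yield lowest.bit_length() - 1   # its index
        window ^= lowest                # clear it and continue


# Hypothetical node with children 'A', 'N', 'R', and 0xff.
children = (1 << 0x41) | (1 << 0x4E) | (1 << 0x52) | (1 << 0xFF)
after_a = list(set_bits_in_range(children, 0x42, 0x100))  # [0x4e, 0x52, 0xff]
```

The loop runs once per valid child rather than once per possible digit, which is the point of keeping the index.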
\subsection{Reclaiming old entries}
In order to bound memory usage, we track an \emph{oldest version}, reject transactions with read versions before \emph{oldest version}, and reclaim nodes made redundant by \emph{oldest version}.
We track the rate of insertions of new nodes and make sure that our incremental reclaiming of old nodes according to \emph{oldest version} outpaces inserts.
\subsection{Adding point writes}
A point write of $k$ at version $v$ simply sets $max \gets v$ \footnote{Recall that write versions are non-decreasing.} for every node along $k$'s search path, and sets $range$ for $k$'s node to the $range$ of the first node greater than $k$, or the \emph{oldest version} if none exists.
\subsection{Adding range writes}
A range write of $[b, e)$ at version $v$ performs a point write of $b$ at $v$, and then inserts a node at $e$ with $range$ set to $v$, and $point$ set such that the result of checking a read of $e$ is unaffected.
Nodes along the search path to $e$ that are a strict prefix of $e$ get $max$ set to $v$, and all nodes between $b$ and $e$ are removed.
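The two write rules can be cross-checked against a much simpler reference model. The sketch below is not the radix tree: it swaps in a flat sorted list of keys and has no $max$ field. It assumes, reading the point-write rule above, that a key's $range$ is the version of the keys strictly between its predecessor and itself, and that a point write to an existing key leaves that key's $range$ untouched. All class and method names are hypothetical.

```python
import bisect


class LastCommitModel:
    """Flat sorted-list model of point/range fields (illustrative only)."""

    def __init__(self, oldest_version: int = 0):
        self.keys = []       # sorted keys that have an entry
        self.entries = {}    # key -> [point, range]
        self.oldest_version = oldest_version

    def _gap_version(self, key: bytes) -> int:
        # range of the first entry greater than key, else the oldest version
        idx = bisect.bisect_right(self.keys, key)
        if idx < len(self.keys):
            return self.entries[self.keys[idx]][1]
        return self.oldest_version

    def version_of(self, key: bytes) -> int:
        if key in self.entries:
            return self.entries[key][0]
        return self._gap_version(key)

    def point_write(self, key: bytes, version: int) -> None:
        if key in self.entries:
            self.entries[key][0] = version
            return
        # A new entry inherits the version of the gap it splits.
        gap = self._gap_version(key)
        bisect.insort(self.keys, key)
        self.entries[key] = [version, gap]

    def range_write(self, begin: bytes, end: bytes, version: int) -> None:
        if not begin < end:
            raise ValueError("range must be non-empty")
        end_point = self.version_of(end)  # a read of `end` must be unaffected
        self.point_write(begin, version)
        # Remove every entry strictly between begin and end.
        lo = bisect.bisect_right(self.keys, begin)
        hi = bisect.bisect_left(self.keys, end)
        for key in self.keys[lo:hi]:
            del self.entries[key]
        del self.keys[lo:hi]
        if end not in self.entries:
            bisect.insort(self.keys, end)
        self.entries[end] = [end_point, version]
```

A range read of $[b, e)$ at read version $r$ in this model conflicts when any key in the range has \texttt{version\_of(key) > r}, which gives a slow but simple oracle to test a tree implementation against.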
\begin{figure}
\caption{}
\label{fig:tree}

View File

@@ -3,7 +3,10 @@ from conflict_set import *
def test_conflict_set():
    with ConflictSet() as cs:
        cs.addWrites(1, write(b""))
        assert cs.check(read(0, b"")) == [Result.CONFLICT]
        before = cs.getBytes()
        key = b"a key"
        cs.addWrites(1, write(key))
        assert cs.getBytes() - before > 0
        assert cs.check(read(0, key)) == [Result.CONFLICT]
        cs.setOldestVersion(1)
        assert cs.check(read(0, b"")) == [Result.TOO_OLD]
        assert cs.check(read(0, key)) == [Result.TOO_OLD]