First draft of "Checking range reads" subsection

2024-04-15 17:32:33 -07:00
parent 7c27d4a972
commit fdb05e0e33
2 changed files with 86 additions and 0 deletions
--- a/paper/bibliography.bib
+++ b/paper/bibliography.bib
@@ -155,3 +155,16 @@ keywords = {data structures, searching, trees}
  year={1981},
  publisher={ACM New York, NY, USA}
 }
+
+@article{Lemire_2018,
+   title={Roaring bitmaps: Implementation of an optimized software library},
+   volume={48},
+   ISSN={1097-024X},
+   url={http://dx.doi.org/10.1002/spe.2560},
+   DOI={10.1002/spe.2560},
+   number={4},
+   journal={Software: Practice and Experience},
+   publisher={Wiley},
+   author={Lemire, Daniel and Kaser, Owen and Kurz, Nathan and Deri, Luca and O’Hara, Chris and Saint‐Jacques, François and Ssi‐Yan‐Kai, Gregory},
+   year={2018},
+   month=jan, pages={867–895} }
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -6,6 +6,7 @@
 \usepackage{tikz}
 \usepackage{tikzscale}
 \usepackage[edges]{forest}
+\usepackage{amsmath}

 \title{ARTful Conflict Checking for FoundationDB}
 \author{Andrew Noyes \thanks{\href{mailto:andrew@weaselab.dev}{andrew@weaselab.dev}}}
@@ -84,6 +85,78 @@ As an optimization, during the search phase for a point read we can inspect the
 If the max version among all keys starting with a prefix of $k$ is less than or equal to $r$, then $v_{k} \leq r$.

 \subsection{Checking range reads}
+
+Checking range reads is more involved. Logically, the idea is to partition the range read so that each part is either a single point or coincides with the set of keys beginning with a prefix.
+The max version of the set of keys starting with a prefix is then $max$ of the node associated with the prefix if such a node exists, and $range$ of the next node with a $range$ field otherwise.
+
+Let's start with partitioning the range in the case where the beginning of the range is a prefix of the end of the range.
+We'll be able to use this as a subroutine in the general case.
+Suppose our range is $[a_{0}\dots a_{k}, a_{0}\dots a_{n})$ where $k < n$.
+The partition starts with the singleton set containing the first key in the range.
+\[
+    \{a_{0}\dots a_{k}\}
+\]
+or equivalently
+\[
+    [a_{0}\dots a_{k}, a_{0}\dots a_{k} 0)
+\]
+and continues with a sequence of prefix ranges, one for each digit less than $a_{k+1}$.
+Recall that the range $[a_{0}\dots a_{k} 0, a_{0}\dots a_{k} 1)$ is equivalent to the set of keys starting with $a_{0}\dots a_{k} 0$.
+
+\begin{align*}
+\dots \quad \cup \quad & [a_{0}\dots a_{k} 0, a_{0}\dots a_{k} 1) \quad \cup \\
+& [a_{0}\dots a_{k} 1, a_{0}\dots a_{k} 2) \quad \cup \\
+& \dots \\
+& [a_{0}\dots a_{k} (a_{k+1}-1), a_{0}\dots a_{k+1})
+\end{align*}
+
+The remainder of the partition begins with the singleton set
+\[
+    \dots \quad \cup \quad [a_{0}\dots a_{k + 1}, a_{0}\dots a_{k + 1} 0)
+\]
+and proceeds as above until a range ending at $a_{0}\dots a_{n}$.
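+
+As a concrete illustration (the byte values here are chosen arbitrarily, purely as an example), take $k = 0$, $n = 2$ and the range $[5, 5\,2\,1)$.
+Following the steps above, the partition would be
+\begin{align*}
+    & [5, 5\,0) \quad \cup && \text{the single key } 5 \\
+    & [5\,0, 5\,1) \quad \cup \quad [5\,1, 5\,2) \quad \cup && \text{keys starting with } 5\,0 \text{ or } 5\,1 \\
+    & [5\,2, 5\,2\,0) \quad \cup && \text{the single key } 5\,2 \\
+    & [5\,2\,0, 5\,2\,1) && \text{keys starting with } 5\,2\,0
+\end{align*}
+which gives two points and three prefix ranges.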
+
+Let's now consider a range where begin is not a prefix of end.
+
+\[
+    [a_{0}\dots a_{m}, b_{0}\dots b_{n})
+\]
+
+Let $i$ be the lowest index such that $a_{i} \neq b_{i}$.
+For brevity we will elide the common prefix up until $i$ in the following discussion.
+We'll start with partitioning this range coarsely:
+
+\begin{align*}
+     & [a_{i}\dots a_{m}, a_{i} + 1) \quad \cup \\
+     & [a_{i} + 1, a_{i} + 2) \quad \cup \\
+     & \dots \\
+     & [b_{i} - 1, b_{i}) \quad \cup \\
+     & [b_{i}, b_{i}\dots b_{n})
+\end{align*}
+
+The last range has a begin that's a prefix of end, and so we'll partition that as before.
+The inner ranges are already prefix ranges.
+This leaves only $[a_{i}\dots a_{m}, a_{i} + 1)$.
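+
+For example (with byte values chosen only for illustration), suppose that after eliding the common prefix the range is $[1\,7, 4\,2)$, so that $i = 0$, $m = 1$ and $n = 1$.
+The coarse partition is then
+\[
+    [1\,7, 2) \quad \cup \quad [2, 3) \quad \cup \quad [3, 4) \quad \cup \quad [4, 4\,2)
+\]
+Here $[2, 3)$ and $[3, 4)$ are the prefix ranges for $2$ and $3$, and $[4, 4\,2)$ splits as before into $[4, 4\,0) \cup [4\,0, 4\,1) \cup [4\,1, 4\,2)$.
+The only piece left to handle is $[1\,7, 2)$, which is this example's instance of $[a_{i}\dots a_{m}, a_{i} + 1)$.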
+
+If $m = i$, then this range is adjacent to the first inner range above, and we're done.
+Otherwise we'll partition this into
+
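+\begin{align*}
+     & [a_{i}\dots a_{m}, a_{i}\dots (a_{m} + 1)) \quad \cup \\
+     & \dots \\
+     & [a_{i}\dots a_{m-1} 255, a_{i}\dots (a_{m-1} + 1))
+\end{align*}
+
+and repeat with $m \gets m - 1$, this time starting from $[a_{i}\dots (a_{m} + 1), a_{i}\dots (a_{m} + 2))$ because the keys starting with $a_{i}\dots a_{m}$ are already covered, until we are adjacent to the first inner range.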
+
+A few notes on implementation:
+\begin{itemize}
+    \item{For clarity, the above algorithm decouples the logical partitioning from the physical structure of the tree. An optimized implementation would merge adjacent prefix ranges that don't correspond to nodes in the tree as it scans, so that it only calculates the version of merged ranges once. Additionally, our implementation stores an index of which child pointers are valid as a bitset for Node48 and Node256, using techniques inspired by \cite{Lemire_2018}.}
+    \item{In order to avoid many costly pointer indirections, we can store the max version not in each node itself but next to each node's parent pointer. Without this, the range read performance is not competitive with the skip list.}
+    \item{An optimized implementation would construct the partition of $[a_{i}\dots a_{m}, a_{i} + 1)$ in reverse order, as it descends along the search path to $a_{i}\dots a_{m}$.}
+\end{itemize}
+
+
 \subsection{Adding point writes}
 \subsection{Adding range writes}
 \subsection{Reclaiming old entries}