Draft Testing section

2024-04-19 14:26:47 -07:00
parent c96d682483
commit 37c75f747b
2 changed files with 113 additions and 3 deletions


@@ -111,7 +111,6 @@ or equivalently
[a_{0}\dots a_{k}, a_{0}\dots a_{k} 0)
\]
and continues with a sequence of prefix ranges ending in each digit up until $a_{k+1}$.
Recall that the range $[a_{0}\dots a_{k} 0, a_{0}\dots a_{k} 1)$ is equivalent to the set of keys starting with $a_{0}\dots a_{k} 0$.
\begin{align*}
\dots \quad \cup \quad & [a_{0}\dots a_{k} 0, a_{0}\dots a_{k} 1) \quad \cup \\
@@ -120,7 +119,9 @@ Recall that the range $[a_{0}\dots a_{k} 0, a_{0}\dots a_{k} 1)$ is equivalent t
& [a_{0}\dots a_{k} (a_{k+1}-1), a_{0}\dots a_{k+1})
\end{align*}
Recall that the range $[a_{0}\dots a_{k} 0, a_{0}\dots a_{k} 1)$ is the set of keys starting with $a_{0}\dots a_{k} 0$.
The remainder of the partition begins with the singleton set
\[
\dots \quad \cup \quad [a_{0}\dots a_{k + 1}, a_{0}\dots a_{k + 1} 0) \quad \cup\ \quad \dots
\]
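For a concrete example, take decimal digits for readability and $a_{0} a_{1} = 2\,4$, so that $k = 0$: the singleton and the digit ranges above combine as
\[
[2, 2\,0) \cup [2\,0, 2\,1) \cup [2\,1, 2\,2) \cup [2\,2, 2\,3) \cup [2\,3, 2\,4) = [2, 2\,4),
\]
and the remainder of the partition then continues from the singleton $[2\,4, 2\,4\,0)$ at the next depth.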
@@ -169,7 +170,7 @@ A few notes on implementation:
\begin{itemize}
\item{For clarity, the above algorithm decouples the logical partitioning from the physical structure of the tree. An optimized implementation would merge adjacent prefix ranges that don't correspond to nodes in the tree as it scans, so that it only calculates the version of such merged ranges once. Additionally, our implementation stores an index of which child pointers are valid as a bitset for Node48 and Node256 to speed up this scan using techniques inspired by \cite{Lemire_2018}; a sketch of this scan follows the list.}
\item{In order to avoid many costly pointer indirections, we can store the max version not in each node itself but next to each node's parent pointer. Without this, the range read performance is not competitive with the skip list.}
\item{An optimized implementation would construct the partition of $[a_{i}\dots a_{m}, a_{i} + 1)$ in reverse order, as it descends along the search path to $a_{i}\dots a_{m}$}
\item{An optimized implementation would visit the partition of $[a_{i}\dots a_{m}, a_{i} + 1)$ in reverse order, as it descends along the search path to $a_{i}\dots a_{m}$.}
\item{An optimized implementation would search for the common prefix first, and return early if any prefix of the common prefix has a $max \leq r$.}
\end{itemize}
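The following is a minimal sketch of the bitset-based child scan mentioned in the first item above; the \texttt{Node256} layout and the names used here are illustrative rather than the library's actual definitions.
\begin{verbatim}
#include <bit>
#include <cstdint>

struct Node;  // child node type, details omitted

// Hypothetical Node256 layout: 256 child slots plus a bitset recording
// which slots currently hold a valid child pointer.
struct Node256 {
  uint64_t valid[4];   // bit i of word w set => child (w*64 + i) exists
  Node *children[256];
};

// Visit the valid children in ascending key order without probing all
// 256 slots, in the spirit of the set-bit iteration techniques of
// Lemire (2018).
template <class F>
void forEachChild(const Node256 *n, F &&visit) {
  for (int w = 0; w < 4; ++w) {
    uint64_t bits = n->valid[w];
    while (bits != 0) {
      int i = std::countr_zero(bits);  // index of lowest set bit
      visit(w * 64 + i, n->children[w * 64 + i]);
      bits &= bits - 1;                // clear lowest set bit
    }
  }
}
\end{verbatim}
Iterating over set bits this way costs time proportional to the number of children present rather than to all 256 slots.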
@@ -180,7 +181,7 @@ We track the rate of insertions of new nodes and make sure that our incremental
\subsection{Adding point writes}
A point write of $k$ at version $v$ simply sets $max \gets v$ \footnote{Recall that write versions are non-decreasing.} for every node along $k$'s search path, and sets $range$ for $k$'s node to the $range$ of the first node greater than $k$, or the \emph{oldest version} if none exists.
A point write of $k$ at version $v$ simply sets $max \gets v$ \footnote{Recall that write versions are non-decreasing.} for every node along $k$'s search path, and sets $range$ for $k$'s node to the $range$ of the first node greater than $k$, or \emph{oldest version} if none exists.
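A minimal sketch of this point write, using hypothetical stand-ins for the tree and node types (the real API differs in detail):
\begin{verbatim}
#include <cstdint>
#include <span>
#include <vector>

using Version = int64_t;

// Hypothetical node shape for illustration only.
struct Node {
  Version maxVersion = 0;    // max write version in this node's subtree
  Version rangeVersion = 0;  // version covering the gap after this key
};

// Hypothetical tree interface.
struct Tree {
  Version oldestVersion = 0;
  // Nodes on the search path for `key`, root first, ending with the
  // node for `key` itself (created if absent). Implementation omitted.
  std::vector<Node *> searchPath(std::span<const uint8_t> key);
  // First node whose key is strictly greater than `key`, or nullptr.
  Node *firstNodeGreaterThan(std::span<const uint8_t> key);
};

// Point write of key k at version v, following the description above.
void pointWrite(Tree &t, std::span<const uint8_t> k, Version v) {
  std::vector<Node *> path = t.searchPath(k);
  for (Node *n : path)          // every node along k's search path
    n->maxVersion = v;          // write versions are non-decreasing
  Node *keyNode = path.back();  // the node for k itself
  Node *next = t.firstNodeGreaterThan(k);
  keyNode->rangeVersion = next ? next->rangeVersion : t.oldestVersion;
}
\end{verbatim}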
\subsection{Adding range writes}
@@ -191,6 +192,36 @@ Nodes along the search path to $e$ that are a strict prefix of $e$ get $max$ set
\section{Testing}
The correctness of \emph{lastCommit} is critically important, as a bug would likely result in data corruption, and so we use a variety of testing techniques.
The main technique is to let libfuzzer \cite{libfuzzer} generate sequences of arbitrary operations, and apply each sequence to both the optimized radix tree and a naive implementation based on an unaugmented ordered map that serves as the specification of the intended behavior.
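A minimal sketch of this differential harness is below; \texttt{ConflictSet}, \texttt{NaiveConflictSet}, and the operation decoding are hypothetical stand-ins, and only \texttt{LLVMFuzzerTestOneInput} is libfuzzer's actual entry point.
\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; the real types and decoding differ.
struct Operation { /* point write, range write, or lastCommit query */ };
struct Result { bool operator==(const Result &) const = default; };
struct ConflictSet {          // radix-tree implementation under test
  Result apply(const Operation &);
  void checkInvariants();
};
struct NaiveConflictSet {     // unaugmented ordered map: the specification
  Result apply(const Operation &);
};
struct FuzzInput {
  FuzzInput(const uint8_t *data, size_t size);
  bool empty() const;
  Operation nextOperation();  // decode one arbitrary operation
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  ConflictSet optimized;
  NaiveConflictSet reference;
  FuzzInput in(data, size);
  while (!in.empty()) {
    Operation op = in.nextOperation();
    Result got = optimized.apply(op);
    Result expected = reference.apply(op);
    assert(got == expected);      // externally visible behavior matches
    optimized.checkInvariants();  // internal invariants hold between ops
  }
  return 0;
}
\end{verbatim}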
After libfuzzer generates inputs with broad code coverage, we use libfuzzer's ``corpus minimization'' feature to pare the test inputs down, without losing coverage (as measured by libfuzzer), to a fixed set of tests short enough to run interactively during development.
In order to keep these test inputs short, we constrain the size of keys, at the cost of some generality.
We believe there isn't anything in the implementation particularly sensitive to the exact length of keys \footnote{\texttt{longestCommonPrefix} is a possible exception, but its length sensitivity is well encapsulated}.
Libfuzzer's minimized corpus achieves 98\% line coverage on its own.
We regenerate the corpus on an ad hoc basis by running libfuzzer for a few CPU-hours, during which it tests millions of unique inputs.
In addition to asserting correct externally-visible behavior, in each of these tests we assert that internal invariants hold between operations.
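As one example of the kind of invariant such an augmented tree maintains (our illustration, not a list from the implementation), the max version recorded for a node must be at least the max version of each of its children; otherwise a range read could wrongly skip a subtree containing a newer write:
\begin{verbatim}
#include <cassert>
#include <cstdint>
#include <vector>

using Version = int64_t;

// Hypothetical node shape for illustration only.
struct Node {
  Version maxVersion = 0;        // max write version in this subtree
  std::vector<Node *> children;
};

// Recursively check that every node's max dominates its children's max.
void checkMaxInvariant(const Node *n) {
  for (const Node *child : n->children) {
    assert(child->maxVersion <= n->maxVersion);
    checkMaxInvariant(child);
  }
}
\end{verbatim}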
We also use address sanitizer \cite{10.5555/2342821.2342849} to detect memory errors, undefined behavior sanitizer \cite{ubsan} to detect invocations of undefined behavior, and thread sanitizer \cite{10.1145/1791194.1791203} (while exercising concurrent access as allowed by the documented contract) to detect data-race-related undefined behavior.
Each of these sanitizers is implemented using compiler instrumentation, which means they do not test the final binary artifact that will run in production.
Therefore we also run the test inputs linked directly against the final release artifact, both standalone and under valgrind \cite{10.5555/1247360.1247362}.
When testing the final artifacts, we do not assert internal invariants as we lack convenient access to the internals.
As a defense against possible bugs in compilers' sanitizer and optimizer passes \cite{10.1145/3591257}, we also test with sanitizers enabled and optimizations disabled, and test with both clang and gcc.
We audited the 2\% of lines that were not covered by libfuzzer \footnote{In order to see the uncovered lines for yourself, exclude all tests containing the word ``script'' with \texttt{ctest -E script}. Look in \texttt{Jenkinsfile} for an example of how to measure coverage.} and found the following:
\begin{itemize}
\item Three occurrences which can be reached from an input that libfuzzer could theoretically generate. In each case the uncovered code is straightforward, and is exercised from an entry point by a manually written test.
\item One occurrence which requires a large number of operations, and cannot be reached from an input satisfying the size constraints we impose on libfuzzer. This code is also straightforward, and is exercised from an entry point by a manually written test. The purpose of this code is to keep memory usage in check, and so it's expected that it cannot be reached without a large number of operations.
\item One occurrence which is not reachable from any entry point, but is exercised when asserting internal invariants. This line is now suppressed with an explanatory comment.
\end{itemize}
We assert 100\% line coverage in continuous integration, which is achieved with a few caveats.
2\% of the code is covered only by a few manually written tests.
We suppress lines manually checked to be unreachable from any entry point.
There is also a significant amount of test-only code that is suppressed from coverage measurements.
There is a small difference in behavior between debug and release builds: the code that scans for old entries runs more frequently when assertions are enabled.
This code is not straightforward, so exercising it only from a manually written test seems insufficient.
\section{Conclusion}
\printbibliography