26 Commits

Author SHA1 Message Date
c11b4714b5 Check more preconditions in Debug mode
All checks were successful
Tests / Clang total: 2843, passed: 2843
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / 64 bit versions total: 2843, passed: 2843
Tests / Debug total: 2841, passed: 2841
Tests / SIMD fallback total: 2843, passed: 2843
Tests / Release [gcc] total: 2843, passed: 2843
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2119, passed: 2119
Tests / Coverage total: 2136, passed: 2136
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.92% (1834/1854) * Branch Coverage: 66.85% (1496/2238) * Complexity Density: 0.00 * Lines of Code: 1854 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-26 12:43:07 -07:00
bc13094406 Don't violate precondition in test_conflict_set.py 2024-08-26 11:13:16 -07:00
c9d742b696 Make explicit the precondition that versions must be <= latest version
Some checks failed
Tests / Clang total: 2843, passed: 2843
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / 64 bit versions total: 2843, passed: 2843
Tests / Debug total: 2841, failed: 1, passed: 2840
Tests / SIMD fallback total: 2843, passed: 2843
Tests / Release [gcc] total: 2843, passed: 2843
Tests / Release [gcc,aarch64] total: 2119, passed: 2119
Tests / Coverage total: 2136, failed: 1, passed: 2135
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-08-24 13:43:39 -07:00
795ae7cb01 Internal.h doesn't need to include <vector> 2024-08-24 10:38:07 -07:00
849e2d3e5c Remove unused code 2024-08-22 22:31:36 -07:00
1560037680 Remove unused struct 2024-08-22 17:55:54 -07:00
764c31bbc8 Run skip list gc at 200% entry insertion rate 2024-08-22 17:26:08 -07:00
ee3361952a Use better rng in skip list
low bits of lcg have low periods
2024-08-22 17:00:15 -07:00
8a04e57353 Add to corpus
All checks were successful
Tests / Clang total: 2843, passed: 2843
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / 64 bit versions total: 2843, passed: 2843
Tests / Debug total: 2841, passed: 2841
Tests / SIMD fallback total: 2843, passed: 2843
Tests / Release [gcc] total: 2843, passed: 2843
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2119, passed: 2119
Tests / Coverage total: 2136, passed: 2136
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.92% (1825/1845) * Branch Coverage: 66.85% (1480/2214) * Complexity Density: 0.00 * Lines of Code: 1845 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-22 16:52:33 -07:00
7f86fdee66 Test 64 bit versions
All checks were successful
Tests / Clang total: 2710, passed: 2710
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / 64 bit versions total: 2710, passed: 2710
Tests / Debug total: 2708, passed: 2708
Tests / SIMD fallback total: 2710, passed: 2710
Tests / Release [gcc] total: 2710, passed: 2710
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2020, passed: 2020
Tests / Coverage total: 2036, passed: 2036
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.92% (1825/1845) * Branch Coverage: 66.85% (1480/2214) * Complexity Density: 0.00 * Lines of Code: 1845 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
Keep 32 bit versions the default though
2024-08-21 14:00:00 -07:00
442755d0a6 Update implementation notes
This doesn't really capture the complexity, but at least it's more
accurate
2024-08-21 14:00:00 -07:00
e15b3bb137 Bump version
All checks were successful
Tests / Clang total: 2710, passed: 2710
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2708, passed: 2708
Tests / SIMD fallback total: 2710, passed: 2710
Tests / Release [gcc] total: 2710, passed: 2710
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2020, passed: 2020
Tests / Coverage total: 2036, passed: 2036
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.97% (1821/1840) * Branch Coverage: 67.00% (1478/2206) * Complexity Density: 0.00 * Lines of Code: 1840 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-20 16:58:39 -07:00
311794c37e Update GCOVR annotations now that jenkins agent has avx512
All checks were successful
Tests / Clang total: 2710, passed: 2710
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2708, passed: 2708
Tests / SIMD fallback total: 2710, passed: 2710
Tests / Release [gcc] total: 2710, passed: 2710
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2020, passed: 2020
Tests / Coverage total: 2036, passed: 2036
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.97% (1821/1840) * Branch Coverage: 67.00% (1478/2206) * Complexity Density: 0.00 * Lines of Code: 1840 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-20 16:34:27 -07:00
dfa178ba19 Add to corpus (from fuzzing on macos)
Some checks failed
Tests / Clang total: 2710, passed: 2710
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2708, passed: 2708
Tests / SIMD fallback total: 2710, passed: 2710
Tests / Release [gcc] total: 2710, passed: 2710
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2020, passed: 2020
Tests / Coverage total: 2036, passed: 2036
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.85% (1807/1828) * Branch Coverage: 67.11% (1475/2198) * Complexity Density: 0.00 * Lines of Code: 1828 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-08-20 13:32:04 -07:00
a16d18edfe Update README.md
All checks were successful
Tests / Clang total: 2626, passed: 2626
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2624, passed: 2624
Tests / SIMD fallback total: 2626, passed: 2626
Tests / Release [gcc] total: 2626, passed: 2626
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1957, passed: 1957
Tests / Coverage total: 1973, passed: 1973
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1815/1828) * Branch Coverage: 67.47% (1483/2198) * Complexity Density: 0.00 * Lines of Code: 1828 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-20 12:03:27 -07:00
2b60287448 Partition valgrind into tests of size at most 100
All checks were successful
Tests / Clang total: 2626, passed: 2626
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2624, passed: 2624
Tests / SIMD fallback total: 2626, passed: 2626
Tests / Release [gcc] total: 2626, passed: 2626
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1957, passed: 1957
Tests / Coverage total: 1973, passed: 1973
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1815/1828) * Branch Coverage: 67.47% (1483/2198) * Complexity Density: 0.00 * Lines of Code: 1828 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-20 11:35:20 -07:00
0a9ac59676 Commit to non-simd Node3 implementations
Some checks failed
Tests / Clang total: 2620, passed: 2620
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2618, passed: 2618
Tests / SIMD fallback total: 2620, passed: 2620
Tests / Release [gcc] total: 2620, passed: 2620
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1957, passed: 1957
Tests / Coverage total: 1967, failed: 1, passed: 1966
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-08-20 10:36:04 -07:00
e3a77ed773 Remove unnecessary casts 2024-08-20 10:30:27 -07:00
cdf9a8a7b0 Save 8 bytes in Node3 2024-08-20 10:30:07 -07:00
305dfdd52f Change whitespace in node structs for consistency 2024-08-20 09:57:44 -07:00
7261c91492 Remove Node48::nextFree, and improve padding to save 8 bytes 2024-08-20 09:51:29 -07:00
f11720f5ae Add to corpus
Some checks reported errors
Tests / Clang total: 2620, passed: 2620
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2618, passed: 2618
Tests / SIMD fallback total: 2620, passed: 2620
Tests / Release [gcc] total: 2620, passed: 2620
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1957, passed: 1957
Tests / Coverage total: 1967, failed: 1, passed: 1966
weaselab/conflict-set/pipeline/head Something is wrong with the build of this commit
2024-08-19 21:52:47 -07:00
e2b7298af5 Instrument setOldestVersion for callgrind too 2024-08-19 21:32:04 -07:00
8e1e344f4b Fix clang-18 warning about std::basic_string<uint8_t>
All checks were successful
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2498, passed: 2498
Tests / SIMD fallback total: 2500, passed: 2500
Tests / Release [gcc] total: 2500, passed: 2500
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1867, passed: 1867
Tests / Coverage total: 1877, passed: 1877
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1824/1837) * Branch Coverage: 67.44% (1485/2202) * Complexity Density: 0.00 * Lines of Code: 1837 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-17 16:40:32 -07:00
3634b6a59b Simplify slightly in checkMaxBetweenExclusive
Some checks failed
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-08-17 14:34:13 -07:00
a3cc14c807 Fix double counting of remove node in showMemory mode 2024-08-17 13:43:47 -07:00
94 changed files with 257 additions and 200 deletions

View File

@@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.18)
project(
conflict-set
VERSION 0.0.12
VERSION 0.0.13
DESCRIPTION
"A data structure for optimistic concurrency control on ranges of bitwise-lexicographically-ordered keys."
HOMEPAGE_URL "https://git.weaselab.dev/weaselab/conflict-set"
@@ -276,9 +276,15 @@ if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR AND BUILD_TESTING)
find_program(VALGRIND_EXE valgrind)
if(VALGRIND_EXE AND NOT CMAKE_CROSSCOMPILING)
add_test(NAME conflict_set_blackbox_valgrind
COMMAND ${VALGRIND_EXE} --error-exitcode=99 --
$<TARGET_FILE:driver> ${CORPUS_TESTS})
list(LENGTH CORPUS_TESTS len)
math(EXPR last "${len} - 1")
set(partition_size 100)
foreach(i RANGE 0 ${last} ${partition_size})
list(SUBLIST CORPUS_TESTS ${i} ${partition_size} partition)
add_test(NAME conflict_set_blackbox_valgrind_${i}
COMMAND ${VALGRIND_EXE} --error-exitcode=99 --
$<TARGET_FILE:driver> ${partition})
endforeach()
endif()
# api smoke tests

View File

@@ -87,22 +87,42 @@ constexpr int64_t kMaxCorrectVersionWindow =
std::numeric_limits<int32_t>::max();
static_assert(kNominalVersionWindow <= kMaxCorrectVersionWindow);
#ifndef USE_64_BIT
#define USE_64_BIT 0
#endif
struct InternalVersionT {
constexpr InternalVersionT() = default;
constexpr explicit InternalVersionT(int64_t value) : value(value) {}
constexpr int64_t toInt64() const { return value; } // GCOVR_EXCL_LINE
constexpr auto operator<=>(const InternalVersionT &rhs) const {
#if USE_64_BIT
return value <=> rhs.value;
#else
// Maintains ordering after overflow, as long as the full-precision versions
// are within `kMaxCorrectVersionWindow` of eachother.
return int32_t(value - rhs.value) <=> 0;
#endif
}
constexpr bool operator==(const InternalVersionT &) const = default;
#if USE_64_BIT
static const InternalVersionT zero;
#else
static thread_local InternalVersionT zero;
#endif
private:
#if USE_64_BIT
int64_t value;
#else
uint32_t value;
#endif
};
#if USE_64_BIT
const InternalVersionT InternalVersionT::zero{0};
#else
thread_local InternalVersionT InternalVersionT::zero;
#endif
struct Entry {
InternalVersionT pointVersion;
@@ -197,7 +217,6 @@ struct Node {
/* end section that's copied to the next node */
uint8_t *partialKey();
Type getType() const { return type; }
int32_t getCapacity() const { return partialKeyCapacity; }
@@ -221,84 +240,83 @@ constexpr int kNodeCopySize =
struct Node0 : Node {
constexpr static auto kType = Type_Node0;
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node0 &other);
void copyChildrenAndKeyFrom(const struct Node3 &other);
size_t size() const { return sizeof(Node0) + getCapacity(); }
};
struct Node3 : Node {
constexpr static auto kMaxNodes = 3;
constexpr static auto kType = Type_Node3;
// Sorted
uint8_t index[kMaxNodes];
Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
// Sorted
uint8_t index[kMaxNodes];
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node0 &other);
void copyChildrenAndKeyFrom(const Node3 &other);
void copyChildrenAndKeyFrom(const struct Node16 &other);
size_t size() const { return sizeof(Node3) + getCapacity(); }
};
struct Node16 : Node {
constexpr static auto kType = Type_Node16;
constexpr static auto kMaxNodes = 16;
// Sorted
uint8_t index[kMaxNodes];
Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
// Sorted
uint8_t index[kMaxNodes];
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node3 &other);
void copyChildrenAndKeyFrom(const Node16 &other);
void copyChildrenAndKeyFrom(const struct Node48 &other);
size_t size() const { return sizeof(Node16) + getCapacity(); }
};
struct Node48 : Node {
constexpr static auto kType = Type_Node48;
constexpr static auto kMaxNodes = 48;
BitSet bitSet;
int8_t nextFree;
int8_t index[256];
Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
uint8_t reverseIndex[kMaxNodes];
constexpr static int kMaxOfMaxPageSize = 16;
constexpr static int kMaxOfMaxShift =
std::countr_zero(uint32_t(kMaxOfMaxPageSize));
constexpr static int kMaxOfMaxTotalPages = kMaxNodes / kMaxOfMaxPageSize;
BitSet bitSet;
Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
InternalVersionT maxOfMax[kMaxOfMaxTotalPages];
uint8_t reverseIndex[kMaxNodes];
int8_t index[256];
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node16 &other);
void copyChildrenAndKeyFrom(const Node48 &other);
void copyChildrenAndKeyFrom(const struct Node256 &other);
size_t size() const { return sizeof(Node48) + getCapacity(); }
};
struct Node256 : Node {
constexpr static auto kType = Type_Node256;
BitSet bitSet;
Node *children[256];
InternalVersionT childMaxVersion[256];
constexpr static auto kMaxNodes = 256;
constexpr static int kMaxOfMaxPageSize = 16;
constexpr static int kMaxOfMaxShift =
std::countr_zero(uint32_t(kMaxOfMaxPageSize));
constexpr static int kMaxOfMaxTotalPages = 256 / kMaxOfMaxPageSize;
constexpr static int kMaxOfMaxTotalPages = kMaxNodes / kMaxOfMaxPageSize;
BitSet bitSet;
Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
InternalVersionT maxOfMax[kMaxOfMaxTotalPages];
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node48 &other);
void copyChildrenAndKeyFrom(const Node256 &other);
size_t size() const { return sizeof(Node256) + getCapacity(); }
};
@@ -323,7 +341,7 @@ inline void Node3::copyChildrenAndKeyFrom(const Node0 &other) {
inline void Node3::copyChildrenAndKeyFrom(const Node3 &other) {
memcpy((char *)this + kNodeCopyBegin, (char *)&other + kNodeCopyBegin,
kNodeCopySize);
memcpy(index, other.index, sizeof(*this) - sizeof(Node));
memcpy(children, other.children, sizeof(*this) - sizeof(Node));
memcpy(partialKey(), &other + 1, partialKeyLen);
for (int i = 0; i < numChildren; ++i) {
assert(children[i]->parent == &other);
@@ -404,7 +422,6 @@ inline void Node48::copyChildrenAndKeyFrom(const Node16 &other) {
}
memcpy(partialKey(), &other + 1, partialKeyLen);
bitSet.init();
nextFree = Node16::kMaxNodes;
int i = 0;
for (auto x : other.index) {
bitSet.set(x);
@@ -424,7 +441,6 @@ inline void Node48::copyChildrenAndKeyFrom(const Node48 &other) {
memcpy((char *)this + kNodeCopyBegin, (char *)&other + kNodeCopyBegin,
kNodeCopySize);
bitSet = other.bitSet;
nextFree = other.nextFree;
memcpy(index, other.index, sizeof(index));
memset(children, 0, sizeof(children));
const auto z = InternalVersionT::zero;
@@ -451,7 +467,6 @@ inline void Node48::copyChildrenAndKeyFrom(const Node256 &other) {
for (auto &v : childMaxVersion) {
v = z;
}
nextFree = other.numChildren;
bitSet = other.bitSet;
int i = 0;
bitSet.forEachSet([&](int c) {
@@ -523,8 +538,13 @@ std::string getSearchPath(Node *n);
// Each node with an entry present gets a budget of kBytesPerKey. Node0 always
// has an entry present.
// Induction hypothesis is that each node's surplus is >= kMinNodeSurplus
#if USE_64_BIT
constexpr int kBytesPerKey = 144;
constexpr int kMinNodeSurplus = 104;
#else
constexpr int kBytesPerKey = 112;
constexpr int kMinNodeSurplus = 80;
#endif
// Cound the entry itself as a child
constexpr int kMinChildrenNode0 = 1;
constexpr int kMinChildrenNode3 = 2;
@@ -729,9 +749,13 @@ struct WriteContext {
int64_t write_bytes;
} accum;
#if USE_64_BIT
static constexpr InternalVersionT zero{0};
#else
// Cache a copy of InternalVersionT::zero, so we don't need to do the TLS
// lookup as often.
InternalVersionT zero;
#endif
WriteContext() { memset(&accum, 0, sizeof(accum)); }
@@ -773,22 +797,17 @@ private:
BoundedFreeListAllocator<Node256> node256;
};
template <class NodeT> int getNodeIndex(NodeT *self, uint8_t index) {
static_assert(std::is_same_v<NodeT, Node3> || std::is_same_v<NodeT, Node16>);
// cachegrind says the plain loop is fewer instructions and more mis-predicted
// branches. Microbenchmark says plain loop is faster. It's written in this
// weird "generic" way though in case someday we can use the simd
// implementation easily if we want.
if constexpr (std::is_same_v<NodeT, Node3>) {
Node3 *n = (Node3 *)self;
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] == index) {
return i;
}
int getNodeIndex(Node3 *self, uint8_t index) {
Node3 *n = (Node3 *)self;
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] == index) {
return i;
}
return -1;
}
return -1;
}
int getNodeIndex(Node16 *self, uint8_t index) {
#ifdef HAS_AVX
// Based on https://www.the-paper-trail.org/post/art-paper-notes/
@@ -801,7 +820,7 @@ template <class NodeT> int getNodeIndex(NodeT *self, uint8_t index) {
// keys aren't valid, we'll mask the results to only consider the valid ones
// below.
__m128i indices;
memcpy(&indices, self->index, NodeT::kMaxNodes);
memcpy(&indices, self->index, Node16::kMaxNodes);
__m128i results = _mm_cmpeq_epi8(key_vec, indices);
// Build a mask to select only the first node->num_children values from the
@@ -824,12 +843,11 @@ template <class NodeT> int getNodeIndex(NodeT *self, uint8_t index) {
// https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon
uint8x16_t indices;
memcpy(&indices, self->index, NodeT::kMaxNodes);
memcpy(&indices, self->index, Node16::kMaxNodes);
// 0xff for each match
uint16x8_t results =
vreinterpretq_u16_u8(vceqq_u8(vdupq_n_u8(index), indices));
static_assert(NodeT::kMaxNodes <= 16);
assume(self->numChildren <= NodeT::kMaxNodes);
assume(self->numChildren <= Node16::kMaxNodes);
uint64_t mask = self->numChildren == 16
? uint64_t(-1)
: (uint64_t(1) << (self->numChildren * 4)) - 1;
@@ -1082,22 +1100,18 @@ ChildAndMaxVersion getChildAndMaxVersion(Node *self, uint8_t index) {
}
}
template <class NodeT> Node *getChildGeqSimd(NodeT *self, int child) {
static_assert(std::is_same_v<NodeT, Node3> || std::is_same_v<NodeT, Node16>);
Node *getChildGeq(Node0 *, int) { return nullptr; }
// cachegrind says the plain loop is fewer instructions and more mis-predicted
// branches. Microbenchmark says plain loop is faster. It's written in this
// weird "generic" way though so that someday we can use the simd
// implementation easily if we want.
if constexpr (std::is_same_v<NodeT, Node3>) {
Node3 *n = (Node3 *)self;
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] >= child) {
return n->children[i];
}
Node *getChildGeq(Node3 *n, int child) {
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] >= child) {
return n->children[i];
}
return nullptr;
}
return nullptr;
}
Node *getChildGeq(Node16 *self, int child) {
if (child > 255) {
return nullptr;
}
@@ -1105,7 +1119,7 @@ template <class NodeT> Node *getChildGeqSimd(NodeT *self, int child) {
#ifdef HAS_AVX
__m128i key_vec = _mm_set1_epi8(child);
__m128i indices;
memcpy(&indices, self->index, NodeT::kMaxNodes);
memcpy(&indices, self->index, Node16::kMaxNodes);
__m128i results = _mm_cmpeq_epi8(key_vec, _mm_min_epu8(key_vec, indices));
int mask = (1 << self->numChildren) - 1;
uint32_t bitfield = _mm_movemask_epi8(results) & mask;
@@ -1115,8 +1129,7 @@ template <class NodeT> Node *getChildGeqSimd(NodeT *self, int child) {
memcpy(&indices, self->index, sizeof(self->index));
// 0xff for each leq
auto results = vcleq_u8(vdupq_n_u8(child), indices);
static_assert(NodeT::kMaxNodes <= 16);
assume(self->numChildren <= NodeT::kMaxNodes);
assume(self->numChildren <= Node16::kMaxNodes);
uint64_t mask = self->numChildren == 16
? uint64_t(-1)
: (uint64_t(1) << (self->numChildren * 4)) - 1;
@@ -1141,13 +1154,6 @@ template <class NodeT> Node *getChildGeqSimd(NodeT *self, int child) {
#endif
}
Node *getChildGeq(Node0 *, int) { return nullptr; }
Node *getChildGeq(Node3 *self, int child) {
return getChildGeqSimd(self, child);
}
Node *getChildGeq(Node16 *self, int child) {
return getChildGeqSimd(self, child);
}
Node *getChildGeq(Node48 *self, int child) {
int c = self->bitSet.firstSetGeq(child);
if (c < 0) {
@@ -1360,7 +1366,7 @@ Node *&getOrCreateChild(Node *&self, std::span<const uint8_t> &key,
auto *self3 = static_cast<Node3 *>(self);
int i = self->numChildren - 1;
for (; i >= 0; --i) {
if (int(self3->index[i]) < int(index)) {
if (self3->index[i] < index) {
break;
}
self3->index[i + 1] = self3->index[i];
@@ -1390,7 +1396,7 @@ Node *&getOrCreateChild(Node *&self, std::span<const uint8_t> &key,
auto *self16 = static_cast<Node16 *>(self);
int i = self->numChildren - 1;
for (; i >= 0; --i) {
if (int(self16->index[i]) < int(index)) {
if (self16->index[i] < index) {
break;
}
self16->index[i + 1] = self16->index[i];
@@ -1419,9 +1425,7 @@ Node *&getOrCreateChild(Node *&self, std::span<const uint8_t> &key,
insert48:
auto *self48 = static_cast<Node48 *>(self);
self48->bitSet.set(index);
++self->numChildren;
assert(self48->nextFree < 48);
int nextFree = self48->nextFree++;
auto nextFree = self48->numChildren++;
self48->index[index] = nextFree;
self48->reverseIndex[nextFree] = index;
auto &result = self48->children[nextFree];
@@ -1569,8 +1573,6 @@ void maybeDecreaseCapacity(Node *&self, WriteContext *tls,
}
#if defined(HAS_AVX) && !defined(__SANITIZE_THREAD__)
// This gets covered in local development
// GCOVR_EXCL_START
__attribute__((target("avx512f"))) void rezero16(InternalVersionT *vs,
InternalVersionT zero) {
uint32_t z;
@@ -1580,7 +1582,6 @@ __attribute__((target("avx512f"))) void rezero16(InternalVersionT *vs,
_mm512_sub_epi32(_mm512_loadu_epi32(vs), zvec), _mm512_setzero_epi32());
_mm512_mask_storeu_epi32(vs, m, zvec);
}
// GCOVR_EXCL_STOP
__attribute__((target("default")))
#endif
@@ -1591,6 +1592,9 @@ void rezero16(InternalVersionT *vs, InternalVersionT zero) {
}
}
#if USE_64_BIT
void rezero(Node *, InternalVersionT) {}
#else
void rezero(Node *n, InternalVersionT z) {
#if DEBUG_VERBOSE && !defined(NDEBUG)
fprintf(stderr, "rezero to %" PRId64 ": %s\n", z.toInt64(),
@@ -1633,6 +1637,7 @@ void rezero(Node *n, InternalVersionT z) {
__builtin_unreachable(); // GCOVR_EXCL_LINE
}
}
#endif
void mergeWithChild(Node *&self, WriteContext *tls, ConflictSet::Impl *impl,
Node *&dontInvalidate, Node3 *self3) {
@@ -1812,7 +1817,7 @@ Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
parent48->bitSet.reset(parentsIndex);
int8_t toRemoveChildrenIndex =
std::exchange(parent48->index[parentsIndex], -1);
int8_t lastChildrenIndex = --parent48->nextFree;
auto lastChildrenIndex = --parent48->numChildren;
assert(toRemoveChildrenIndex >= 0);
assert(lastChildrenIndex >= 0);
if (toRemoveChildrenIndex != lastChildrenIndex) {
@@ -1831,8 +1836,6 @@ Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
}
parent48->childMaxVersion[lastChildrenIndex] = tls->zero;
--parent->numChildren;
if (needsDownsize(parent48)) {
downsize(parent48, tls, impl, result);
}
@@ -2017,11 +2020,18 @@ downLeftSpine:
}
#ifdef HAS_AVX
uint32_t compare16_32bit(const InternalVersionT *vs, InternalVersionT rv) {
uint32_t compare16(const InternalVersionT *vs, InternalVersionT rv) {
#if USE_64_BIT
uint32_t compared = 0;
__m128i w[4];
for (int i = 0; i < 16; ++i) {
compared |= (vs[i] > rv) << i;
}
return compared;
#else
uint32_t compared = 0;
__m128i w[4]; // GCOVR_EXCL_LINE
memcpy(w, vs, sizeof(w));
uint32_t r;
uint32_t r; // GCOVR_EXCL_LINE
memcpy(&r, &rv, sizeof(r));
const auto rvVec = _mm_set1_epi32(r);
const auto zero = _mm_setzero_si128();
@@ -2031,20 +2041,28 @@ uint32_t compare16_32bit(const InternalVersionT *vs, InternalVersionT rv) {
<< (i * 4);
}
return compared;
#endif
}
// This gets covered in local development
// GCOVR_EXCL_START
__attribute__((target("avx512f"))) uint32_t
compare16_32bit_avx512(const InternalVersionT *vs, InternalVersionT rv) {
compare16_avx512(const InternalVersionT *vs, InternalVersionT rv) {
#if USE_64_BIT
int64_t r;
memcpy(&r, &rv, sizeof(r));
uint32_t low =
_mm512_cmpgt_epi64_mask(_mm512_loadu_epi64(vs), _mm512_set1_epi64(r));
uint32_t high =
_mm512_cmpgt_epi64_mask(_mm512_loadu_epi64(vs + 8), _mm512_set1_epi64(r));
return low | (high << 8);
#else
uint32_t r;
memcpy(&r, &rv, sizeof(r));
return _mm512_cmpgt_epi32_mask(
_mm512_sub_epi32(_mm512_loadu_epi32(vs), _mm512_set1_epi32(r)),
_mm512_setzero_epi32());
#endif
}
#endif
// GCOVR_EXCL_STOP
// Returns true if v[i] <= readVersion for all i such that begin <= is[i] < end
// Preconditions: begin <= end, end - begin < 256
@@ -2099,9 +2117,9 @@ bool scan16(const InternalVersionT *vs, const uint8_t *is, int begin, int end,
uint32_t compared = 0;
if constexpr (kAVX512) {
compared = compare16_32bit_avx512(vs, readVersion); // GCOVR_EXCL_LINE
compared = compare16_avx512(vs, readVersion);
} else {
compared = compare16_32bit(vs, readVersion); // GCOVR_EXCL_LINE
compared = compare16(vs, readVersion);
}
return !(compared & mask);
@@ -2160,9 +2178,9 @@ bool scan16(const InternalVersionT *vs, int begin, int end,
#elif defined(HAS_AVX)
uint32_t conflict;
if constexpr (kAVX512) {
conflict = compare16_32bit_avx512(vs, readVersion); // GCOVR_EXCL_LINE
conflict = compare16_avx512(vs, readVersion);
} else {
conflict = compare16_32bit(vs, readVersion); // GCOVR_EXCL_LINE
conflict = compare16(vs, readVersion);
}
conflict &= (1 << end) - 1;
conflict >>= begin;
@@ -2297,12 +2315,9 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
uint32_t compared = 0;
if constexpr (kAVX512) {
compared = // GCOVR_EXCL_LINE
compare16_32bit_avx512(self->childMaxVersion, // GCOVR_EXCL_LINE
readVersion); // GCOVR_EXCL_LINE
compared = compare16_avx512(self->childMaxVersion, readVersion);
} else {
compared = compare16_32bit(self->childMaxVersion,
readVersion); // GCOVR_EXCL_LINE
compared = compare16(self->childMaxVersion, readVersion);
}
return !(compared & mask) && firstRangeOk;
@@ -2340,10 +2355,8 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
int c = self->bitSet.firstSetGeq(begin + 1);
if (c >= 0 && c < end) {
auto *child = self->children[self->index[c]];
if (child->entryPresent) {
if (!(child->entry.rangeVersion <= readVersion)) {
return false;
};
if (child->entryPresent && child->entry.rangeVersion > readVersion) {
return false;
}
begin = c;
} else {
@@ -2376,10 +2389,8 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
int c = self->bitSet.firstSetGeq(begin + 1);
if (c >= 0 && c < end) {
auto *child = self->children[c];
if (child->entryPresent) {
if (!(child->entry.rangeVersion <= readVersion)) {
return false;
};
if (child->entryPresent && child->entry.rangeVersion > readVersion) {
return false;
}
begin = c;
} else {
@@ -2422,9 +2433,7 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
}
}
// Check inner pages
const int innerPageBegin = (begin >> Node256::kMaxOfMaxShift) + 1;
const int innerPageEnd = (end - 1) >> Node256::kMaxOfMaxShift;
return scan16<kAVX512>(self->maxOfMax, innerPageBegin, innerPageEnd,
return scan16<kAVX512>(self->maxOfMax, firstPage + 1, lastPage,
readVersion);
}
default: // GCOVR_EXCL_LINE
@@ -2433,14 +2442,11 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
}
#if defined(HAS_AVX) && !defined(__SANITIZE_THREAD__)
// This gets covered in local development
// GCOVR_EXCL_START
__attribute__((target("avx512f"))) bool
checkMaxBetweenExclusive(Node *n, int begin, int end,
InternalVersionT readVersion, ReadContext *tls) {
return checkMaxBetweenExclusiveImpl<true>(n, begin, end, readVersion, tls);
}
// GCOVR_EXCL_STOP
__attribute__((target("default")))
#endif
bool checkMaxBetweenExclusive(Node *n, int begin, int end,
@@ -2906,9 +2912,8 @@ checkMaxBetweenExclusiveImpl<true>(Node *n, int begin, int end,
// of the result will have `maxVersion` set to `writeVersion` as a
// postcondition. Nodes along the search path may be invalidated. Callers must
// ensure that the max version of the self argument is updated.
[[nodiscard]]
Node **insert(Node **self, std::span<const uint8_t> key,
InternalVersionT writeVersion, WriteContext *tls) {
[[nodiscard]] Node **insert(Node **self, std::span<const uint8_t> key,
InternalVersionT writeVersion, WriteContext *tls) {
for (; key.size() != 0; ++tls->accum.insert_iterations) {
self = &getOrCreateChild(*self, key, writeVersion, tls);
@@ -2927,7 +2932,6 @@ void eraseTree(Node *root, WriteContext *tls) {
tls->accum.entries_erased += n->entryPresent;
++tls->accum.nodes_released;
removeNode(n);
removeKey(n);
switch (n->getType()) {
@@ -3144,6 +3148,8 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
tls.impl = this;
int64_t check_byte_accum = 0;
for (int i = 0; i < count; ++i) {
assert(reads[i].readVersion >= 0);
assert(reads[i].readVersion <= newestVersionFullPrecision);
const auto &r = reads[i];
check_byte_accum += r.begin.len + r.end.len;
auto begin = std::span<const uint8_t>(r.begin.p, r.begin.len);
@@ -3180,10 +3186,12 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
}
void addWrites(const WriteRange *writes, int count, int64_t writeVersion) {
#if !USE_64_BIT
// There could be other conflict sets in the same thread. We need
// InternalVersionT::zero to be correct for this conflict set for the
// lifetime of the current call frame.
InternalVersionT::zero = tls.zero = oldestVersion;
#endif
assert(writeVersion >= newestVersionFullPrecision);
assert(tls.accum.entries_erased == 0);
@@ -3198,7 +3206,10 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
newestVersionFullPrecision = writeVersion;
newest_version.set(newestVersionFullPrecision);
setOldestVersion(newestVersionFullPrecision - kNominalVersionWindow);
if (newestVersionFullPrecision - kNominalVersionWindow >
oldestVersionFullPrecision) {
setOldestVersion(newestVersionFullPrecision - kNominalVersionWindow);
}
while (oldestExtantVersion <
newestVersionFullPrecision - kMaxCorrectVersionWindow) {
gcScanStep(1000);
@@ -3206,7 +3217,10 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
} else {
newestVersionFullPrecision = writeVersion;
newest_version.set(newestVersionFullPrecision);
setOldestVersion(newestVersionFullPrecision - kNominalVersionWindow);
if (newestVersionFullPrecision - kNominalVersionWindow >
oldestVersionFullPrecision) {
setOldestVersion(newestVersionFullPrecision - kNominalVersionWindow);
}
}
for (int i = 0; i < count; ++i) {
@@ -3291,14 +3305,22 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
return fuel;
}
void setOldestVersion(int64_t o) {
if (o <= oldestVersionFullPrecision) {
void setOldestVersion(int64_t newOldestVersion) {
assert(newOldestVersion >= 0);
assert(newOldestVersion <= newestVersionFullPrecision);
// If addWrites advances oldestVersion to keep within valid window, a
// subsequent setOldestVersion can be legitimately called with a version
// older than `oldestVersionFullPrecision`. < instead of <= so that we can
// do garbage collection work without advancing the oldest version.
if (newOldestVersion < oldestVersionFullPrecision) {
return;
}
InternalVersionT oldestVersion{o};
this->oldestVersionFullPrecision = o;
InternalVersionT oldestVersion{newOldestVersion};
this->oldestVersionFullPrecision = newOldestVersion;
this->oldestVersion = oldestVersion;
#if !USE_64_BIT
InternalVersionT::zero = tls.zero = oldestVersion;
#endif
#ifdef NDEBUG
// This is here for performance reasons, since we want to amortize the cost
// of storing the search path as a string. In tests, we want to exercise the
@@ -3347,12 +3369,15 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
root->entry.pointVersion = this->oldestVersion;
root->entry.rangeVersion = this->oldestVersion;
#if !USE_64_BIT
InternalVersionT::zero = tls.zero = this->oldestVersion;
#endif
// Intentionally not resetting totalBytes
}
explicit Impl(int64_t oldestVersion) {
assert(oldestVersion >= 0);
init(oldestVersion);
initMetrics();
}
@@ -3747,13 +3772,13 @@ std::string getSearchPath(Node *n) {
fprintf(file,
" k_%p [label=\"m=%" PRId64 " p=%" PRId64 " r=%" PRId64
"\n%s\", pos=\"%d,%d!\"];\n",
(void *)n, maxVersion(n).toInt64(),
(void *)n, n->parent == nullptr ? -1 : maxVersion(n).toInt64(),
n->entry.pointVersion.toInt64(),
n->entry.rangeVersion.toInt64(),
getPartialKeyPrintable(n).c_str(), x, y);
} else {
fprintf(file, " k_%p [label=\"m=%" PRId64 "\n%s\", pos=\"%d,%d!\"];\n",
(void *)n, maxVersion(n).toInt64(),
(void *)n, n->parent == nullptr ? -1 : maxVersion(n).toInt64(),
getPartialKeyPrintable(n).c_str(), x, y);
}
x += kSeparation;
@@ -3794,6 +3819,9 @@ Node *firstGeq(Node *n, std::string_view key) {
n, std::span<const uint8_t>((const uint8_t *)key.data(), key.size()));
}
#if USE_64_BIT
void checkVersionsGeqOldestExtant(Node *, InternalVersionT) {}
#else
void checkVersionsGeqOldestExtant(Node *n,
InternalVersionT oldestExtantVersion) {
if (n->entryPresent) {
@@ -3837,6 +3865,7 @@ void checkVersionsGeqOldestExtant(Node *n,
abort();
}
}
#endif
[[maybe_unused]] InternalVersionT
checkMaxVersion(Node *root, Node *node, InternalVersionT oldestVersion,

View File

@@ -20,7 +20,6 @@ using namespace weaselab;
#include <thread>
#include <unordered_set>
#include <utility>
#include <vector>
#include <callgrind.h>
@@ -748,7 +747,10 @@ struct TestDriver {
fprintf(stderr, "%p Set oldest version: %" PRId64 "\n", this,
oldestVersion);
#endif
CALLGRIND_START_INSTRUMENTATION;
cs.setOldestVersion(oldestVersion);
CALLGRIND_STOP_INSTRUMENTATION;
if constexpr (kEnableAssertions) {
refImpl.setOldestVersion(oldestVersion);
}

11
Jenkinsfile vendored
View File

@@ -48,6 +48,17 @@ pipeline {
recordIssues(tools: [clang()])
}
}
stage('64 bit versions') {
agent {
dockerfile {
args '-v /home/jenkins/ccache:/ccache'
reuseNode true
}
}
steps {
CleanBuildAndTest("-DCMAKE_CXX_FLAGS=-DUSE_64_BIT=1")
}
}
stage('Debug') {
agent {
dockerfile {

View File

@@ -24,15 +24,15 @@ Hardware for all benchmarks is an AMD Ryzen 9 7900 with (2x32GB) 5600MT/s CL28-3
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 11.04 | 90,614,308.12 | 0.8% | 180.38 | 55.13 | 3.272 | 41.51 | 0.4% | 0.01 | `point reads`
| 14.96 | 66,843,629.12 | 0.4% | 274.41 | 74.73 | 3.672 | 55.05 | 0.3% | 0.01 | `prefix reads`
| 37.06 | 26,982,847.61 | 0.2% | 791.04 | 185.28 | 4.269 | 142.67 | 0.2% | 0.01 | `range reads`
| 17.89 | 55,887,365.73 | 0.6% | 335.54 | 89.79 | 3.737 | 43.84 | 0.4% | 0.01 | `point writes`
| 31.85 | 31,394,336.65 | 0.3% | 615.32 | 159.63 | 3.855 | 87.69 | 0.2% | 0.01 | `prefix writes`
| 36.17 | 27,647,221.45 | 0.6% | 705.11 | 182.80 | 3.857 | 100.62 | 0.1% | 0.01 | `range writes`
| 79.01 | 12,656,457.78 | 0.7% | 1,498.35 | 402.46 | 3.723 | 270.50 | 0.1% | 0.01 | `monotonic increasing point writes`
| 303,667.50 | 3,293.08 | 1.1% | 3,931,273.00 | 1,612,702.50 | 2.438 | 806,223.33 | 0.0% | 0.01 | `worst case for radix tree`
| 83.70 | 11,947,443.83 | 0.7% | 1,738.03 | 429.06 | 4.051 | 270.01 | 0.0% | 0.01 | `create and destroy`
| 11.18 | 89,455,125.34 | 0.6% | 185.37 | 57.08 | 3.248 | 41.51 | 0.4% | 0.01 | `point reads`
| 14.53 | 68,800,688.89 | 0.4% | 282.41 | 74.80 | 3.776 | 55.06 | 0.3% | 0.01 | `prefix reads`
| 36.54 | 27,367,576.87 | 0.2% | 798.06 | 188.90 | 4.225 | 141.69 | 0.2% | 0.01 | `range reads`
| 16.69 | 59,912,106.02 | 0.6% | 314.57 | 86.29 | 3.645 | 39.84 | 0.4% | 0.01 | `point writes`
| 30.09 | 33,235,744.07 | 0.5% | 591.33 | 155.92 | 3.793 | 82.69 | 0.2% | 0.01 | `prefix writes`
| 35.77 | 27,956,388.03 | 1.4% | 682.25 | 187.63 | 3.636 | 96.12 | 0.1% | 0.01 | `range writes`
| 74.04 | 13,505,408.41 | 2.7% | 1,448.95 | 392.10 | 3.695 | 260.53 | 0.1% | 0.01 | `monotonic increasing point writes`
| 330,984.50 | 3,021.29 | 1.9% | 3,994,153.50 | 1,667,309.00 | 2.396 | 806,019.50 | 0.0% | 0.01 | `worst case for radix tree`
| 92.46 | 10,814,961.65 | 0.5% | 1,800.00 | 463.41 | 3.884 | 297.00 | 0.0% | 0.01 | `create and destroy`
# "Real data" test
@@ -47,7 +47,7 @@ Check: 4.47891 seconds, 364.05 MB/s, Add: 4.55599 seconds, 123.058 MB/s, Gc rati
## radix tree
```
Check: 0.958985 seconds, 1700.28 MB/s, Add: 1.35083 seconds, 415.044 MB/s, Gc ratio: 44.4768%, Peak idle memory: 2.33588e+06
Check: 0.953012 seconds, 1710.94 MB/s, Add: 1.30025 seconds, 431.188 MB/s, Gc ratio: 43.9816%, Peak idle memory: 2.28375e+06
```
## hash table

View File

@@ -24,8 +24,8 @@ std::atomic<int64_t> transactions;
constexpr int kBaseSearchDepth = 32;
constexpr int kWindowSize = 10000000;
std::basic_string<uint8_t> numToKey(int64_t num) {
std::basic_string<uint8_t> result;
std::string numToKey(int64_t num) {
std::string result;
result.resize(kBaseSearchDepth + sizeof(int64_t));
memset(result.data(), 0, kBaseSearchDepth);
int64_t be = __builtin_bswap64(num);
@@ -45,13 +45,13 @@ void workload(weaselab::ConflictSet *cs) {
auto pointK = numToKey(pointRv);
weaselab::ConflictSet::ReadRange reads[] = {
{
{pointK.data(), int(pointK.size())},
{(const uint8_t *)pointK.data(), int(pointK.size())},
{nullptr, 0},
pointRv,
},
{
{beginK.data(), int(beginK.size())},
{endK.data(), int(endK.size())},
{(const uint8_t *)beginK.data(), int(beginK.size())},
{(const uint8_t *)endK.data(), int(endK.size())},
version - 2,
},
};
@@ -70,7 +70,7 @@ void workload(weaselab::ConflictSet *cs) {
{
weaselab::ConflictSet::WriteRange w;
auto k = numToKey(version);
w.begin.p = k.data();
w.begin.p = (const uint8_t *)k.data();
w.end.len = 0;
if (version % (kWindowSize / 2) == 0) {
for (int l = 0; l <= k.size(); ++l) {
@@ -83,9 +83,9 @@ void workload(weaselab::ConflictSet *cs) {
int64_t beginN = version - kWindowSize + rand() % kWindowSize;
auto b = numToKey(beginN);
auto e = numToKey(beginN + 1000);
w.begin.p = b.data();
w.begin.p = (const uint8_t *)b.data();
w.begin.len = b.size();
w.end.p = e.data();
w.end.p = (const uint8_t *)e.data();
w.end.len = e.size();
cs->addWrites(&w, 1, version);
}

View File

@@ -25,6 +25,7 @@
#include <algorithm>
#include <span>
#include <vector>
std::span<const uint8_t> keyAfter(Arena &arena, std::span<const uint8_t> key) {
auto result =
@@ -115,15 +116,6 @@ bool operator==(const KeyInfo &lhs, const KeyInfo &rhs) {
return !(lhs < rhs || rhs < lhs);
}
void swapSort(std::vector<KeyInfo> &points, int a, int b) {
if (points[b] < points[a]) {
KeyInfo temp;
temp = points[a];
points[a] = points[b];
points[b] = temp;
}
}
struct SortTask {
int begin;
int size;
@@ -183,13 +175,6 @@ void sortPoints(std::vector<KeyInfo> &points) {
}
}
static thread_local uint32_t g_seed = 0;
static inline int skfastrand() {
g_seed = g_seed * 1664525L + 1013904223L;
return g_seed;
}
static int compare(const StringRef &a, const StringRef &b) {
int c = memcmp(a.data(), b.data(), std::min(a.size(), b.size()));
if (c < 0)
@@ -215,20 +200,24 @@ struct ReadConflictRange {
}
};
static constexpr int MaxLevels = 26;
struct RandomLevel {
explicit RandomLevel(uint32_t seed) : seed(seed) {}
int next() {
int result = __builtin_clz(seed | (uint32_t(-1) >> (MaxLevels - 1)));
seed = seed * 1664525L + 1013904223L;
return result;
}
private:
uint32_t seed;
};
class SkipList {
private:
static constexpr int MaxLevels = 26;
int randomLevel() const {
uint32_t i = uint32_t(skfastrand()) >> (32 - (MaxLevels - 1));
int level = 0;
while (i & 1) {
i >>= 1;
level++;
}
assert(level < MaxLevels);
return level;
}
RandomLevel randomLevel{0};
// Represent a node in the SkipList. The node has multiple (i.e., level)
// pointers to other nodes, and keeps a record of the max versions for each
@@ -426,18 +415,23 @@ public:
}
void swap(SkipList &other) { std::swap(header, other.header); }
void addConflictRanges(const Finger *fingers, int rangeCount,
Version version) {
// Returns the change in the number of entries
int64_t addConflictRanges(const Finger *fingers, int rangeCount,
Version version) {
int64_t result = rangeCount;
for (int r = rangeCount - 1; r >= 0; r--) {
const Finger &startF = fingers[r * 2];
const Finger &endF = fingers[r * 2 + 1];
if (endF.found() == nullptr)
if (endF.found() == nullptr) {
++result;
insert(endF, endF.finger[0]->getMaxVersion(0));
}
remove(startF, endF);
result -= remove(startF, endF);
insert(startF, version);
}
return result;
}
void detectConflicts(ReadConflictRange *ranges, int count,
@@ -567,9 +561,10 @@ public:
}
private:
void remove(const Finger &start, const Finger &end) {
// Returns the number of entries removed
int64_t remove(const Finger &start, const Finger &end) {
if (start.finger[0] == end.finger[0])
return;
return 0;
Node *x = start.finger[0]->getNext(0);
@@ -578,17 +573,20 @@ private:
if (start.finger[i] != end.finger[i])
start.finger[i]->setNext(i, end.finger[i]->getNext(i));
int64_t result = 0;
while (true) {
Node *next = x->getNext(0);
x->destroy();
++result;
if (x == end.finger[0])
break;
x = next;
}
return result;
}
void insert(const Finger &f, Version version) {
int level = randomLevel();
int level = randomLevel.next();
// std::cout << std::string((const char*)value,length) << " level: " <<
// level << std::endl;
Node *x = Node::create(f.value, level);
@@ -704,8 +702,6 @@ private:
};
};
struct SkipListConflictSet {};
struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
Impl(int64_t oldestVersion)
: oldestVersion(oldestVersion), newestVersion(oldestVersion),
@@ -775,17 +771,20 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
StringRef values[stripeSize];
int64_t writeVersions[stripeSize / 2];
int ss = stringCount - (stripes - 1) * stripeSize;
int64_t entryDelta = 0;
for (int s = stripes - 1; s >= 0; s--) {
for (int i = 0; i * 2 < ss; ++i) {
const auto &w = combinedWriteConflictRanges[s * stripeSize / 2 + i];
values[i * 2] = w.first;
values[i * 2 + 1] = w.second;
keyUpdates += 3;
}
skipList.find(values, fingers, temp, ss);
skipList.addConflictRanges(fingers, ss / 2, writeVersion);
entryDelta += skipList.addConflictRanges(fingers, ss / 2, writeVersion);
ss = stripeSize;
}
// Run gc at least 200% the rate we're inserting entries
keyUpdates += std::max<int64_t>(entryDelta, 0) * 2;
}
void setOldestVersion(int64_t oldestVersion) {
@@ -795,7 +794,7 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
int temp;
std::span<const uint8_t> key = removalKey;
skipList.find(&key, &finger, &temp, 1);
skipList.removeBefore(oldestVersion, finger, std::exchange(keyUpdates, 10));
skipList.removeBefore(oldestVersion, finger, std::exchange(keyUpdates, 0));
removalArena = Arena();
removalKey = copyToArena(
removalArena, {finger.getValue().data(), finger.getValue().size()});
@@ -804,7 +803,7 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
int64_t totalBytes = 0;
private:
int64_t keyUpdates = 10;
int64_t keyUpdates = 0;
Arena removalArena;
std::span<const uint8_t> removalKey;
int64_t oldestVersion;

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@@ -54,8 +54,9 @@ struct __attribute__((__visibility__("default"))) ConflictSet {
/** `end` having length 0 denotes that this range is the single key {begin}.
* Otherwise this denotes the range [begin, end), and begin must be < end */
Key end;
/** `readVersion` older than the the oldestVersion or the version of the
* latest call to `addWrites` minus two billion will result in `TooOld` */
/** `readVersion` older than the oldestVersion or the version of the
* latest call to `addWrites` minus two billion will result in `TooOld`.
* Must be <= the version of the latest call to `addWrites` */
int64_t readVersion;
};
@@ -72,11 +73,13 @@ struct __attribute__((__visibility__("default"))) ConflictSet {
/** Reads intersecting writes where readVersion < `writeVersion` will result
* in `Conflict` (or `TooOld`, eventually). `writeVersion` must be greater
* than or equal to all previous write versions. */
* than or equal to all previous write versions. Call `addWrites` with `count`
* zero to only advance the version. */
void addWrites(const WriteRange *writes, int count, int64_t writeVersion);
/** Reads where readVersion < oldestVersion will result in `TooOld`. Must be
* greater than or equal to all previous oldest versions. */
/** Reads where readVersion < `oldestVersion` will result in `TooOld`. Must be
* greater than or equal to all previous oldest versions. Must be <= the
* version of the latest call to `addWrites` */
void setOldestVersion(int64_t oldestVersion);
/** Reads where readVersion < oldestVersion will result in `TooOld`. There are
@@ -170,8 +173,9 @@ typedef struct {
/** `end` having length 0 denotes that this range is the single key {begin}.
* Otherwise this denotes the range [begin, end), and begin must be < end */
ConflictSet_Key end;
/** `readVersion` older than the the oldestVersion or the version of the
* latest call to `addWrites` minus two billion will result in `TooOld` */
/** `readVersion` older than the oldestVersion or the version of the
* latest call to `addWrites` minus two billion will result in `TooOld`.
* Must be <= the version of the latest call to `addWrites` */
int64_t readVersion;
} ConflictSet_ReadRange;
@@ -188,15 +192,17 @@ void ConflictSet_check(const ConflictSet *cs,
const ConflictSet_ReadRange *reads,
ConflictSet_Result *results, int count);
/** Reads intersecting writes where readVersion < `writeVersion` will result in
* `Conflict` (or `TooOld`, eventually). `writeVersion` must be greater than or
* equal to all previous write versions. */
/** Reads intersecting writes where readVersion < `writeVersion` will result
* in `Conflict` (or `TooOld`, eventually). `writeVersion` must be greater
* than or equal to all previous write versions. Call `addWrites` with `count`
* zero to only advance the version. */
void ConflictSet_addWrites(ConflictSet *cs,
const ConflictSet_WriteRange *writes, int count,
int64_t writeVersion);
/** Reads where readVersion < oldestVersion will result in `TooOld`. Must be
* greater than or equal to all previous oldest versions. */
/** Reads where readVersion < `oldestVersion` will result in `TooOld`. Must be
* greater than or equal to all previous oldest versions. Must be <= the
* version of the latest call to `addWrites` */
void ConflictSet_setOldestVersion(ConflictSet *cs, int64_t oldestVersion);
/** Reads where readVersion < oldestVersion will result in `TooOld`. There are

View File

@@ -206,8 +206,11 @@ until we end at $a_{i} + 1$, adjacent to the first inner range.
A few notes on implementation:
\begin{itemize}
\item{For clarity, the above algorithm decouples the logical partitioning from the physical structure of the tree. An optimized implementation would merge adjacent prefix ranges that don't correspond to nodes in the tree as it scans, so that it only calculates the version of such merged ranges once. Additionally, our implementation stores an index of which child pointers are valid as a bitset for Node48 and Node256 to speed up this scan using techniques inspired by \cite{Lemire_2018}.}
\item{In order to avoid many costly pointer indirections, we can store the max version not in each node itself but next to each node's parent pointer. Without this, the range read performance is not competetive with the skip list.}
\item{For clarity, the above algorithm decouples the logical partitioning from the physical structure of the tree.
An optimized implementation would merge adjacent prefix ranges that don't correspond to nodes in the tree as it scans, so that it only calculates the version of such merged ranges once.
Additionally, our implementation uses SIMD instructions and instruction-level parallelism to compare many prefix ranges to the read version $r$ in parallel.}
\item{In order to avoid many costly pointer indirections, and to take advantage of SIMD, we can store the max version of child nodes as a dense array directly in the parent node.
Without this, the range read performance is not competetive with the skip list.}
\item{An optimized implementation would visit the partition of $[a_{i}\dots a_{m}, a_{i} + 1)$ in reverse order, as it descends along the search path to $a_{i}\dots a_{m}$}
\item{An optimized implementation would search for the common prefix first, and return early if any prefix of the common prefix has a $max \leq r$.}
\end{itemize}

View File

@@ -96,6 +96,7 @@ def test_inner_full_words():
def test_internal_version_zero():
with DebugConflictSet() as cs:
cs.addWrites(0xFFFFFFF0)
cs.setOldestVersion(0xFFFFFFF0)
for i in range(24):
cs.addWrites(0xFFFFFFF1, write(bytes([i])))