24 Commits

Author SHA1 Message Date
311794c37e Update GCOVR annotations now that jenkins agent has avx512
All checks were successful
Tests / Clang total: 2710, passed: 2710
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2708, passed: 2708
Tests / SIMD fallback total: 2710, passed: 2710
Tests / Release [gcc] total: 2710, passed: 2710
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2020, passed: 2020
Tests / Coverage total: 2036, passed: 2036
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.97% (1821/1840) * Branch Coverage: 67.00% (1478/2206) * Complexity Density: 0.00 * Lines of Code: 1840 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-20 16:34:27 -07:00
dfa178ba19 Add to corpus (from fuzzing on macos)
Some checks failed
Tests / Clang total: 2710, passed: 2710
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2708, passed: 2708
Tests / SIMD fallback total: 2710, passed: 2710
Tests / Release [gcc] total: 2710, passed: 2710
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2020, passed: 2020
Tests / Coverage total: 2036, passed: 2036
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.85% (1807/1828) * Branch Coverage: 67.11% (1475/2198) * Complexity Density: 0.00 * Lines of Code: 1828 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-08-20 13:32:04 -07:00
a16d18edfe Update README.md
All checks were successful
Tests / Clang total: 2626, passed: 2626
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2624, passed: 2624
Tests / SIMD fallback total: 2626, passed: 2626
Tests / Release [gcc] total: 2626, passed: 2626
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1957, passed: 1957
Tests / Coverage total: 1973, passed: 1973
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1815/1828) * Branch Coverage: 67.47% (1483/2198) * Complexity Density: 0.00 * Lines of Code: 1828 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-20 12:03:27 -07:00
2b60287448 Partition valgrind into tests of size at most 100
All checks were successful
Tests / Clang total: 2626, passed: 2626
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2624, passed: 2624
Tests / SIMD fallback total: 2626, passed: 2626
Tests / Release [gcc] total: 2626, passed: 2626
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1957, passed: 1957
Tests / Coverage total: 1973, passed: 1973
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1815/1828) * Branch Coverage: 67.47% (1483/2198) * Complexity Density: 0.00 * Lines of Code: 1828 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-20 11:35:20 -07:00
0a9ac59676 Commit to non-simd Node3 implementations
Some checks failed
Tests / Clang total: 2620, passed: 2620
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2618, passed: 2618
Tests / SIMD fallback total: 2620, passed: 2620
Tests / Release [gcc] total: 2620, passed: 2620
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1957, passed: 1957
Tests / Coverage total: 1967, failed: 1, passed: 1966
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-08-20 10:36:04 -07:00
e3a77ed773 Remove unnecessary casts 2024-08-20 10:30:27 -07:00
cdf9a8a7b0 Save 8 bytes in Node3 2024-08-20 10:30:07 -07:00
305dfdd52f Change whitespace in node structs for consistency 2024-08-20 09:57:44 -07:00
7261c91492 Remove Node48::nextFree, and improve padding to save 8 bytes 2024-08-20 09:51:29 -07:00
f11720f5ae Add to corpus
Some checks reported errors
Tests / Clang total: 2620, passed: 2620
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2618, passed: 2618
Tests / SIMD fallback total: 2620, passed: 2620
Tests / Release [gcc] total: 2620, passed: 2620
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1957, passed: 1957
Tests / Coverage total: 1967, failed: 1, passed: 1966
weaselab/conflict-set/pipeline/head Something is wrong with the build of this commit
2024-08-19 21:52:47 -07:00
e2b7298af5 Instrument setOldestVersion for callgrind too 2024-08-19 21:32:04 -07:00
8e1e344f4b Fix clang-18 warning about std::basic_string<uint8_t>
All checks were successful
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2498, passed: 2498
Tests / SIMD fallback total: 2500, passed: 2500
Tests / Release [gcc] total: 2500, passed: 2500
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1867, passed: 1867
Tests / Coverage total: 1877, passed: 1877
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1824/1837) * Branch Coverage: 67.44% (1485/2202) * Complexity Density: 0.00 * Lines of Code: 1837 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-17 16:40:32 -07:00
3634b6a59b Simplify slightly in checkMaxBetweenExclusive
Some checks failed
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-08-17 14:34:13 -07:00
a3cc14c807 Fix double counting of remove node in showMemory mode 2024-08-17 13:43:47 -07:00
55b3275434 ifdef linux specific stuff
All checks were successful
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2498, passed: 2498
Tests / SIMD fallback total: 2500, passed: 2500
Tests / Release [gcc] total: 2500, passed: 2500
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1867, passed: 1867
Tests / Coverage total: 1877, passed: 1877
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1829/1842) * Branch Coverage: 67.35% (1479/2196) * Complexity Density: 0.00 * Lines of Code: 1842 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-16 16:23:46 -07:00
3a5b86ed9e Ignore itlb in grafana instead
All checks were successful
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2498, passed: 2498
Tests / SIMD fallback total: 2500, passed: 2500
Tests / Release [gcc] total: 2500, passed: 2500
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1867, passed: 1867
Tests / Coverage total: 1877, passed: 1877
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1829/1842) * Branch Coverage: 67.35% (1479/2196) * Complexity Density: 0.00 * Lines of Code: 1842 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-16 16:02:01 -07:00
159f2eef74 Use group leader fd
Events in the same group should be associated with the same set of
instructions
2024-08-16 15:35:35 -07:00
2952abe811 Reorg headers and only print unexpected errno's 2024-08-16 14:25:41 -07:00
ce54746a4a Add several new cache events to metrics
All checks were successful
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2498, passed: 2498
Tests / SIMD fallback total: 2500, passed: 2500
Tests / Release [gcc] total: 2500, passed: 2500
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1867, passed: 1867
Tests / Coverage total: 1877, passed: 1877
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1829/1842) * Branch Coverage: 67.35% (1479/2196) * Complexity Density: 0.00 * Lines of Code: 1842 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-16 12:48:22 -07:00
b15959d62c Add more perf counters 2024-08-16 10:57:00 -07:00
b009de1c2b Avoid branching on type twice in erase
All checks were successful
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2498, passed: 2498
Tests / SIMD fallback total: 2500, passed: 2500
Tests / Release [gcc] total: 2500, passed: 2500
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1867, passed: 1867
Tests / Coverage total: 1877, passed: 1877
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1829/1842) * Branch Coverage: 67.35% (1479/2196) * Complexity Density: 0.00 * Lines of Code: 1842 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-16 09:29:22 -07:00
55a230c75e Remove dontInvalidate arg from erase
Use a new node member "endOfRange" instead
2024-08-16 09:08:56 -07:00
0711ec3831 Remove dead code
All checks were successful
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2498, passed: 2498
Tests / SIMD fallback total: 2500, passed: 2500
Tests / Release [gcc] total: 2500, passed: 2500
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1867, passed: 1867
Tests / Coverage total: 1877, passed: 1877
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.29% (1815/1828) * Branch Coverage: 67.41% (1479/2194) * Complexity Density: 0.00 * Lines of Code: 1828 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-08-15 20:19:00 -07:00
0280bd77e5 Skip "dontInvalidate" check in erase from gc
Some checks failed
Tests / Clang total: 2500, passed: 2500
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Debug total: 2498, passed: 2498
Tests / SIMD fallback total: 2500, passed: 2500
Tests / Release [gcc] total: 2500, passed: 2500
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 1867, passed: 1867
Tests / Coverage total: 1877, passed: 1877
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.64% (1815/1840) * Branch Coverage: 67.14% (1479/2203) * Complexity Density: 0.00 * Lines of Code: 1840 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-08-15 19:07:53 -07:00
56 changed files with 258 additions and 230 deletions

View File

@@ -276,9 +276,15 @@ if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR AND BUILD_TESTING)
find_program(VALGRIND_EXE valgrind) find_program(VALGRIND_EXE valgrind)
if(VALGRIND_EXE AND NOT CMAKE_CROSSCOMPILING) if(VALGRIND_EXE AND NOT CMAKE_CROSSCOMPILING)
add_test(NAME conflict_set_blackbox_valgrind list(LENGTH CORPUS_TESTS len)
COMMAND ${VALGRIND_EXE} --error-exitcode=99 -- math(EXPR last "${len} - 1")
$<TARGET_FILE:driver> ${CORPUS_TESTS}) set(partition_size 100)
foreach(i RANGE 0 ${last} ${partition_size})
list(SUBLIST CORPUS_TESTS ${i} ${partition_size} partition)
add_test(NAME conflict_set_blackbox_valgrind_${i}
COMMAND ${VALGRIND_EXE} --error-exitcode=99 --
$<TARGET_FILE:driver> ${partition})
endforeach()
endif() endif()
# api smoke tests # api smoke tests

View File

@@ -173,7 +173,7 @@ int BitSet::firstSetGeq(int i) const {
return -1; return -1;
} }
enum Type { enum Type : int8_t {
Type_Node0, Type_Node0,
Type_Node3, Type_Node3,
Type_Node16, Type_Node16,
@@ -191,12 +191,12 @@ struct Node {
int32_t partialKeyLen; int32_t partialKeyLen;
int16_t numChildren; int16_t numChildren;
bool entryPresent; bool entryPresent;
// Temp variable used to signal the end of the range during addWriteRange
bool endOfRange;
uint8_t parentsIndex; uint8_t parentsIndex;
/* end section that's copied to the next node */ /* end section that's copied to the next node */
uint8_t *partialKey(); uint8_t *partialKey();
size_t size() const;
Type getType() const { return type; } Type getType() const { return type; }
int32_t getCapacity() const { return partialKeyCapacity; } int32_t getCapacity() const { return partialKeyCapacity; }
@@ -220,84 +220,83 @@ constexpr int kNodeCopySize =
struct Node0 : Node { struct Node0 : Node {
constexpr static auto kType = Type_Node0; constexpr static auto kType = Type_Node0;
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node0 &other); void copyChildrenAndKeyFrom(const Node0 &other);
void copyChildrenAndKeyFrom(const struct Node3 &other); void copyChildrenAndKeyFrom(const struct Node3 &other);
size_t size() const { return sizeof(Node0) + getCapacity(); } size_t size() const { return sizeof(Node0) + getCapacity(); }
}; };
struct Node3 : Node { struct Node3 : Node {
constexpr static auto kMaxNodes = 3; constexpr static auto kMaxNodes = 3;
constexpr static auto kType = Type_Node3; constexpr static auto kType = Type_Node3;
// Sorted
uint8_t index[kMaxNodes];
Node *children[kMaxNodes]; Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes]; InternalVersionT childMaxVersion[kMaxNodes];
uint8_t *partialKey() { return (uint8_t *)(this + 1); } // Sorted
uint8_t index[kMaxNodes];
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node0 &other); void copyChildrenAndKeyFrom(const Node0 &other);
void copyChildrenAndKeyFrom(const Node3 &other); void copyChildrenAndKeyFrom(const Node3 &other);
void copyChildrenAndKeyFrom(const struct Node16 &other); void copyChildrenAndKeyFrom(const struct Node16 &other);
size_t size() const { return sizeof(Node3) + getCapacity(); } size_t size() const { return sizeof(Node3) + getCapacity(); }
}; };
struct Node16 : Node { struct Node16 : Node {
constexpr static auto kType = Type_Node16; constexpr static auto kType = Type_Node16;
constexpr static auto kMaxNodes = 16; constexpr static auto kMaxNodes = 16;
// Sorted
uint8_t index[kMaxNodes];
Node *children[kMaxNodes]; Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes]; InternalVersionT childMaxVersion[kMaxNodes];
uint8_t *partialKey() { return (uint8_t *)(this + 1); } // Sorted
uint8_t index[kMaxNodes];
uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node3 &other); void copyChildrenAndKeyFrom(const Node3 &other);
void copyChildrenAndKeyFrom(const Node16 &other); void copyChildrenAndKeyFrom(const Node16 &other);
void copyChildrenAndKeyFrom(const struct Node48 &other); void copyChildrenAndKeyFrom(const struct Node48 &other);
size_t size() const { return sizeof(Node16) + getCapacity(); } size_t size() const { return sizeof(Node16) + getCapacity(); }
}; };
struct Node48 : Node { struct Node48 : Node {
constexpr static auto kType = Type_Node48; constexpr static auto kType = Type_Node48;
constexpr static auto kMaxNodes = 48; constexpr static auto kMaxNodes = 48;
BitSet bitSet;
int8_t nextFree;
int8_t index[256];
Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
uint8_t reverseIndex[kMaxNodes];
constexpr static int kMaxOfMaxPageSize = 16; constexpr static int kMaxOfMaxPageSize = 16;
constexpr static int kMaxOfMaxShift = constexpr static int kMaxOfMaxShift =
std::countr_zero(uint32_t(kMaxOfMaxPageSize)); std::countr_zero(uint32_t(kMaxOfMaxPageSize));
constexpr static int kMaxOfMaxTotalPages = kMaxNodes / kMaxOfMaxPageSize; constexpr static int kMaxOfMaxTotalPages = kMaxNodes / kMaxOfMaxPageSize;
BitSet bitSet;
Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
InternalVersionT maxOfMax[kMaxOfMaxTotalPages]; InternalVersionT maxOfMax[kMaxOfMaxTotalPages];
uint8_t reverseIndex[kMaxNodes];
int8_t index[256];
uint8_t *partialKey() { return (uint8_t *)(this + 1); } uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node16 &other); void copyChildrenAndKeyFrom(const Node16 &other);
void copyChildrenAndKeyFrom(const Node48 &other); void copyChildrenAndKeyFrom(const Node48 &other);
void copyChildrenAndKeyFrom(const struct Node256 &other); void copyChildrenAndKeyFrom(const struct Node256 &other);
size_t size() const { return sizeof(Node48) + getCapacity(); } size_t size() const { return sizeof(Node48) + getCapacity(); }
}; };
struct Node256 : Node { struct Node256 : Node {
constexpr static auto kType = Type_Node256; constexpr static auto kType = Type_Node256;
BitSet bitSet; constexpr static auto kMaxNodes = 256;
Node *children[256];
InternalVersionT childMaxVersion[256];
constexpr static int kMaxOfMaxPageSize = 16; constexpr static int kMaxOfMaxPageSize = 16;
constexpr static int kMaxOfMaxShift = constexpr static int kMaxOfMaxShift =
std::countr_zero(uint32_t(kMaxOfMaxPageSize)); std::countr_zero(uint32_t(kMaxOfMaxPageSize));
constexpr static int kMaxOfMaxTotalPages = 256 / kMaxOfMaxPageSize; constexpr static int kMaxOfMaxTotalPages = kMaxNodes / kMaxOfMaxPageSize;
BitSet bitSet;
Node *children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
InternalVersionT maxOfMax[kMaxOfMaxTotalPages]; InternalVersionT maxOfMax[kMaxOfMaxTotalPages];
uint8_t *partialKey() { return (uint8_t *)(this + 1); } uint8_t *partialKey() { return (uint8_t *)(this + 1); }
void copyChildrenAndKeyFrom(const Node48 &other); void copyChildrenAndKeyFrom(const Node48 &other);
void copyChildrenAndKeyFrom(const Node256 &other); void copyChildrenAndKeyFrom(const Node256 &other);
size_t size() const { return sizeof(Node256) + getCapacity(); } size_t size() const { return sizeof(Node256) + getCapacity(); }
}; };
@@ -322,7 +321,7 @@ inline void Node3::copyChildrenAndKeyFrom(const Node0 &other) {
inline void Node3::copyChildrenAndKeyFrom(const Node3 &other) { inline void Node3::copyChildrenAndKeyFrom(const Node3 &other) {
memcpy((char *)this + kNodeCopyBegin, (char *)&other + kNodeCopyBegin, memcpy((char *)this + kNodeCopyBegin, (char *)&other + kNodeCopyBegin,
kNodeCopySize); kNodeCopySize);
memcpy(index, other.index, sizeof(*this) - sizeof(Node)); memcpy(children, other.children, sizeof(*this) - sizeof(Node));
memcpy(partialKey(), &other + 1, partialKeyLen); memcpy(partialKey(), &other + 1, partialKeyLen);
for (int i = 0; i < numChildren; ++i) { for (int i = 0; i < numChildren; ++i) {
assert(children[i]->parent == &other); assert(children[i]->parent == &other);
@@ -403,7 +402,6 @@ inline void Node48::copyChildrenAndKeyFrom(const Node16 &other) {
} }
memcpy(partialKey(), &other + 1, partialKeyLen); memcpy(partialKey(), &other + 1, partialKeyLen);
bitSet.init(); bitSet.init();
nextFree = Node16::kMaxNodes;
int i = 0; int i = 0;
for (auto x : other.index) { for (auto x : other.index) {
bitSet.set(x); bitSet.set(x);
@@ -423,7 +421,6 @@ inline void Node48::copyChildrenAndKeyFrom(const Node48 &other) {
memcpy((char *)this + kNodeCopyBegin, (char *)&other + kNodeCopyBegin, memcpy((char *)this + kNodeCopyBegin, (char *)&other + kNodeCopyBegin,
kNodeCopySize); kNodeCopySize);
bitSet = other.bitSet; bitSet = other.bitSet;
nextFree = other.nextFree;
memcpy(index, other.index, sizeof(index)); memcpy(index, other.index, sizeof(index));
memset(children, 0, sizeof(children)); memset(children, 0, sizeof(children));
const auto z = InternalVersionT::zero; const auto z = InternalVersionT::zero;
@@ -450,7 +447,6 @@ inline void Node48::copyChildrenAndKeyFrom(const Node256 &other) {
for (auto &v : childMaxVersion) { for (auto &v : childMaxVersion) {
v = z; v = z;
} }
nextFree = other.numChildren;
bitSet = other.bitSet; bitSet = other.bitSet;
int i = 0; int i = 0;
bitSet.forEachSet([&](int c) { bitSet.forEachSet([&](int c) {
@@ -626,6 +622,7 @@ template <class T> struct BoundedFreeListAllocator {
T *allocate(int partialKeyCapacity) { T *allocate(int partialKeyCapacity) {
T *result = allocate_helper(partialKeyCapacity); T *result = allocate_helper(partialKeyCapacity);
result->endOfRange = false;
if constexpr (!std::is_same_v<T, Node0>) { if constexpr (!std::is_same_v<T, Node0>) {
memset(result->children, 0, sizeof(result->children)); memset(result->children, 0, sizeof(result->children));
const auto z = InternalVersionT::zero; const auto z = InternalVersionT::zero;
@@ -693,23 +690,6 @@ uint8_t *Node::partialKey() {
} }
} }
size_t Node::size() const {
switch (type) {
case Type_Node0:
return ((Node0 *)this)->size();
case Type_Node3:
return ((Node3 *)this)->size();
case Type_Node16:
return ((Node16 *)this)->size();
case Type_Node48:
return ((Node48 *)this)->size();
case Type_Node256:
return ((Node256 *)this)->size();
default: // GCOVR_EXCL_LINE
__builtin_unreachable(); // GCOVR_EXCL_LINE
}
}
// A type that's plumbed along the check call tree. Lifetime ends after each // A type that's plumbed along the check call tree. Lifetime ends after each
// check call. // check call.
struct ReadContext { struct ReadContext {
@@ -788,22 +768,17 @@ private:
BoundedFreeListAllocator<Node256> node256; BoundedFreeListAllocator<Node256> node256;
}; };
template <class NodeT> int getNodeIndex(NodeT *self, uint8_t index) { int getNodeIndex(Node3 *self, uint8_t index) {
static_assert(std::is_same_v<NodeT, Node3> || std::is_same_v<NodeT, Node16>); Node3 *n = (Node3 *)self;
for (int i = 0; i < n->numChildren; ++i) {
// cachegrind says the plain loop is fewer instructions and more mis-predicted if (n->index[i] == index) {
// branches. Microbenchmark says plain loop is faster. It's written in this return i;
// weird "generic" way though in case someday we can use the simd
// implementation easily if we want.
if constexpr (std::is_same_v<NodeT, Node3>) {
Node3 *n = (Node3 *)self;
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] == index) {
return i;
}
} }
return -1;
} }
return -1;
}
int getNodeIndex(Node16 *self, uint8_t index) {
#ifdef HAS_AVX #ifdef HAS_AVX
// Based on https://www.the-paper-trail.org/post/art-paper-notes/ // Based on https://www.the-paper-trail.org/post/art-paper-notes/
@@ -816,7 +791,7 @@ template <class NodeT> int getNodeIndex(NodeT *self, uint8_t index) {
// keys aren't valid, we'll mask the results to only consider the valid ones // keys aren't valid, we'll mask the results to only consider the valid ones
// below. // below.
__m128i indices; __m128i indices;
memcpy(&indices, self->index, NodeT::kMaxNodes); memcpy(&indices, self->index, Node16::kMaxNodes);
__m128i results = _mm_cmpeq_epi8(key_vec, indices); __m128i results = _mm_cmpeq_epi8(key_vec, indices);
// Build a mask to select only the first node->num_children values from the // Build a mask to select only the first node->num_children values from the
@@ -839,12 +814,11 @@ template <class NodeT> int getNodeIndex(NodeT *self, uint8_t index) {
// https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon // https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon
uint8x16_t indices; uint8x16_t indices;
memcpy(&indices, self->index, NodeT::kMaxNodes); memcpy(&indices, self->index, Node16::kMaxNodes);
// 0xff for each match // 0xff for each match
uint16x8_t results = uint16x8_t results =
vreinterpretq_u16_u8(vceqq_u8(vdupq_n_u8(index), indices)); vreinterpretq_u16_u8(vceqq_u8(vdupq_n_u8(index), indices));
static_assert(NodeT::kMaxNodes <= 16); assume(self->numChildren <= Node16::kMaxNodes);
assume(self->numChildren <= NodeT::kMaxNodes);
uint64_t mask = self->numChildren == 16 uint64_t mask = self->numChildren == 16
? uint64_t(-1) ? uint64_t(-1)
: (uint64_t(1) << (self->numChildren * 4)) - 1; : (uint64_t(1) << (self->numChildren * 4)) - 1;
@@ -1097,22 +1071,18 @@ ChildAndMaxVersion getChildAndMaxVersion(Node *self, uint8_t index) {
} }
} }
template <class NodeT> Node *getChildGeqSimd(NodeT *self, int child) { Node *getChildGeq(Node0 *, int) { return nullptr; }
static_assert(std::is_same_v<NodeT, Node3> || std::is_same_v<NodeT, Node16>);
// cachegrind says the plain loop is fewer instructions and more mis-predicted Node *getChildGeq(Node3 *n, int child) {
// branches. Microbenchmark says plain loop is faster. It's written in this for (int i = 0; i < n->numChildren; ++i) {
// weird "generic" way though so that someday we can use the simd if (n->index[i] >= child) {
// implementation easily if we want. return n->children[i];
if constexpr (std::is_same_v<NodeT, Node3>) {
Node3 *n = (Node3 *)self;
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] >= child) {
return n->children[i];
}
} }
return nullptr;
} }
return nullptr;
}
Node *getChildGeq(Node16 *self, int child) {
if (child > 255) { if (child > 255) {
return nullptr; return nullptr;
} }
@@ -1120,7 +1090,7 @@ template <class NodeT> Node *getChildGeqSimd(NodeT *self, int child) {
#ifdef HAS_AVX #ifdef HAS_AVX
__m128i key_vec = _mm_set1_epi8(child); __m128i key_vec = _mm_set1_epi8(child);
__m128i indices; __m128i indices;
memcpy(&indices, self->index, NodeT::kMaxNodes); memcpy(&indices, self->index, Node16::kMaxNodes);
__m128i results = _mm_cmpeq_epi8(key_vec, _mm_min_epu8(key_vec, indices)); __m128i results = _mm_cmpeq_epi8(key_vec, _mm_min_epu8(key_vec, indices));
int mask = (1 << self->numChildren) - 1; int mask = (1 << self->numChildren) - 1;
uint32_t bitfield = _mm_movemask_epi8(results) & mask; uint32_t bitfield = _mm_movemask_epi8(results) & mask;
@@ -1130,8 +1100,7 @@ template <class NodeT> Node *getChildGeqSimd(NodeT *self, int child) {
memcpy(&indices, self->index, sizeof(self->index)); memcpy(&indices, self->index, sizeof(self->index));
// 0xff for each leq // 0xff for each leq
auto results = vcleq_u8(vdupq_n_u8(child), indices); auto results = vcleq_u8(vdupq_n_u8(child), indices);
static_assert(NodeT::kMaxNodes <= 16); assume(self->numChildren <= Node16::kMaxNodes);
assume(self->numChildren <= NodeT::kMaxNodes);
uint64_t mask = self->numChildren == 16 uint64_t mask = self->numChildren == 16
? uint64_t(-1) ? uint64_t(-1)
: (uint64_t(1) << (self->numChildren * 4)) - 1; : (uint64_t(1) << (self->numChildren * 4)) - 1;
@@ -1156,13 +1125,6 @@ template <class NodeT> Node *getChildGeqSimd(NodeT *self, int child) {
#endif #endif
} }
Node *getChildGeq(Node0 *, int) { return nullptr; }
Node *getChildGeq(Node3 *self, int child) {
return getChildGeqSimd(self, child);
}
Node *getChildGeq(Node16 *self, int child) {
return getChildGeqSimd(self, child);
}
Node *getChildGeq(Node48 *self, int child) { Node *getChildGeq(Node48 *self, int child) {
int c = self->bitSet.firstSetGeq(child); int c = self->bitSet.firstSetGeq(child);
if (c < 0) { if (c < 0) {
@@ -1375,7 +1337,7 @@ Node *&getOrCreateChild(Node *&self, std::span<const uint8_t> &key,
auto *self3 = static_cast<Node3 *>(self); auto *self3 = static_cast<Node3 *>(self);
int i = self->numChildren - 1; int i = self->numChildren - 1;
for (; i >= 0; --i) { for (; i >= 0; --i) {
if (int(self3->index[i]) < int(index)) { if (self3->index[i] < index) {
break; break;
} }
self3->index[i + 1] = self3->index[i]; self3->index[i + 1] = self3->index[i];
@@ -1405,7 +1367,7 @@ Node *&getOrCreateChild(Node *&self, std::span<const uint8_t> &key,
auto *self16 = static_cast<Node16 *>(self); auto *self16 = static_cast<Node16 *>(self);
int i = self->numChildren - 1; int i = self->numChildren - 1;
for (; i >= 0; --i) { for (; i >= 0; --i) {
if (int(self16->index[i]) < int(index)) { if (self16->index[i] < index) {
break; break;
} }
self16->index[i + 1] = self16->index[i]; self16->index[i + 1] = self16->index[i];
@@ -1434,9 +1396,7 @@ Node *&getOrCreateChild(Node *&self, std::span<const uint8_t> &key,
insert48: insert48:
auto *self48 = static_cast<Node48 *>(self); auto *self48 = static_cast<Node48 *>(self);
self48->bitSet.set(index); self48->bitSet.set(index);
++self->numChildren; auto nextFree = self48->numChildren++;
assert(self48->nextFree < 48);
int nextFree = self48->nextFree++;
self48->index[index] = nextFree; self48->index[index] = nextFree;
self48->reverseIndex[nextFree] = index; self48->reverseIndex[nextFree] = index;
auto &result = self48->children[nextFree]; auto &result = self48->children[nextFree];
@@ -1584,8 +1544,6 @@ void maybeDecreaseCapacity(Node *&self, WriteContext *tls,
} }
#if defined(HAS_AVX) && !defined(__SANITIZE_THREAD__) #if defined(HAS_AVX) && !defined(__SANITIZE_THREAD__)
// This gets covered in local development
// GCOVR_EXCL_START
__attribute__((target("avx512f"))) void rezero16(InternalVersionT *vs, __attribute__((target("avx512f"))) void rezero16(InternalVersionT *vs,
InternalVersionT zero) { InternalVersionT zero) {
uint32_t z; uint32_t z;
@@ -1595,7 +1553,6 @@ __attribute__((target("avx512f"))) void rezero16(InternalVersionT *vs,
_mm512_sub_epi32(_mm512_loadu_epi32(vs), zvec), _mm512_setzero_epi32()); _mm512_sub_epi32(_mm512_loadu_epi32(vs), zvec), _mm512_setzero_epi32());
_mm512_mask_storeu_epi32(vs, m, zvec); _mm512_mask_storeu_epi32(vs, m, zvec);
} }
// GCOVR_EXCL_STOP
__attribute__((target("default"))) __attribute__((target("default")))
#endif #endif
@@ -1690,53 +1647,67 @@ void mergeWithChild(Node *&self, WriteContext *tls, ConflictSet::Impl *impl,
tls->release(self3); tls->release(self3);
} }
void maybeDownsize(Node *self, WriteContext *tls, ConflictSet::Impl *impl, bool needsDownsize(Node *n) {
Node *&dontInvalidate) { static int minTable[] = {0, kMinChildrenNode3, kMinChildrenNode16,
kMinChildrenNode48, kMinChildrenNode256};
return n->numChildren + n->entryPresent < minTable[n->getType()];
}
#if DEBUG_VERBOSE && !defined(NDEBUG) void downsize(Node3 *self, WriteContext *tls, ConflictSet::Impl *impl,
fprintf(stderr, "maybeDownsize: %s\n", getSearchPathPrintable(self).c_str()); Node *&dontInvalidate) {
#endif if (self->numChildren == 0) {
auto *newSelf = tls->allocate<Node0>(self->partialKeyLen);
newSelf->copyChildrenAndKeyFrom(*self);
getInTree(self, impl) = newSelf;
tls->release(self);
} else {
assert(self->numChildren == 1 && !self->entryPresent);
mergeWithChild(getInTree(self, impl), tls, impl, dontInvalidate, self);
}
}
void downsize(Node16 *self, WriteContext *tls, ConflictSet::Impl *impl) {
assert(self->numChildren + int(self->entryPresent) < kMinChildrenNode16);
auto *newSelf = tls->allocate<Node3>(self->partialKeyLen);
newSelf->copyChildrenAndKeyFrom(*self);
getInTree(self, impl) = newSelf;
tls->release(self);
}
void downsize(Node48 *self, WriteContext *tls, ConflictSet::Impl *impl) {
assert(self->numChildren + int(self->entryPresent) < kMinChildrenNode48);
auto *newSelf = tls->allocate<Node16>(self->partialKeyLen);
newSelf->copyChildrenAndKeyFrom(*self);
getInTree(self, impl) = newSelf;
tls->release(self);
}
void downsize(Node256 *self, WriteContext *tls, ConflictSet::Impl *impl) {
assert(self->numChildren + int(self->entryPresent) < kMinChildrenNode256);
auto *self256 = (Node256 *)self;
auto *newSelf = tls->allocate<Node48>(self->partialKeyLen);
newSelf->copyChildrenAndKeyFrom(*self256);
getInTree(self, impl) = newSelf;
tls->release(self256);
}
void downsize(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
Node *&dontInvalidate) {
switch (self->getType()) { switch (self->getType()) {
case Type_Node0: // GCOVR_EXCL_LINE case Type_Node0: // GCOVR_EXCL_LINE
__builtin_unreachable(); // GCOVR_EXCL_LINE __builtin_unreachable(); // GCOVR_EXCL_LINE
case Type_Node3: { case Type_Node3:
auto *self3 = (Node3 *)self; downsize(static_cast<Node3 *>(self), tls, impl, dontInvalidate);
if (self->numChildren == 0) { break;
auto *newSelf = tls->allocate<Node0>(self->partialKeyLen);
newSelf->copyChildrenAndKeyFrom(*self3);
getInTree(self, impl) = newSelf;
tls->release(self3);
} else if (self->numChildren == 1 && !self->entryPresent) {
mergeWithChild(getInTree(self, impl), tls, impl, dontInvalidate, self3);
}
} break;
case Type_Node16: case Type_Node16:
if (self->numChildren + int(self->entryPresent) < kMinChildrenNode16) { downsize(static_cast<Node16 *>(self), tls, impl);
auto *self16 = (Node16 *)self;
auto *newSelf = tls->allocate<Node3>(self->partialKeyLen);
newSelf->copyChildrenAndKeyFrom(*self16);
getInTree(self, impl) = newSelf;
tls->release(self16);
}
break; break;
case Type_Node48: case Type_Node48:
if (self->numChildren + int(self->entryPresent) < kMinChildrenNode48) { downsize(static_cast<Node48 *>(self), tls, impl);
auto *self48 = (Node48 *)self;
auto *newSelf = tls->allocate<Node16>(self->partialKeyLen);
newSelf->copyChildrenAndKeyFrom(*self48);
getInTree(self, impl) = newSelf;
tls->release(self48);
}
break; break;
case Type_Node256: case Type_Node256:
if (self->numChildren + int(self->entryPresent) < kMinChildrenNode256) { downsize(static_cast<Node256 *>(self), tls, impl);
auto *self256 = (Node256 *)self;
auto *newSelf = tls->allocate<Node48>(self->partialKeyLen);
newSelf->copyChildrenAndKeyFrom(*self256);
getInTree(self, impl) = newSelf;
tls->release(self256);
}
break; break;
default: // GCOVR_EXCL_LINE default: // GCOVR_EXCL_LINE
__builtin_unreachable(); // GCOVR_EXCL_LINE __builtin_unreachable(); // GCOVR_EXCL_LINE
@@ -1745,10 +1716,10 @@ void maybeDownsize(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
// Precondition: self is not the root. May invalidate nodes along the search // Precondition: self is not the root. May invalidate nodes along the search
// path to self. May invalidate children of self->parent. Returns a pointer to // path to self. May invalidate children of self->parent. Returns a pointer to
// the node after self. If erase invalidates the pointee of `dontInvalidate`, it // the node after self. Precondition: `self->entryPresent`
// will update it to its new pointee as well. Precondition: `self->entryPresent`
Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl, Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
bool logical, Node *&dontInvalidate) { bool logical) {
++tls->accum.entries_erased; ++tls->accum.entries_erased;
assert(self->parent != nullptr); assert(self->parent != nullptr);
@@ -1766,10 +1737,8 @@ Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
self->entryPresent = false; self->entryPresent = false;
if (self->numChildren != 0) { if (self->numChildren != 0) {
const bool update = result == dontInvalidate; if (needsDownsize(self)) {
maybeDownsize(self, tls, impl, result); downsize(self, tls, impl, result);
if (update) {
dontInvalidate = result;
} }
return result; return result;
} }
@@ -1790,7 +1759,10 @@ Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
parent3->children[i] = parent3->children[i + 1]; parent3->children[i] = parent3->children[i + 1];
parent3->childMaxVersion[i] = parent3->childMaxVersion[i + 1]; parent3->childMaxVersion[i] = parent3->childMaxVersion[i + 1];
} }
assert(parent->numChildren > 0 || parent->entryPresent);
if (needsDownsize(parent3)) {
downsize(parent3, tls, impl, result);
}
} break; } break;
case Type_Node16: { case Type_Node16: {
auto *parent16 = static_cast<Node16 *>(parent); auto *parent16 = static_cast<Node16 *>(parent);
@@ -1803,15 +1775,16 @@ Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
parent16->childMaxVersion[i] = parent16->childMaxVersion[i + 1]; parent16->childMaxVersion[i] = parent16->childMaxVersion[i + 1];
} }
// By kMinChildrenNode16 if (needsDownsize(parent16)) {
assert(parent->numChildren > 0); downsize(parent16, tls, impl, result);
}
} break; } break;
case Type_Node48: { case Type_Node48: {
auto *parent48 = static_cast<Node48 *>(parent); auto *parent48 = static_cast<Node48 *>(parent);
parent48->bitSet.reset(parentsIndex); parent48->bitSet.reset(parentsIndex);
int8_t toRemoveChildrenIndex = int8_t toRemoveChildrenIndex =
std::exchange(parent48->index[parentsIndex], -1); std::exchange(parent48->index[parentsIndex], -1);
int8_t lastChildrenIndex = --parent48->nextFree; auto lastChildrenIndex = --parent48->numChildren;
assert(toRemoveChildrenIndex >= 0); assert(toRemoveChildrenIndex >= 0);
assert(lastChildrenIndex >= 0); assert(lastChildrenIndex >= 0);
if (toRemoveChildrenIndex != lastChildrenIndex) { if (toRemoveChildrenIndex != lastChildrenIndex) {
@@ -1830,10 +1803,9 @@ Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
} }
parent48->childMaxVersion[lastChildrenIndex] = tls->zero; parent48->childMaxVersion[lastChildrenIndex] = tls->zero;
--parent->numChildren; if (needsDownsize(parent48)) {
downsize(parent48, tls, impl, result);
// By kMinChildrenNode48 }
assert(parent->numChildren > 0);
} break; } break;
case Type_Node256: { case Type_Node256: {
auto *parent256 = static_cast<Node256 *>(parent); auto *parent256 = static_cast<Node256 *>(parent);
@@ -1842,20 +1814,14 @@ Node *erase(Node *self, WriteContext *tls, ConflictSet::Impl *impl,
--parent->numChildren; --parent->numChildren;
// By kMinChildrenNode256 if (needsDownsize(parent256)) {
assert(parent->numChildren > 0); downsize(parent256, tls, impl, result);
}
} break; } break;
default: // GCOVR_EXCL_LINE default: // GCOVR_EXCL_LINE
__builtin_unreachable(); // GCOVR_EXCL_LINE __builtin_unreachable(); // GCOVR_EXCL_LINE
} }
const bool update = result == dontInvalidate;
maybeDownsize(parent, tls, impl, result);
if (update) {
dontInvalidate = result;
}
return result; return result;
} }
@@ -2023,9 +1989,9 @@ downLeftSpine:
#ifdef HAS_AVX #ifdef HAS_AVX
uint32_t compare16_32bit(const InternalVersionT *vs, InternalVersionT rv) { uint32_t compare16_32bit(const InternalVersionT *vs, InternalVersionT rv) {
uint32_t compared = 0; uint32_t compared = 0;
__m128i w[4]; __m128i w[4]; // GCOVR_EXCL_LINE
memcpy(w, vs, sizeof(w)); memcpy(w, vs, sizeof(w));
uint32_t r; uint32_t r; // GCOVR_EXCL_LINE
memcpy(&r, &rv, sizeof(r)); memcpy(&r, &rv, sizeof(r));
const auto rvVec = _mm_set1_epi32(r); const auto rvVec = _mm_set1_epi32(r);
const auto zero = _mm_setzero_si128(); const auto zero = _mm_setzero_si128();
@@ -2037,8 +2003,6 @@ uint32_t compare16_32bit(const InternalVersionT *vs, InternalVersionT rv) {
return compared; return compared;
} }
// This gets covered in local development
// GCOVR_EXCL_START
__attribute__((target("avx512f"))) uint32_t __attribute__((target("avx512f"))) uint32_t
compare16_32bit_avx512(const InternalVersionT *vs, InternalVersionT rv) { compare16_32bit_avx512(const InternalVersionT *vs, InternalVersionT rv) {
uint32_t r; uint32_t r;
@@ -2048,7 +2012,6 @@ compare16_32bit_avx512(const InternalVersionT *vs, InternalVersionT rv) {
_mm512_setzero_epi32()); _mm512_setzero_epi32());
} }
#endif #endif
// GCOVR_EXCL_STOP
// Returns true if v[i] <= readVersion for all i such that begin <= is[i] < end // Returns true if v[i] <= readVersion for all i such that begin <= is[i] < end
// Preconditions: begin <= end, end - begin < 256 // Preconditions: begin <= end, end - begin < 256
@@ -2344,10 +2307,8 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
int c = self->bitSet.firstSetGeq(begin + 1); int c = self->bitSet.firstSetGeq(begin + 1);
if (c >= 0 && c < end) { if (c >= 0 && c < end) {
auto *child = self->children[self->index[c]]; auto *child = self->children[self->index[c]];
if (child->entryPresent) { if (child->entryPresent && child->entry.rangeVersion > readVersion) {
if (!(child->entry.rangeVersion <= readVersion)) { return false;
return false;
};
} }
begin = c; begin = c;
} else { } else {
@@ -2380,10 +2341,8 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
int c = self->bitSet.firstSetGeq(begin + 1); int c = self->bitSet.firstSetGeq(begin + 1);
if (c >= 0 && c < end) { if (c >= 0 && c < end) {
auto *child = self->children[c]; auto *child = self->children[c];
if (child->entryPresent) { if (child->entryPresent && child->entry.rangeVersion > readVersion) {
if (!(child->entry.rangeVersion <= readVersion)) { return false;
return false;
};
} }
begin = c; begin = c;
} else { } else {
@@ -2426,9 +2385,7 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
} }
} }
// Check inner pages // Check inner pages
const int innerPageBegin = (begin >> Node256::kMaxOfMaxShift) + 1; return scan16<kAVX512>(self->maxOfMax, firstPage + 1, lastPage,
const int innerPageEnd = (end - 1) >> Node256::kMaxOfMaxShift;
return scan16<kAVX512>(self->maxOfMax, innerPageBegin, innerPageEnd,
readVersion); readVersion);
} }
default: // GCOVR_EXCL_LINE default: // GCOVR_EXCL_LINE
@@ -2437,14 +2394,11 @@ bool checkMaxBetweenExclusiveImpl(Node *n, int begin, int end,
} }
#if defined(HAS_AVX) && !defined(__SANITIZE_THREAD__) #if defined(HAS_AVX) && !defined(__SANITIZE_THREAD__)
// This gets covered in local development
// GCOVR_EXCL_START
__attribute__((target("avx512f"))) bool __attribute__((target("avx512f"))) bool
checkMaxBetweenExclusive(Node *n, int begin, int end, checkMaxBetweenExclusive(Node *n, int begin, int end,
InternalVersionT readVersion, ReadContext *tls) { InternalVersionT readVersion, ReadContext *tls) {
return checkMaxBetweenExclusiveImpl<true>(n, begin, end, readVersion, tls); return checkMaxBetweenExclusiveImpl<true>(n, begin, end, readVersion, tls);
} }
// GCOVR_EXCL_STOP
__attribute__((target("default"))) __attribute__((target("default")))
#endif #endif
bool checkMaxBetweenExclusive(Node *n, int begin, int end, bool checkMaxBetweenExclusive(Node *n, int begin, int end,
@@ -2931,7 +2885,6 @@ void eraseTree(Node *root, WriteContext *tls) {
tls->accum.entries_erased += n->entryPresent; tls->accum.entries_erased += n->entryPresent;
++tls->accum.nodes_released; ++tls->accum.nodes_released;
removeNode(n);
removeKey(n); removeKey(n);
switch (n->getType()) { switch (n->getType()) {
@@ -3071,13 +3024,21 @@ void addWriteRange(Node *&root, std::span<const uint8_t> begin,
} }
endNode->entry.rangeVersion = writeVersion; endNode->entry.rangeVersion = writeVersion;
for (beginNode = nextLogical(beginNode); beginNode != endNode; // Erase nodes in range
beginNode = erase(beginNode, tls, impl, /*logical*/ true, endNode)) { assert(!beginNode->endOfRange);
assert(!endNode->endOfRange);
endNode->endOfRange = true;
auto *iter = beginNode;
for (iter = nextLogical(iter); !iter->endOfRange;
iter = erase(iter, tls, impl, /*logical*/ true)) {
assert(!iter->endOfRange);
} }
assert(iter->endOfRange);
iter->endOfRange = false;
// Inserting end trashed endNode's maxVersion. Fix that. Safe to call since // Inserting end trashed the last node's maxVersion. Fix that. Safe to call
// the end key always has non-zero size. // since the end key always has non-zero size.
fixupMaxVersion(endNode, tls); fixupMaxVersion(iter, tls);
} }
Node *firstGeqPhysical(Node *n, const std::span<const uint8_t> key) { Node *firstGeqPhysical(Node *n, const std::span<const uint8_t> key) {
@@ -3262,8 +3223,7 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
// There's no way to insert a range such that range version of the right // There's no way to insert a range such that range version of the right
// node is greater than the point version of the left node // node is greater than the point version of the left node
assert(n->entry.rangeVersion <= oldestVersion); assert(n->entry.rangeVersion <= oldestVersion);
Node *dummy = nullptr; n = erase(n, &tls, this, /*logical*/ false);
n = erase(n, &tls, this, /*logical*/ false, dummy);
} else { } else {
maybeDecreaseCapacity(n, &tls, this); maybeDecreaseCapacity(n, &tls, this);
n = nextPhysical(n); n = nextPhysical(n);

View File

@@ -748,7 +748,10 @@ struct TestDriver {
fprintf(stderr, "%p Set oldest version: %" PRId64 "\n", this, fprintf(stderr, "%p Set oldest version: %" PRId64 "\n", this,
oldestVersion); oldestVersion);
#endif #endif
CALLGRIND_START_INSTRUMENTATION;
cs.setOldestVersion(oldestVersion); cs.setOldestVersion(oldestVersion);
CALLGRIND_STOP_INSTRUMENTATION;
if constexpr (kEnableAssertions) { if constexpr (kEnableAssertions) {
refImpl.setOldestVersion(oldestVersion); refImpl.setOldestVersion(oldestVersion);
} }

View File

@@ -24,15 +24,15 @@ Hardware for all benchmarks is an AMD Ryzen 9 7900 with (2x32GB) 5600MT/s CL28-3
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 11.04 | 90,614,308.12 | 0.8% | 180.38 | 55.13 | 3.272 | 41.51 | 0.4% | 0.01 | `point reads` | 11.18 | 89,455,125.34 | 0.6% | 185.37 | 57.08 | 3.248 | 41.51 | 0.4% | 0.01 | `point reads`
| 14.96 | 66,843,629.12 | 0.4% | 274.41 | 74.73 | 3.672 | 55.05 | 0.3% | 0.01 | `prefix reads` | 14.53 | 68,800,688.89 | 0.4% | 282.41 | 74.80 | 3.776 | 55.06 | 0.3% | 0.01 | `prefix reads`
| 37.06 | 26,982,847.61 | 0.2% | 791.04 | 185.28 | 4.269 | 142.67 | 0.2% | 0.01 | `range reads` | 36.54 | 27,367,576.87 | 0.2% | 798.06 | 188.90 | 4.225 | 141.69 | 0.2% | 0.01 | `range reads`
| 17.89 | 55,887,365.73 | 0.6% | 335.54 | 89.79 | 3.737 | 43.84 | 0.4% | 0.01 | `point writes` | 16.69 | 59,912,106.02 | 0.6% | 314.57 | 86.29 | 3.645 | 39.84 | 0.4% | 0.01 | `point writes`
| 31.85 | 31,394,336.65 | 0.3% | 615.32 | 159.63 | 3.855 | 87.69 | 0.2% | 0.01 | `prefix writes` | 30.09 | 33,235,744.07 | 0.5% | 591.33 | 155.92 | 3.793 | 82.69 | 0.2% | 0.01 | `prefix writes`
| 36.17 | 27,647,221.45 | 0.6% | 705.11 | 182.80 | 3.857 | 100.62 | 0.1% | 0.01 | `range writes` | 35.77 | 27,956,388.03 | 1.4% | 682.25 | 187.63 | 3.636 | 96.12 | 0.1% | 0.01 | `range writes`
| 79.01 | 12,656,457.78 | 0.7% | 1,498.35 | 402.46 | 3.723 | 270.50 | 0.1% | 0.01 | `monotonic increasing point writes` | 74.04 | 13,505,408.41 | 2.7% | 1,448.95 | 392.10 | 3.695 | 260.53 | 0.1% | 0.01 | `monotonic increasing point writes`
| 303,667.50 | 3,293.08 | 1.1% | 3,931,273.00 | 1,612,702.50 | 2.438 | 806,223.33 | 0.0% | 0.01 | `worst case for radix tree` | 330,984.50 | 3,021.29 | 1.9% | 3,994,153.50 | 1,667,309.00 | 2.396 | 806,019.50 | 0.0% | 0.01 | `worst case for radix tree`
| 83.70 | 11,947,443.83 | 0.7% | 1,738.03 | 429.06 | 4.051 | 270.01 | 0.0% | 0.01 | `create and destroy` | 92.46 | 10,814,961.65 | 0.5% | 1,800.00 | 463.41 | 3.884 | 297.00 | 0.0% | 0.01 | `create and destroy`
# "Real data" test # "Real data" test
@@ -47,7 +47,7 @@ Check: 4.47891 seconds, 364.05 MB/s, Add: 4.55599 seconds, 123.058 MB/s, Gc rati
## radix tree ## radix tree
``` ```
Check: 0.958985 seconds, 1700.28 MB/s, Add: 1.35083 seconds, 415.044 MB/s, Gc ratio: 44.4768%, Peak idle memory: 2.33588e+06 Check: 0.953012 seconds, 1710.94 MB/s, Add: 1.30025 seconds, 431.188 MB/s, Gc ratio: 43.9816%, Peak idle memory: 2.28375e+06
``` ```
## hash table ## hash table

View File

@@ -6,11 +6,15 @@
#include <string.h> #include <string.h>
#include <string> #include <string>
#include <string_view> #include <string_view>
#include <sys/ioctl.h>
#include <sys/resource.h> #include <sys/resource.h>
#include <sys/socket.h> #include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h> #include <sys/uio.h>
#include <thread> #include <thread>
#include <unistd.h> #include <unistd.h>
#include <utility>
#include <vector>
#include "ConflictSet.h" #include "ConflictSet.h"
#include "third_party/nadeau.h" #include "third_party/nadeau.h"
@@ -20,8 +24,8 @@ std::atomic<int64_t> transactions;
constexpr int kBaseSearchDepth = 32; constexpr int kBaseSearchDepth = 32;
constexpr int kWindowSize = 10000000; constexpr int kWindowSize = 10000000;
std::basic_string<uint8_t> numToKey(int64_t num) { std::string numToKey(int64_t num) {
std::basic_string<uint8_t> result; std::string result;
result.resize(kBaseSearchDepth + sizeof(int64_t)); result.resize(kBaseSearchDepth + sizeof(int64_t));
memset(result.data(), 0, kBaseSearchDepth); memset(result.data(), 0, kBaseSearchDepth);
int64_t be = __builtin_bswap64(num); int64_t be = __builtin_bswap64(num);
@@ -41,13 +45,13 @@ void workload(weaselab::ConflictSet *cs) {
auto pointK = numToKey(pointRv); auto pointK = numToKey(pointRv);
weaselab::ConflictSet::ReadRange reads[] = { weaselab::ConflictSet::ReadRange reads[] = {
{ {
{pointK.data(), int(pointK.size())}, {(const uint8_t *)pointK.data(), int(pointK.size())},
{nullptr, 0}, {nullptr, 0},
pointRv, pointRv,
}, },
{ {
{beginK.data(), int(beginK.size())}, {(const uint8_t *)beginK.data(), int(beginK.size())},
{endK.data(), int(endK.size())}, {(const uint8_t *)endK.data(), int(endK.size())},
version - 2, version - 2,
}, },
}; };
@@ -66,7 +70,7 @@ void workload(weaselab::ConflictSet *cs) {
{ {
weaselab::ConflictSet::WriteRange w; weaselab::ConflictSet::WriteRange w;
auto k = numToKey(version); auto k = numToKey(version);
w.begin.p = k.data(); w.begin.p = (const uint8_t *)k.data();
w.end.len = 0; w.end.len = 0;
if (version % (kWindowSize / 2) == 0) { if (version % (kWindowSize / 2) == 0) {
for (int l = 0; l <= k.size(); ++l) { for (int l = 0; l <= k.size(); ++l) {
@@ -79,9 +83,9 @@ void workload(weaselab::ConflictSet *cs) {
int64_t beginN = version - kWindowSize + rand() % kWindowSize; int64_t beginN = version - kWindowSize + rand() % kWindowSize;
auto b = numToKey(beginN); auto b = numToKey(beginN);
auto e = numToKey(beginN + 1000); auto e = numToKey(beginN + 1000);
w.begin.p = b.data(); w.begin.p = (const uint8_t *)b.data();
w.begin.len = b.size(); w.begin.len = b.size();
w.end.p = e.data(); w.end.p = (const uint8_t *)e.data();
w.end.len = e.size(); w.end.len = e.size();
cs->addWrites(&w, 1, version); cs->addWrites(&w, 1, version);
} }
@@ -164,36 +168,29 @@ double toSeconds(timeval t) {
return double(t.tv_sec) + double(t.tv_usec) * 1e-6; return double(t.tv_sec) + double(t.tv_usec) * 1e-6;
} }
#include <linux/perf_event.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__ #ifdef __linux__
#include <linux/perf_event.h>
struct PerfCounter { struct PerfCounter {
explicit PerfCounter(int event) { PerfCounter(int type, int config, const std::string &labels = {},
int groupLeaderFd = -1)
: labels(labels) {
struct perf_event_attr pe; struct perf_event_attr pe;
memset(&pe, 0, sizeof(pe)); memset(&pe, 0, sizeof(pe));
pe.type = PERF_TYPE_HARDWARE; pe.type = type;
pe.size = sizeof(pe); pe.size = sizeof(pe);
pe.config = event; pe.config = config;
pe.inherit = 1; pe.inherit = 1;
pe.exclude_kernel = 1; pe.exclude_kernel = 1;
pe.exclude_hv = 1; pe.exclude_hv = 1;
fd = perf_event_open(&pe, 0, -1, -1, 0); fd = perf_event_open(&pe, 0, -1, groupLeaderFd, 0);
if (fd == -1) { if (fd < 0 && errno != ENOENT && errno != EINVAL) {
fprintf(stderr, "Error opening leader %llx\n", pe.config); perror(labels.c_str());
exit(EXIT_FAILURE);
} }
} }
int64_t total() { int64_t total() const {
int64_t count; int64_t count;
if (read(fd, &count, sizeof(count)) != sizeof(count)) { if (read(fd, &count, sizeof(count)) != sizeof(count)) {
perror("read instructions from perf"); perror("read instructions from perf");
@@ -202,10 +199,27 @@ struct PerfCounter {
return count; return count;
} }
~PerfCounter() { close(fd); } PerfCounter(PerfCounter &&other)
: fd(std::exchange(other.fd, -1)), labels(std::move(other.labels)) {}
PerfCounter &operator=(PerfCounter &&other) {
fd = std::exchange(other.fd, -1);
labels = std::move(other.labels);
return *this;
}
~PerfCounter() {
if (fd >= 0) {
close(fd);
}
}
bool ok() const { return fd >= 0; }
const std::string &getLabels() const { return labels; }
int getFd() const { return fd; }
private: private:
int fd; int fd;
std::string labels;
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid, static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
int cpu, int group_fd, unsigned long flags) { int cpu, int group_fd, unsigned long flags) {
int ret; int ret;
@@ -214,11 +228,6 @@ private:
return ret; return ret;
} }
}; };
#else
struct PerfCounter {
explicit PerPerfCounter(int) {}
int64_t total() { return 0; }
};
#endif #endif
int main(int argc, char **argv) { int main(int argc, char **argv) {
@@ -233,8 +242,50 @@ int main(int argc, char **argv) {
int metricsCount; int metricsCount;
cs.getMetricsV1(&metrics, &metricsCount); cs.getMetricsV1(&metrics, &metricsCount);
PerfCounter instructions{PERF_COUNT_HW_INSTRUCTIONS}; #ifdef __linux__
PerfCounter cycles{PERF_COUNT_HW_CPU_CYCLES}; PerfCounter instructions{PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS};
PerfCounter cycles{PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES, "",
instructions.getFd()};
std::vector<PerfCounter> cacheCounters;
for (auto [id, idStr] : std::initializer_list<std::pair<int, std::string>>{
{PERF_COUNT_HW_CACHE_L1D, "l1d"},
{PERF_COUNT_HW_CACHE_L1I, "l1i"},
{PERF_COUNT_HW_CACHE_LL, "ll"},
{PERF_COUNT_HW_CACHE_DTLB, "dtlb"},
{PERF_COUNT_HW_CACHE_ITLB, "itlb"},
{PERF_COUNT_HW_CACHE_BPU, "bpu"},
{PERF_COUNT_HW_CACHE_NODE, "node"},
}) {
for (auto [op, opStr] :
std::initializer_list<std::pair<int, std::string>>{
{PERF_COUNT_HW_CACHE_OP_READ, "read"},
{PERF_COUNT_HW_CACHE_OP_WRITE, "write"},
{PERF_COUNT_HW_CACHE_OP_PREFETCH, "prefetch"},
}) {
int groupLeaderFd = -1;
for (auto [result, resultStr] :
std::initializer_list<std::pair<int, std::string>>{
{PERF_COUNT_HW_CACHE_RESULT_MISS, "miss"},
{PERF_COUNT_HW_CACHE_RESULT_ACCESS, "access"},
}) {
auto labels = "{id=\"" + idStr + "\", op=\"" + opStr +
"\", result=\"" + resultStr + "\"}";
cacheCounters.emplace_back(PERF_TYPE_HW_CACHE,
id | (op << 8) | (result << 16), labels,
groupLeaderFd);
if (!cacheCounters.back().ok()) {
cacheCounters.pop_back();
} else {
if (groupLeaderFd == -1) {
groupLeaderFd = cacheCounters.back().getFd();
}
}
}
}
}
#endif
auto w = std::thread{workload, &cs}; auto w = std::thread{workload, &cs};
for (;;) { for (;;) {
@@ -262,6 +313,7 @@ int main(int argc, char **argv) {
"transactions_total "; "transactions_total ";
body += std::to_string(transactions.load(std::memory_order_relaxed)); body += std::to_string(transactions.load(std::memory_order_relaxed));
body += "\n"; body += "\n";
#ifdef __linux__
body += "# HELP instructions_total Total number of instructions\n" body += "# HELP instructions_total Total number of instructions\n"
"# TYPE instructions_total counter\n" "# TYPE instructions_total counter\n"
"instructions_total "; "instructions_total ";
@@ -272,6 +324,13 @@ int main(int argc, char **argv) {
"cycles_total "; "cycles_total ";
body += std::to_string(cycles.total()); body += std::to_string(cycles.total());
body += "\n"; body += "\n";
body += "# HELP cache_event_total Total number of cache events\n"
"# TYPE cache_event_total counter\n";
for (const auto &counter : cacheCounters) {
body += "cache_event_total" + counter.getLabels() + " " +
std::to_string(counter.total()) + "\n";
}
#endif
for (int i = 0; i < metricsCount; ++i) { for (int i = 0; i < metricsCount; ++i) {
body += "# HELP "; body += "# HELP ";

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.