9 Commits

Author SHA1 Message Date
5580f9b71d Run script tests under valgrind too
Some checks failed
Tests / Clang total: 3307, passed: 3307
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / 64 bit versions total: 3307, passed: 3307
Tests / Debug total: 3305, passed: 3305
Tests / SIMD fallback total: 3307, passed: 3307
Tests / Release [gcc] total: 3307, passed: 3307
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2458, passed: 2458
Tests / Coverage total: 2487, passed: 2487
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.03% (1846/1864) * Branch Coverage: 67.65% (1447/2139) * Complexity Density: 0.00 * Lines of Code: 1864 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-09-14 13:51:45 -07:00
628d16b7e6 Annotate zero argument as maybe_unused 2024-09-14 13:47:08 -07:00
d9e4a7d1b6 Remove bogus assert
Some checks failed
Tests / Clang total: 3296, passed: 3296
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / 64 bit versions total: 3296, passed: 3296
Tests / Debug total: 3294, passed: 3294
Tests / SIMD fallback total: 3296, passed: 3296
Tests / Release [gcc] total: 3296, passed: 3296
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |2|0|2|0|:zzz:
Tests / Release [gcc,aarch64] total: 2458, passed: 2458
Tests / Coverage total: 2476, passed: 2476
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 98.77% (1841/1864) * Branch Coverage: 67.55% (1445/2139) * Complexity Density: 0.00 * Lines of Code: 1864 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-09-14 10:58:48 -07:00
52201fa4c7 Use signed compare
Some checks failed
Tests / Clang total: 3296, failed: 38, passed: 3258
Tests / 64 bit versions total: 3296, passed: 3296
Tests / Debug total: 3294, failed: 76, passed: 3218
Tests / SIMD fallback total: 3296, passed: 3296
Tests / Release [gcc] total: 3296, failed: 38, passed: 3258
Tests / Release [gcc,aarch64] total: 2458, passed: 2458
Tests / Coverage total: 2476, failed: 76, passed: 2400
weaselab/conflict-set/pipeline/head There was a failure building this commit
Unsigned compare is incorrect here
2024-09-13 22:55:22 -07:00
0814822d82 avx512 implementations for fixupMaxVersion
Some checks failed
Tests / Clang total: 3296, failed: 1, passed: 3295
Tests / 64 bit versions total: 3296, passed: 3296
Tests / Debug total: 3294, failed: 1, passed: 3293
Tests / SIMD fallback total: 3296, passed: 3296
Tests / Release [gcc] total: 3296, failed: 1, passed: 3295
Tests / Release [gcc,aarch64] total: 2458, passed: 2458
Tests / Coverage total: 2476, failed: 1, passed: 2475
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-09-13 22:01:56 -07:00
41df2398e8 Add to corpus 2024-09-13 20:35:52 -07:00
84c4d0fcba More usages of getNodeIndexExists
All checks were successful
Tests / Clang total: 3244, passed: 3244
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / 64 bit versions total: 3244, passed: 3244
Tests / Debug total: 3242, passed: 3242
Tests / SIMD fallback total: 3244, passed: 3244
Tests / Release [gcc] total: 3244, passed: 3244
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2419, passed: 2419
Tests / Coverage total: 2437, passed: 2437
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.24% (1823/1837) * Branch Coverage: 68.06% (1447/2126) * Complexity Density: 0.00 * Lines of Code: 1837 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-09-12 22:22:16 -07:00
6241533dfb Improve codegen for getChildExists(Node{3,16}*, ...) 2024-09-12 22:05:00 -07:00
0abf6a1ecf Improve Node3 search codegen
All checks were successful
Tests / Clang total: 3244, passed: 3244
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / 64 bit versions total: 3244, passed: 3244
Tests / Debug total: 3242, passed: 3242
Tests / SIMD fallback total: 3244, passed: 3244
Tests / Release [gcc] total: 3244, passed: 3244
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc,aarch64] total: 2419, passed: 2419
Tests / Coverage total: 2437, passed: 2437
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 99.23% (1807/1821) * Branch Coverage: 68.22% (1434/2102) * Complexity Density: 0.00 * Lines of Code: 1821 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-09-12 17:12:41 -07:00
18 changed files with 259 additions and 479 deletions

View File

@@ -24,14 +24,6 @@ repos:
entry: "^#define SHOW_MEMORY 1$"
language: pygrep
types: [c++]
- repo: local
hooks:
- id: sim cache misses check
name: disallow checking in SIM_CACHE_MISSES=1
description: disallow checking in SIM_CACHE_MISSES=1
entry: "^#define SIM_CACHE_MISSES 1$"
language: pygrep
types: [c++]
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: a23f6b85d0fdd5bb9d564e2579e678033debbdff # frozen: v0.10.0.1
hooks:

View File

@@ -248,6 +248,19 @@ if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR AND BUILD_TESTING)
add_test(NAME conflict_set_blackbox_${hash} COMMAND driver ${TEST})
endforeach()
find_program(VALGRIND_EXE valgrind)
if(VALGRIND_EXE AND NOT CMAKE_CROSSCOMPILING)
list(LENGTH CORPUS_TESTS len)
math(EXPR last "${len} - 1")
set(partition_size 100)
foreach(i RANGE 0 ${last} ${partition_size})
list(SUBLIST CORPUS_TESTS ${i} ${partition_size} partition)
add_test(NAME conflict_set_blackbox_valgrind_${i}
COMMAND ${VALGRIND_EXE} --error-exitcode=99 --
$<TARGET_FILE:driver> ${partition})
endforeach()
endif()
# scripted tests. Written manually to fill in anything libfuzzer couldn't
# find.
if(NOT CMAKE_CROSSCOMPILING)
@@ -268,19 +281,14 @@ if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR AND BUILD_TESTING)
${Python3_EXECUTABLE}
${CMAKE_CURRENT_SOURCE_DIR}/test_conflict_set.py test ${TEST}
--build-dir ${CMAKE_CURRENT_BINARY_DIR})
endforeach()
endif()
find_program(VALGRIND_EXE valgrind)
if(VALGRIND_EXE AND NOT CMAKE_CROSSCOMPILING)
list(LENGTH CORPUS_TESTS len)
math(EXPR last "${len} - 1")
set(partition_size 100)
foreach(i RANGE 0 ${last} ${partition_size})
list(SUBLIST CORPUS_TESTS ${i} ${partition_size} partition)
add_test(NAME conflict_set_blackbox_valgrind_${i}
COMMAND ${VALGRIND_EXE} --error-exitcode=99 --
$<TARGET_FILE:driver> ${partition})
if(VALGRIND_EXE AND NOT CMAKE_CROSSCOMPILING)
add_test(
NAME script_test_${TEST}_valgrind
COMMAND
${VALGRIND_EXE} ${Python3_EXECUTABLE}
${CMAKE_CURRENT_SOURCE_DIR}/test_conflict_set.py test ${TEST}
--build-dir ${CMAKE_CURRENT_BINARY_DIR})
endif()
endforeach()
endif()
@@ -350,11 +358,6 @@ if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR AND BUILD_TESTING)
set_target_properties(server_bench PROPERTIES SKIP_BUILD_RPATH ON)
add_executable(interleaving_test InterleavingTest.cpp)
# work around lack of musttail for gcc
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND CMAKE_BUILD_TYPE STREQUAL "Debug")
target_compile_options(interleaving_test PRIVATE -Og
-foptimize-sibling-calls)
endif()
target_link_libraries(interleaving_test PRIVATE nanobench)
endif()

View File

@@ -48,17 +48,6 @@ limitations under the License.
#endif
#endif
#define SIM_CACHE_MISSES 0
#if SIM_CACHE_MISSES
constexpr void simCacheMiss(void *x) {
if (x) {
_mm_clflush(x);
}
}
#else
constexpr void simCacheMiss(void *) {}
#endif
#include <memcheck.h>
using namespace weaselab;
@@ -777,6 +766,8 @@ private:
int getNodeIndex(Node3 *self, uint8_t index) {
Node3 *n = (Node3 *)self;
assume(n->numChildren >= 1);
assume(n->numChildren <= 3);
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] == index) {
return i;
@@ -785,6 +776,18 @@ int getNodeIndex(Node3 *self, uint8_t index) {
return -1;
}
int getNodeIndexExists(Node3 *self, uint8_t index) {
Node3 *n = (Node3 *)self;
assume(n->numChildren >= 1);
assume(n->numChildren <= 3);
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] == index) {
return i;
}
}
__builtin_unreachable(); // GCOVR_EXCL_LINE
}
int getNodeIndex(Node16 *self, uint8_t index) {
#ifdef HAS_AVX
@@ -845,31 +848,62 @@ int getNodeIndex(Node16 *self, uint8_t index) {
#endif
}
int getNodeIndexExists(Node16 *self, uint8_t index) {
#ifdef HAS_AVX
__m128i key_vec = _mm_set1_epi8(index);
__m128i indices;
memcpy(&indices, self->index, Node16::kMaxNodes);
__m128i results = _mm_cmpeq_epi8(key_vec, indices);
uint32_t mask = (1 << self->numChildren) - 1;
uint32_t bitfield = _mm_movemask_epi8(results) & mask;
assume(bitfield != 0);
return std::countr_zero(bitfield);
#elif defined(HAS_ARM_NEON)
// Based on
// https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon
uint8x16_t indices;
memcpy(&indices, self->index, Node16::kMaxNodes);
// 0xff for each match
uint16x8_t results =
vreinterpretq_u16_u8(vceqq_u8(vdupq_n_u8(index), indices));
assume(self->numChildren <= Node16::kMaxNodes);
uint64_t mask = self->numChildren == 16
? uint64_t(-1)
: (uint64_t(1) << (self->numChildren * 4)) - 1;
// 0xf for each match in valid range
uint64_t bitfield =
vget_lane_u64(vreinterpret_u64_u8(vshrn_n_u16(results, 4)), 0) & mask;
assume(bitfield != 0);
return std::countr_zero(bitfield) / 4;
#else
for (int i = 0; i < self->numChildren; ++i) {
if (self->index[i] == index) {
return i;
}
}
__builtin_unreachable(); // GCOVR_EXCL_LINE
#endif
}
// Precondition - an entry for index must exist in the node
Node *&getChildExists(Node3 *self, uint8_t index) {
auto &result = self->children[getNodeIndex(self, index)];
simCacheMiss(result);
return result;
return self->children[getNodeIndexExists(self, index)];
}
// Precondition - an entry for index must exist in the node
Node *&getChildExists(Node16 *self, uint8_t index) {
auto &result = self->children[getNodeIndex(self, index)];
simCacheMiss(result);
return result;
return self->children[getNodeIndexExists(self, index)];
}
// Precondition - an entry for index must exist in the node
Node *&getChildExists(Node48 *self, uint8_t index) {
assert(self->bitSet.test(index));
auto &result = self->children[self->index[index]];
simCacheMiss(result);
return result;
return self->children[self->index[index]];
}
// Precondition - an entry for index must exist in the node
Node *&getChildExists(Node256 *self, uint8_t index) {
assert(self->bitSet.test(index));
auto &result = self->children[index];
simCacheMiss(result);
return result;
return self->children[index];
}
// Precondition - an entry for index must exist in the node
@@ -904,12 +938,12 @@ InternalVersionT maxVersion(Node *n) {
__builtin_unreachable(); // GCOVR_EXCL_LINE
case Type_Node3: {
auto *n3 = static_cast<Node3 *>(n);
int i = getNodeIndex(n3, index);
int i = getNodeIndexExists(n3, index);
return n3->childMaxVersion[i];
}
case Type_Node16: {
auto *n16 = static_cast<Node16 *>(n);
int i = getNodeIndex(n16, index);
int i = getNodeIndexExists(n16, index);
return n16->childMaxVersion[i];
}
case Type_Node48: {
@@ -937,12 +971,12 @@ InternalVersionT exchangeMaxVersion(Node *n, InternalVersionT newMax) {
__builtin_unreachable(); // GCOVR_EXCL_LINE
case Type_Node3: {
auto *n3 = static_cast<Node3 *>(n);
int i = getNodeIndex(n3, index);
int i = getNodeIndexExists(n3, index);
return std::exchange(n3->childMaxVersion[i], newMax);
}
case Type_Node16: {
auto *n16 = static_cast<Node16 *>(n);
int i = getNodeIndex(n16, index);
int i = getNodeIndexExists(n16, index);
return std::exchange(n16->childMaxVersion[i], newMax);
}
case Type_Node48: {
@@ -971,13 +1005,13 @@ void setMaxVersion(Node *n, InternalVersionT newMax) {
__builtin_unreachable(); // GCOVR_EXCL_LINE
case Type_Node3: {
auto *n3 = static_cast<Node3 *>(n);
int i = getNodeIndex(n3, index);
int i = getNodeIndexExists(n3, index);
n3->childMaxVersion[i] = newMax;
return;
}
case Type_Node16: {
auto *n16 = static_cast<Node16 *>(n);
int i = getNodeIndex(n16, index);
int i = getNodeIndexExists(n16, index);
n16->childMaxVersion[i] = newMax;
return;
}
@@ -1049,7 +1083,6 @@ ChildAndMaxVersion getChildAndMaxVersion(Node3 *self, uint8_t index) {
if (i < 0) {
return {};
}
simCacheMiss(self->children[i]);
return {self->children[i], self->childMaxVersion[i]};
}
ChildAndMaxVersion getChildAndMaxVersion(Node16 *self, uint8_t index) {
@@ -1057,7 +1090,6 @@ ChildAndMaxVersion getChildAndMaxVersion(Node16 *self, uint8_t index) {
if (i < 0) {
return {};
}
simCacheMiss(self->children[i]);
return {self->children[i], self->childMaxVersion[i]};
}
ChildAndMaxVersion getChildAndMaxVersion(Node48 *self, uint8_t index) {
@@ -1065,11 +1097,9 @@ ChildAndMaxVersion getChildAndMaxVersion(Node48 *self, uint8_t index) {
if (i < 0) {
return {};
}
simCacheMiss(self->children[i]);
return {self->children[i], self->childMaxVersion[i]};
}
ChildAndMaxVersion getChildAndMaxVersion(Node256 *self, uint8_t index) {
simCacheMiss(self->children[index]);
return {self->children[index], self->childMaxVersion[index]};
}
@@ -1093,9 +1123,10 @@ ChildAndMaxVersion getChildAndMaxVersion(Node *self, uint8_t index) {
Node *getChildGeq(Node0 *, int) { return nullptr; }
Node *getChildGeq(Node3 *n, int child) {
assume(n->numChildren >= 1);
assume(n->numChildren <= 3);
for (int i = 0; i < n->numChildren; ++i) {
if (n->index[i] >= child) {
simCacheMiss(n->children[i]);
return n->children[i];
}
}
@@ -1114,10 +1145,7 @@ Node *getChildGeq(Node16 *self, int child) {
__m128i results = _mm_cmpeq_epi8(key_vec, _mm_min_epu8(key_vec, indices));
int mask = (1 << self->numChildren) - 1;
uint32_t bitfield = _mm_movemask_epi8(results) & mask;
auto *result =
bitfield == 0 ? nullptr : self->children[std::countr_zero(bitfield)];
simCacheMiss(result);
return result;
return bitfield == 0 ? nullptr : self->children[std::countr_zero(bitfield)];
#elif defined(HAS_ARM_NEON)
uint8x16_t indices;
memcpy(&indices, self->index, sizeof(self->index));
@@ -1153,16 +1181,13 @@ Node *getChildGeq(Node48 *self, int child) {
if (c < 0) {
return nullptr;
}
auto *result = self->children[self->index[c]];
simCacheMiss(result);
return result;
return self->children[self->index[c]];
}
Node *getChildGeq(Node256 *self, int child) {
int c = self->bitSet.firstSetGeq(child);
if (c < 0) {
return nullptr;
}
simCacheMiss(self->children[c]);
return self->children[c];
}
@@ -1186,26 +1211,20 @@ Node *getChildGeq(Node *self, int child) {
// Precondition: self has a child
Node *getFirstChildExists(Node3 *self) {
assert(self->numChildren > 0);
simCacheMiss(self->children[0]);
return self->children[0];
}
// Precondition: self has a child
Node *getFirstChildExists(Node16 *self) {
assert(self->numChildren > 0);
simCacheMiss(self->children[0]);
return self->children[0];
}
// Precondition: self has a child
Node *getFirstChildExists(Node48 *self) {
auto *result = self->children[self->index[self->bitSet.firstSetGeq(0)]];
simCacheMiss(result);
return result;
return self->children[self->index[self->bitSet.firstSetGeq(0)]];
}
// Precondition: self has a child
Node *getFirstChildExists(Node256 *self) {
auto *result = self->children[self->bitSet.firstSetGeq(0)];
simCacheMiss(result);
return result;
return self->children[self->bitSet.firstSetGeq(0)];
}
// Precondition: self has a child
@@ -1585,7 +1604,6 @@ __attribute__((target("avx512f"))) void rezero16(InternalVersionT *vs,
_mm512_sub_epi32(_mm512_loadu_epi32(vs), zvec), _mm512_setzero_epi32());
_mm512_mask_storeu_epi32(vs, m, zvec);
}
__attribute__((target("default")))
#endif
@@ -2452,6 +2470,7 @@ checkMaxBetweenExclusive(Node *n, int begin, int end,
}
__attribute__((target("default")))
#endif
bool checkMaxBetweenExclusive(Node *n, int begin, int end,
InternalVersionT readVersion, ReadContext *tls) {
return checkMaxBetweenExclusiveImpl<false>(n, begin, end, readVersion, tls);
@@ -2891,6 +2910,72 @@ void addPointWrite(Node *&root, std::span<const uint8_t> key,
}
}
#if defined(HAS_AVX) && !defined(__SANITIZE_THREAD__)
__attribute__((target("avx512f"))) InternalVersionT horizontalMaxUpTo16(
InternalVersionT *vs, [[maybe_unused]] InternalVersionT z, int len) {
assume(len <= 16);
#if USE_64_BIT
// Hope it gets vectorized
InternalVersionT max = vs[0];
for (int i = 1; i < len; ++i) {
max = std::max(vs[i], max);
}
return max;
#else
uint32_t zero;
memcpy(&zero, &z, sizeof(zero));
auto zeroVec = _mm512_set1_epi32(zero);
auto max = InternalVersionT(
zero +
_mm512_reduce_max_epi32(_mm512_sub_epi32(
_mm512_mask_loadu_epi32(zeroVec, _mm512_int2mask((1 << len) - 1), vs),
zeroVec)));
return max;
#endif
}
__attribute__((target("default")))
#endif
InternalVersionT
horizontalMaxUpTo16(InternalVersionT *vs, InternalVersionT, int len) {
assume(len <= 16);
InternalVersionT max = vs[0];
for (int i = 1; i < len; ++i) {
max = std::max(vs[i], max);
}
return max;
}
#if defined(HAS_AVX) && !defined(__SANITIZE_THREAD__)
__attribute__((target("avx512f"))) InternalVersionT
horizontalMax16(InternalVersionT *vs, [[maybe_unused]] InternalVersionT z) {
#if USE_64_BIT
// Hope it gets vectorized
InternalVersionT max = vs[0];
for (int i = 1; i < 16; ++i) {
max = std::max(vs[i], max);
}
return max;
#else
uint32_t zero;
memcpy(&zero, &z, sizeof(zero));
auto zeroVec = _mm512_set1_epi32(zero);
return InternalVersionT(zero + _mm512_reduce_max_epi32(_mm512_sub_epi32(
_mm512_loadu_epi32(vs), zeroVec)));
#endif
}
__attribute__((target("default")))
#endif
InternalVersionT
horizontalMax16(InternalVersionT *vs, InternalVersionT) {
InternalVersionT max = vs[0];
for (int i = 1; i < 16; ++i) {
max = std::max(vs[i], max);
}
return max;
}
// Precondition: `node->entryPresent`, and node is not the root
void fixupMaxVersion(Node *node, WriteContext *tls) {
assert(node->parent);
@@ -2902,15 +2987,13 @@ void fixupMaxVersion(Node *node, WriteContext *tls) {
break;
case Type_Node3: {
auto *self3 = static_cast<Node3 *>(node);
for (int i = 0; i < self3->numChildren; ++i) {
max = std::max(self3->childMaxVersion[i], max);
}
max = std::max(max, horizontalMaxUpTo16(self3->childMaxVersion, tls->zero,
self3->numChildren));
} break;
case Type_Node16: {
auto *self16 = static_cast<Node16 *>(node);
for (int i = 0; i < self16->numChildren; ++i) {
max = std::max(self16->childMaxVersion[i], max);
}
max = std::max(max, horizontalMaxUpTo16(self16->childMaxVersion, tls->zero,
self16->numChildren));
} break;
case Type_Node48: {
auto *self48 = static_cast<Node48 *>(node);
@@ -2920,9 +3003,7 @@ void fixupMaxVersion(Node *node, WriteContext *tls) {
} break;
case Type_Node256: {
auto *self256 = static_cast<Node256 *>(node);
for (auto v : self256->maxOfMax) {
max = std::max(v, max);
}
max = std::max(max, horizontalMax16(self256->childMaxVersion, tls->zero));
} break;
default: // GCOVR_EXCL_LINE
__builtin_unreachable(); // GCOVR_EXCL_LINE
@@ -3045,288 +3126,34 @@ Node *firstGeqPhysical(Node *n, const std::span<const uint8_t> key) {
}
}
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(musttail)
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL
#endif
#if __has_attribute(preserve_none)
#define CONTINUATION_CALLING_CONVENTION __attribute__((preserve_none))
#else
#define CONTINUATION_CALLING_CONVENTION
#endif
typedef CONTINUATION_CALLING_CONVENTION void (*continuation)(struct CheckAll *,
int64_t prevJob,
int64_t job,
int64_t started,
int64_t count);
// State relevant to a particular query
struct CheckJob {
void setResult(bool ok) {
*result = ok ? ConflictSet::Commit : ConflictSet::Conflict;
}
[[nodiscard]] continuation init(const ConflictSet::ReadRange *read,
ConflictSet::Result *result, Node *root,
int64_t oldestVersionFullPrecision,
ReadContext *tls);
Node *n;
ChildAndMaxVersion childAndVersion;
std::span<const uint8_t> begin;
InternalVersionT readVersion;
ConflictSet::Result *result;
};
// State relevant to all queries
struct CheckAll {
constexpr static int kConcurrent = 32;
CheckJob inProgress[kConcurrent];
continuation next[kConcurrent];
int nextJob[kConcurrent];
Node *root;
int64_t oldestVersionFullPrecision;
ReadContext *tls;
const ConflictSet::ReadRange *queries;
ConflictSet::Result *results;
};
CONTINUATION_CALLING_CONVENTION void keepGoing(CheckAll *context,
int64_t prevJob, int64_t job,
int64_t started, int64_t count) {
prevJob = job;
job = context->nextJob[job];
MUSTTAIL return context->next[job](context, prevJob, job, started, count);
}
CONTINUATION_CALLING_CONVENTION void complete(CheckAll *context,
int64_t prevJob, int64_t job,
int64_t started, int64_t count) {
if (started == count) {
if (prevJob == job) {
return;
}
context->nextJob[prevJob] = context->nextJob[job];
job = prevJob;
} else {
int temp = started++;
context->next[job] = context->inProgress[job].init(
context->queries + temp, context->results + temp, context->root,
context->oldestVersionFullPrecision, context->tls);
}
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
namespace check_point_read_state_machine {
CONTINUATION_CALLING_CONVENTION void
down_left_spine(struct CheckAll *, int64_t prevJob, int64_t job,
int64_t started, int64_t count);
CONTINUATION_CALLING_CONVENTION void iter(struct CheckAll *, int64_t prevJob,
int64_t job, int64_t started,
int64_t count);
CONTINUATION_CALLING_CONVENTION void begin(struct CheckAll *, int64_t prevJob,
int64_t job, int64_t started,
int64_t count);
void begin(struct CheckAll *context, int64_t prevJob, int64_t job,
int64_t started, int64_t count) {
++context->tls->point_read_accum;
#if DEBUG_VERBOSE && !defined(NDEBUG)
fprintf(stderr, "Check point read: %s\n", printable(key).c_str());
#endif
auto *j = context->inProgress + job;
if (j->begin.size() == 0) {
if (j->n->entryPresent) {
j->setResult(j->n->entry.pointVersion <= j->readVersion);
MUSTTAIL return complete(context, prevJob, job, started, count);
}
j->n = getFirstChildExists(j->n);
context->next[job] = down_left_spine;
__builtin_prefetch(j->n);
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
j->childAndVersion = getChildAndMaxVersion(j->n, j->begin[0]);
context->next[job] = iter;
__builtin_prefetch(j->childAndVersion.child);
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
void iter(struct CheckAll *context, int64_t prevJob, int64_t job,
int64_t started, int64_t count) {
auto *j = context->inProgress + job;
if (j->childAndVersion.child == nullptr) {
auto c = getChildGeq(j->n, j->begin[0]);
if (c != nullptr) {
j->n = c;
context->next[job] = down_left_spine;
__builtin_prefetch(j->n);
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
} else {
j->n = nextSibling(j->n);
if (j->n == nullptr) {
j->setResult(true);
MUSTTAIL return complete(context, prevJob, job, started, count);
}
context->next[job] = down_left_spine;
__builtin_prefetch(j->n);
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
}
j->n = j->childAndVersion.child;
j->begin = j->begin.subspan(1, j->begin.size() - 1);
if (j->n->partialKeyLen > 0) {
int commonLen = std::min<int>(j->n->partialKeyLen, j->begin.size());
int i = longestCommonPrefix(j->n->partialKey(), j->begin.data(), commonLen);
if (i < commonLen) {
auto c = j->n->partialKey()[i] <=> j->begin[i];
if (c > 0) {
context->next[job] = down_left_spine;
MUSTTAIL return down_left_spine(context, prevJob, job, started, count);
} else {
j->n = nextSibling(j->n);
if (j->n == nullptr) {
j->setResult(true);
MUSTTAIL return complete(context, prevJob, job, started, count);
}
context->next[job] = down_left_spine;
__builtin_prefetch(j->n);
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
}
if (commonLen == j->n->partialKeyLen) {
// partial key matches
j->begin = j->begin.subspan(commonLen, j->begin.size() - commonLen);
} else if (j->n->partialKeyLen > int(j->begin.size())) {
// n is the first physical node greater than remaining, and there's no
// eq node
context->next[job] = down_left_spine;
MUSTTAIL return down_left_spine(context, prevJob, job, started, count);
}
}
if (j->childAndVersion.maxVersion <= j->readVersion) {
++context->tls->point_read_short_circuit_accum;
j->setResult(true);
MUSTTAIL return complete(context, prevJob, job, started, count);
}
++context->tls->point_read_iterations_accum;
if (j->begin.size() == 0) {
if (j->n->entryPresent) {
j->setResult(j->n->entry.pointVersion <= j->readVersion);
MUSTTAIL return complete(context, prevJob, job, started, count);
}
j->n = getFirstChildExists(j->n);
context->next[job] = down_left_spine;
__builtin_prefetch(j->n);
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
j->childAndVersion = getChildAndMaxVersion(j->n, j->begin[0]);
__builtin_prefetch(j->childAndVersion.child);
// j->next is already iter
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
void down_left_spine(struct CheckAll *context, int64_t prevJob, int64_t job,
int64_t started, int64_t count) {
auto *j = context->inProgress + job;
if (j->n->entryPresent) {
j->setResult(j->n->entry.rangeVersion <= j->readVersion);
MUSTTAIL return complete(context, prevJob, job, started, count);
}
j->n = getFirstChildExists(j->n);
__builtin_prefetch(j->n);
// j->next is already down_left_spine
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
} // namespace check_point_read_state_machine
continuation CheckJob::init(const ConflictSet::ReadRange *read,
ConflictSet::Result *result, Node *root,
int64_t oldestVersionFullPrecision,
ReadContext *tls) {
auto begin = std::span<const uint8_t>(read->begin.p, read->begin.len);
auto end = std::span<const uint8_t>(read->end.p, read->end.len);
if (read->readVersion < oldestVersionFullPrecision) {
*result = ConflictSet::TooOld;
return complete;
} else if (end.size() == 0) {
this->begin = begin;
this->n = root;
this->readVersion = InternalVersionT(read->readVersion);
this->result = result;
return check_point_read_state_machine::begin;
// *result =
// checkPointRead(root, begin, InternalVersionT(read->readVersion), tls)
// ? ConflictSet::Commit
// : ConflictSet::Conflict;
// return complete;
} else {
*result = checkRangeRead(root, begin, end,
InternalVersionT(read->readVersion), tls)
? ConflictSet::Commit
: ConflictSet::Conflict;
return complete;
}
}
struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
void check(const ReadRange *reads, Result *result, int count) {
assert(oldestVersionFullPrecision >=
newestVersionFullPrecision - kNominalVersionWindow);
if (count == 0) {
return;
}
ReadContext tls;
tls.impl = this;
int64_t check_byte_accum = 0;
CheckAll context;
context.oldestVersionFullPrecision = oldestVersionFullPrecision;
context.queries = reads;
context.results = result;
context.root = root;
context.tls = &tls;
int64_t started = std::min(context.kConcurrent, count);
for (int i = 0; i < started; i++) {
context.next[i] = context.inProgress[i].init(
reads + i, result + i, root, oldestVersionFullPrecision, &tls);
context.nextJob[i] = i + 1;
}
context.nextJob[started - 1] = 0;
int prevJob = started - 1;
int job = 0;
context.next[job](&context, prevJob, job, started, count);
for (int i = 0; i < count; ++i) {
assert(reads[i].readVersion >= 0);
assert(reads[i].readVersion <= newestVersionFullPrecision);
const auto &r = reads[i];
check_byte_accum += r.begin.len + r.end.len;
auto begin = std::span<const uint8_t>(r.begin.p, r.begin.len);
auto end = std::span<const uint8_t>(r.end.p, r.end.len);
assert(oldestVersionFullPrecision >=
newestVersionFullPrecision - kNominalVersionWindow);
result[i] =
reads[i].readVersion < oldestVersionFullPrecision ? TooOld
: (end.size() > 0
? checkRangeRead(root, begin, end,
InternalVersionT(reads[i].readVersion), &tls)
: checkPointRead(root, begin,
InternalVersionT(reads[i].readVersion), &tls))
? Commit
: Conflict;
tls.commits_accum += result[i] == Commit;
tls.conflicts_accum += result[i] == Conflict;
tls.too_olds_accum += result[i] == TooOld;
}
point_read_total.add(tls.point_read_accum);
prefix_read_total.add(tls.prefix_read_accum);
range_read_total.add(tls.range_read_accum);
@@ -4268,6 +4095,24 @@ template <int kN> void benchScan2() {
});
}
void benchHorizontal16() {
ankerl::nanobench::Bench bench;
InternalVersionT vs[16];
for (int i = 0; i < 16; ++i) {
vs[i] = InternalVersionT(rand() % 1000 + 1000);
}
#if !USE_64_BIT
InternalVersionT::zero = InternalVersionT(rand() % 1000);
#endif
bench.run("horizontal16", [&]() {
bench.doNotOptimizeAway(horizontalMax16(vs, InternalVersionT::zero));
});
int x = rand() % 15 + 1;
bench.run("horizontalUpTo16", [&]() {
bench.doNotOptimizeAway(horizontalMaxUpTo16(vs, InternalVersionT::zero, x));
});
}
void benchLCP(int len) {
ankerl::nanobench::Bench bench;
std::vector<uint8_t> lhs(len);
@@ -4300,11 +4145,7 @@ void printTree() {
debugPrintDot(stdout, cs.root, &cs);
}
int main(void) {
for (int i = 0; i < 256; ++i) {
benchLCP(i);
}
}
int main(void) { benchHorizontal16(); }
#endif
#ifdef ENABLE_FUZZ

View File

@@ -22,6 +22,9 @@ void *stepJob(Job *j) {
return done ? nullptr : (void *)stepJob;
}
// So we can look at the disassembly more easily
extern "C" {
void sequential(Job **jobs, int count) {
for (int i = 0; i < count; ++i) {
do {
@@ -91,87 +94,6 @@ void interleaveBoundedCyclicList(Job **jobs, int count) {
}
}
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(musttail)
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL
#endif
struct Context {
constexpr static int kConcurrent = 32;
Job **jobs;
Job *inProgress[kConcurrent];
void (*continuation[kConcurrent])(Context *, int64_t prevJob, int64_t job,
int64_t started, int64_t count);
int nextJob[kConcurrent];
};
void keepGoing(Context *context, int64_t prevJob, int64_t job, int64_t started,
int64_t count) {
prevJob = job;
job = context->nextJob[job];
MUSTTAIL return context->continuation[job](context, prevJob, job, started,
count);
}
void stepJobTailCall(Context *context, int64_t prevJob, int64_t job,
int64_t started, int64_t count);
void complete(Context *context, int64_t prevJob, int64_t job, int64_t started,
int64_t count) {
if (started == count) {
if (prevJob == job) {
return;
}
context->nextJob[prevJob] = context->nextJob[job];
job = prevJob;
} else {
context->inProgress[job] = context->jobs[started++];
context->continuation[job] = stepJobTailCall;
}
prevJob = job;
job = context->nextJob[job];
MUSTTAIL return context->continuation[job](context, prevJob, job, started,
count);
}
void stepJobTailCall(Context *context, int64_t prevJob, int64_t job,
int64_t started, int64_t count) {
auto *j = context->inProgress[job];
auto done = --(*j->input) == 0;
#ifdef __x86_64__
_mm_clflush(j->input);
#endif
if (done) {
MUSTTAIL return complete(context, prevJob, job, started, count);
} else {
context->continuation[job] = stepJobTailCall;
MUSTTAIL return keepGoing(context, prevJob, job, started, count);
}
}
void useTailCalls(Job **jobs, int count) {
if (count == 0) {
return;
}
Context context;
context.jobs = jobs;
int64_t started = std::min(Context::kConcurrent, count);
for (int i = 0; i < started; i++) {
context.inProgress[i] = jobs[i];
context.nextJob[i] = i + 1;
context.continuation[i] = stepJobTailCall;
}
context.nextJob[started - 1] = 0;
int prevJob = started - 1;
int job = 0;
return context.continuation[job](&context, prevJob, job, started, count);
}
void interleaveCyclicList(Job **jobs, int count) {
auto *nextJob = (int *)alloca(sizeof(int) * count);
@@ -195,11 +117,12 @@ void interleaveCyclicList(Job **jobs, int count) {
job = nextJob[job];
}
}
}
int main() {
ankerl::nanobench::Bench bench;
constexpr int kNumJobs = 10000;
constexpr int kNumJobs = 100;
bench.relative(true);
Job jobs[kNumJobs];
@@ -217,7 +140,6 @@ int main() {
for (auto [scheduler, name] :
{std::make_pair(sequentialNoFuncPtr, "sequentialNoFuncPtr"),
std::make_pair(sequential, "sequential"),
std::make_pair(useTailCalls, "useTailCalls"),
std::make_pair(interleaveSwapping, "interleavingSwapping"),
std::make_pair(interleaveBoundedCyclicList,
"interleaveBoundedCyclicList"),

View File

@@ -1,5 +1,4 @@
#include <atomic>
#include <cstdint>
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
@@ -22,55 +21,78 @@
std::atomic<int64_t> transactions;
constexpr int kBaseSearchDepth = 115;
constexpr int kWindowSize = 10000000;
constexpr int kNumPrefixes = 250000;
std::string makeKey(int64_t num, int suffixLen) {
std::string numToKey(int64_t num) {
std::string result;
result.resize(sizeof(int64_t) + suffixLen);
result.resize(kBaseSearchDepth + sizeof(int64_t));
memset(result.data(), 0, kBaseSearchDepth);
int64_t be = __builtin_bswap64(num);
memcpy(result.data(), &be, sizeof(int64_t));
memset(result.data() + sizeof(int64_t), 0, suffixLen);
memcpy(result.data() + kBaseSearchDepth, &be, sizeof(int64_t));
return result;
}
void workload(weaselab::ConflictSet *cs) {
int64_t version = kWindowSize;
for (int i = 0; i < kNumPrefixes; ++i) {
for (int j = 0; j < 50; ++j) {
weaselab::ConflictSet::WriteRange wr;
auto k = makeKey(i, j);
wr.begin.p = (const uint8_t *)k.data();
wr.begin.len = k.size();
wr.end.len = 0;
cs->addWrites(&wr, 1, version);
}
}
++version;
for (int i = 0; i < kNumPrefixes; ++i) {
weaselab::ConflictSet::WriteRange wr;
auto k = makeKey(i, 50);
wr.begin.p = (const uint8_t *)k.data();
wr.begin.len = k.size();
wr.end.len = 0;
cs->addWrites(&wr, 1, version);
}
std::vector<weaselab::ConflictSet::Result> results(10);
cs->addWrites(nullptr, 0, version);
for (;; transactions.fetch_add(1, std::memory_order_relaxed)) {
std::vector<std::string> keys(10);
for (auto &k : keys) {
k = makeKey(rand() % kNumPrefixes, 49);
// Reads
{
auto beginK = numToKey(version - kWindowSize);
auto endK = numToKey(version - 1);
auto pointRv = version - kWindowSize + rand() % kWindowSize + 1;
auto pointK = numToKey(pointRv);
weaselab::ConflictSet::ReadRange reads[] = {
{
{(const uint8_t *)pointK.data(), int(pointK.size())},
{nullptr, 0},
pointRv,
},
{
{(const uint8_t *)beginK.data(), int(beginK.size())},
{(const uint8_t *)endK.data(), int(endK.size())},
version - 2,
},
};
weaselab::ConflictSet::Result result[sizeof(reads) / sizeof(reads[0])];
cs->check(reads, result, sizeof(reads) / sizeof(reads[0]));
// for (int i = 0; i < sizeof(reads) / sizeof(reads[0]); ++i) {
// if (result[i] != weaselab::ConflictSet::Commit) {
// fprintf(stderr, "Unexpected conflict: [%s, %s) @ %" PRId64 "\n",
// printable(reads[i].begin).c_str(),
// printable(reads[i].end).c_str(), reads[i].readVersion);
// abort();
// }
// }
}
std::vector<weaselab::ConflictSet::ReadRange> reads(10);
for (int i = 0; i < reads.size(); ++i) {
reads[i].begin.p = (const uint8_t *)(keys[i].data());
reads[i].begin.len = keys[i].size();
reads[i].end.len = 0;
reads[i].readVersion = version - 1;
// Writes
{
weaselab::ConflictSet::WriteRange w;
auto k = numToKey(version);
w.begin.p = (const uint8_t *)k.data();
w.end.len = 0;
if (version % (kWindowSize / 2) == 0) {
for (int l = 0; l <= k.size(); ++l) {
w.begin.len = l;
cs->addWrites(&w, 1, version);
}
} else {
w.begin.len = k.size();
cs->addWrites(&w, 1, version);
int64_t beginN = version - kWindowSize + rand() % kWindowSize;
auto b = numToKey(beginN);
auto e = numToKey(beginN + 1000);
w.begin.p = (const uint8_t *)b.data();
w.begin.len = b.size();
w.end.p = (const uint8_t *)e.data();
w.end.len = e.size();
cs->addWrites(&w, 1, version);
}
}
cs->check(reads.data(), results.data(), 10);
// GC
cs->setOldestVersion(version - kWindowSize);
++version;
}
}

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.