38 Commits

Author SHA1 Message Date
ceecc62a63 Disable freelist for macos
All checks were successful
Tests / 64 bit versions total: 8220, passed: 8220
Tests / Debug total: 8218, passed: 8218
Tests / SIMD fallback total: 8220, passed: 8220
Tests / Release [clang] total: 8220, passed: 8220
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / gcc total: 8220, passed: 8220
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5446, passed: 5446
Tests / Coverage total: 5497, passed: 5497
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 97.69% (3168/3243) * Branch Coverage: 42.25% (19291/45654) * Complexity Density: 0.00 * Lines of Code: 3243 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
It's faster on my mac m1 without the freelist
2024-11-19 14:28:15 -08:00
80f0697e79 Fix arm detection on macos 2024-11-19 13:24:13 -08:00
ce23d3995c Fix README
All checks were successful
Tests / 64 bit versions total: 8220, passed: 8220
Tests / Debug total: 8218, passed: 8218
Tests / SIMD fallback total: 8220, passed: 8220
Tests / Release [clang] total: 8220, passed: 8220
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / gcc total: 8220, passed: 8220
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5446, passed: 5446
Tests / Coverage total: 5497, passed: 5497
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 97.69% (3166/3241) * Branch Coverage: 42.26% (19263/45584) * Complexity Density: 0.00 * Lines of Code: 3241 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-11-19 09:02:16 -08:00
6f899e063b Update readme
Some checks failed
Tests / 64 bit versions total: 8220, passed: 8220
weaselab/conflict-set/pipeline/head Something is wrong with the build of this commit
2024-11-18 14:31:56 -08:00
e5b9c03e77 Restore change that hurt codegen
I don't understand, but this is necessary for good codegen somehow
2024-11-18 14:29:55 -08:00
a158d375f5 Add script for updating readme 2024-11-18 14:05:14 -08:00
ee5a84cd7b Remove dead stores
All checks were successful
Tests / 64 bit versions total: 8220, passed: 8220
Tests / Debug total: 8218, passed: 8218
Tests / SIMD fallback total: 8220, passed: 8220
Tests / Release [clang] total: 8220, passed: 8220
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / gcc total: 8220, passed: 8220
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5446, passed: 5446
Tests / Coverage total: 5497, passed: 5497
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 97.69% (3165/3240) * Branch Coverage: 42.26% (19263/45585) * Complexity Density: 0.00 * Lines of Code: 3240 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-11-15 17:03:29 -08:00
33f14e3d9b Remove unused header 2024-11-15 16:49:21 -08:00
77262ee2d3 Fix some grammar in a comment
All checks were successful
Tests / 64 bit versions total: 8220, passed: 8220
Tests / Debug total: 8218, passed: 8218
Tests / SIMD fallback total: 8220, passed: 8220
Tests / Release [clang] total: 8220, passed: 8220
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / gcc total: 8220, passed: 8220
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5446, passed: 5446
Tests / Coverage total: 5497, passed: 5497
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 97.69% (3167/3242) * Branch Coverage: 42.26% (19269/45597) * Complexity Density: 0.00 * Lines of Code: 3242 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-11-15 16:47:31 -08:00
9945998e05 Remove unused code in Internal.h 2024-11-15 16:30:20 -08:00
2777e016ff Be more consistent about TaggedNodePointer vs Node* 2024-11-15 16:30:05 -08:00
661ffcd843 Explain purpose for prefetches in getFirstChild 2024-11-15 16:29:42 -08:00
3a34d3cecb Minor style improvements 2024-11-15 16:29:17 -08:00
189c73e3bd Reduce Node3 size by 8 2024-11-15 15:53:18 -08:00
35987030fc Add to corpus 2024-11-15 15:47:32 -08:00
0621741ec3 Update README benchmarks 2024-11-15 13:12:47 -08:00
f5ec9f726a Remove getFirstChildExists
Once we know the type, for Node3 and higher we know the first child
exists anyway
2024-11-15 13:05:11 -08:00
552fc11c5d Prefetch second child to improve scan performance 2024-11-15 12:45:01 -08:00
71ace9cc55 Make more range reads commit in server_bench 2024-11-15 11:35:55 -08:00
bcf459304f Improve cpu performance for workload generation in server_bench 2024-11-14 17:16:57 -08:00
f403c78410 Add test that exercises "too large removal buffer" code
All checks were successful
Tests / 64 bit versions total: 8099, passed: 8099
Tests / Debug total: 8097, passed: 8097
Tests / SIMD fallback total: 8099, passed: 8099
Tests / Release [clang] total: 8099, passed: 8099
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / gcc total: 8099, passed: 8099
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5366, passed: 5366
Tests / Coverage total: 5416, passed: 5416
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 97.61% (3182/3260) * Branch Coverage: 42.26% (19285/45639) * Complexity Density: 0.00 * Lines of Code: 3260 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-11-14 16:40:47 -08:00
08958d4109 Remove the reverse step for saving scan search path 2024-11-14 16:27:25 -08:00
dcc5275ec9 Isolate conflict-set on one thread in server_bench
Some checks failed
Tests / 64 bit versions total: 8097, passed: 8097
Tests / Debug total: 8095, passed: 8095
Tests / SIMD fallback total: 8097, passed: 8097
Tests / Release [clang] total: 8097, passed: 8097
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / gcc total: 8097, passed: 8097
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5366, passed: 5366
Tests / Coverage total: 5414, passed: 5414
weaselab/conflict-set/pipeline/head Something is wrong with the build of this commit
2024-11-13 22:46:11 -08:00
c5ef843f9e Add range reads to server_bench
All checks were successful
Tests / 64 bit versions total: 8097, passed: 8097
Tests / Debug total: 8095, passed: 8095
Tests / SIMD fallback total: 8097, passed: 8097
Tests / Release [clang] total: 8097, passed: 8097
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / gcc total: 8097, passed: 8097
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5366, passed: 5366
Tests / Coverage total: 5414, passed: 5414
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 97.67% (3149/3224) * Branch Coverage: 42.26% (19259/45578) * Complexity Density: 0.00 * Lines of Code: 3224 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-11-13 14:06:50 -08:00
b78e817e24 Only set HAS_AVX for x86_64
All checks were successful
Tests / 64 bit versions total: 8097, passed: 8097
Tests / Debug total: 8095, passed: 8095
Tests / SIMD fallback total: 8097, passed: 8097
Tests / Release [clang] total: 8097, passed: 8097
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / gcc total: 8097, passed: 8097
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5366, passed: 5366
Tests / Coverage total: 5414, passed: 5414
Code Coverage #### Project Overview No changes detected, that affect the code coverage. * Line Coverage: 97.67% (3149/3224) * Branch Coverage: 42.26% (19259/45578) * Complexity Density: 0.00 * Lines of Code: 3224 #### Quality Gates Summary Output truncated.
weaselab/conflict-set/pipeline/head This commit looks good
2024-11-12 18:16:30 -08:00
9c82f17e20 Enable valgrind annotations for gcc build in jenkins
Some checks failed
Tests / 64 bit versions total: 8097, passed: 8097
weaselab/conflict-set/pipeline/head Something is wrong with the build of this commit
2024-11-12 18:13:32 -08:00
665a9313a4 Valgrind annotations for new free list 2024-11-12 18:11:09 -08:00
6e66202d5e Add to corpus
Some checks failed
Tests / 64 bit versions total: 8097, passed: 8097
Tests / Debug total: 8095, passed: 8095
Tests / SIMD fallback total: 8097, passed: 8097
Tests / Release [clang] total: 8097, passed: 8097
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc] total: 8097, passed: 8097
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-11-12 17:50:27 -08:00
a92271a205 Build with -Wunused-variable 2024-11-12 17:50:27 -08:00
0dbfb4deae Detect simd headers in c++ instead of cmake 2024-11-12 17:50:27 -08:00
6e229b6b36 Skip sorting if already sorted for skip list 2024-11-12 17:50:27 -08:00
2200de11c8 Point read/write workload for server_bench 2024-11-12 17:50:27 -08:00
b37feb58dd Require musttail and preserve_none for interleaved 2024-11-12 17:50:27 -08:00
94a4802824 Don't do hardening check if cross compiling 2024-11-12 17:50:27 -08:00
707dbdb391 Build binaries compatible with cf-protection 2024-11-12 17:50:27 -08:00
bdd343bb57 Use llvm-objcopy if using clang and it's available
This works around a weird error I was getting when trying to link a
translation unit that included Internal.h with libconflict-set-static.a
with clang + gnu objcopy
2024-11-12 17:50:27 -08:00
7b31bd5efe Use llvm 19 for macos package
Some checks failed
Tests / 64 bit versions total: 7949, passed: 7949
Tests / Debug total: 7947, passed: 7947
Tests / SIMD fallback total: 7949, passed: 7949
Tests / Release [clang] total: 7949, passed: 7949
Clang |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [gcc] total: 7949, passed: 7949
GNU C Compiler (gcc) |Total|New|Outstanding|Fixed|Trend |:-:|:-:|:-:|:-:|:-: |0|0|0|0|:clap:
Tests / Release [clang,aarch64] total: 5268, passed: 5268
weaselab/conflict-set/pipeline/head There was a failure building this commit
2024-11-10 21:47:26 -08:00
e255e1a926 Add missing symbol imports for macos 2024-11-10 21:05:42 -08:00
103 changed files with 528 additions and 242 deletions

View File

@@ -57,6 +57,7 @@ ConflictSet::ReadRange prefixRange(Arena &arena, TrivialSpan key) {
void benchConflictSet() {
ankerl::nanobench::Bench bench;
bench.minEpochIterations(10000);
ConflictSet cs{0};
bench.batch(kOpsPerTx);

View File

@@ -31,11 +31,24 @@ if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
"MinSizeRel" "RelWithDebInfo")
endif()
add_compile_options(-fdata-sections -ffunction-sections -Wswitch-enum
-Werror=switch-enum -fPIC)
add_compile_options(
-Werror=switch-enum
-Wswitch-enum
-Wunused-variable
-fPIC
-fdata-sections
-ffunction-sections
-fno-jump-tables # https://github.com/llvm/llvm-project/issues/54247
)
if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
add_link_options("-Wno-unused-command-line-argument")
find_program(LLVM_OBJCOPY llvm-objcopy)
if(LLVM_OBJCOPY)
set(CMAKE_OBJCOPY
${LLVM_OBJCOPY}
CACHE FILEPATH "path to objcopy binary" FORCE)
endif()
endif()
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
@@ -56,6 +69,22 @@ if(HAS_FULL_RELRO)
endif()
cmake_pop_check_state()
if(CMAKE_SYSTEM_PROCESSOR STREQUAL aarch64 OR CMAKE_SYSTEM_PROCESSOR STREQUAL
arm64)
add_compile_options(-mbranch-protection=standard)
else()
add_compile_options(-fcf-protection)
set(rewrite_endbr_flags "-fuse-ld=mold;LINKER:-z,rewrite-endbr")
cmake_push_check_state()
list(APPEND CMAKE_REQUIRED_LINK_OPTIONS ${rewrite_endbr_flags})
check_cxx_source_compiles("int main(){}" HAS_REWRITE_ENDBR FAIL_REGEX
"warning:")
if(HAS_REWRITE_ENDBR)
add_link_options(${rewrite_endbr_flags})
endif()
cmake_pop_check_state()
endif()
set(version_script_flags
LINKER:--version-script=${CMAKE_CURRENT_SOURCE_DIR}/linker.map)
cmake_push_check_state()
@@ -81,19 +110,11 @@ else()
add_link_options(-Wl,--gc-sections)
endif()
if(NOT USE_SIMD_FALLBACK)
cmake_push_check_state()
list(APPEND CMAKE_REQUIRED_FLAGS -mavx)
check_include_file_cxx("immintrin.h" HAS_AVX)
if(HAS_AVX)
if(USE_SIMD_FALLBACK)
add_compile_definitions(USE_SIMD_FALLBACK)
else()
if(CMAKE_SYSTEM_PROCESSOR STREQUAL x86_64)
add_compile_options(-mavx)
add_compile_definitions(HAS_AVX)
endif()
cmake_pop_check_state()
check_include_file_cxx("arm_neon.h" HAS_ARM_NEON)
if(HAS_ARM_NEON)
add_compile_definitions(HAS_ARM_NEON)
endif()
endif()
@@ -356,6 +377,15 @@ if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR AND BUILD_TESTING)
${symbol_imports})
endif()
if(NOT CMAKE_CROSSCOMPILING)
find_program(HARDENING_CHECK hardening-check)
if(HARDENING_CHECK)
add_test(NAME hardening_check
COMMAND ${HARDENING_CHECK} $<TARGET_FILE:${PROJECT_NAME}>
--nofortify --nostackprotector)
endif()
endif()
# bench
add_executable(conflict_set_bench Bench.cpp)
target_link_libraries(conflict_set_bench PRIVATE ${PROJECT_NAME} nanobench)

View File

@@ -14,6 +14,16 @@ See the License for the specific language governing permissions and
limitations under the License.
*/
#if !defined(USE_SIMD_FALLBACK) && defined(__has_include)
#if defined(__x86_64__) && __has_include("immintrin.h")
#define HAS_AVX 1
#include <immintrin.h>
#elif __has_include("arm_neon.h")
#define HAS_ARM_NEON 1
#include <arm_neon.h>
#endif
#endif
#include "ConflictSet.h"
#include "Internal.h"
#include "LongestCommonPrefix.h"
@@ -34,12 +44,6 @@ limitations under the License.
#include <type_traits>
#include <utility>
#ifdef HAS_AVX
#include <immintrin.h>
#elif defined(HAS_ARM_NEON)
#include <arm_neon.h>
#endif
#ifndef __SANITIZE_THREAD__
#if defined(__has_feature)
#if __has_feature(thread_sanitizer)
@@ -341,8 +345,8 @@ struct Node3 : Node {
// Sorted
uint8_t index[kMaxNodes];
TaggedNodePointer children[kMaxNodes];
InternalVersionT childMaxVersion[kMaxNodes];
TaggedNodePointer children[kMaxNodes];
uint8_t *partialKey() {
assert(!releaseDeferred);
@@ -690,14 +694,17 @@ constexpr int getMaxCapacity(Node *self) {
self->partialKeyLen);
}
#ifdef __APPLE__
// Disabling the free list altogether is faster on my mac m1
constexpr int64_t kMaxFreeListBytes = 0;
#else
constexpr int64_t kMaxFreeListBytes = 1 << 20;
#endif
// Maintains a free list up to kMaxFreeListBytes. If the top element of the list
// doesn't meet the capacity constraints, it's freed and a new node is allocated
// with the minimum capacity. The hope is that "unfit" nodes don't get stuck in
// the free list.
//
// TODO valgrind annotations
template <class T> struct NodeAllocator {
static_assert(std::derived_from<T, Node>);
@@ -727,6 +734,8 @@ template <class T> struct NodeAllocator {
}
void release(T *p) {
assume(p->partialKeyCapacity >= 0);
assume(freeListSize >= 0);
if (freeListSize + sizeof(T) + p->partialKeyCapacity > kMaxFreeListBytes) {
removeNode(p);
return safe_free(p, sizeof(T) + p->partialKeyCapacity);
@@ -734,6 +743,7 @@ template <class T> struct NodeAllocator {
p->parent = freeList;
freeList = p;
freeListSize += sizeof(T) + p->partialKeyCapacity;
VALGRIND_MAKE_MEM_NOACCESS(p, sizeof(T) + p->partialKeyCapacity);
}
void deferRelease(T *p, Node *forwardTo) {
@@ -755,6 +765,13 @@ template <class T> struct NodeAllocator {
void releaseDeferred() {
if (deferredList != nullptr) {
deferredListFront->parent = freeList;
#ifndef NVALGRIND
for (auto *iter = deferredList; iter != freeList;) {
auto *tmp = iter;
iter = (T *)iter->parent;
VALGRIND_MAKE_MEM_NOACCESS(tmp, sizeof(T) + tmp->partialKeyCapacity);
}
#endif
freeList = std::exchange(deferredList, nullptr);
}
for (T *n = std::exchange(deferredListOverflow, nullptr); n != nullptr;) {
@@ -775,6 +792,7 @@ template <class T> struct NodeAllocator {
assert(deferredList == nullptr);
assert(deferredListOverflow == nullptr);
for (T *iter = freeList; iter != nullptr;) {
VALGRIND_MAKE_MEM_DEFINED(iter, sizeof(T));
auto *tmp = iter;
iter = (T *)iter->parent;
removeNode(tmp);
@@ -792,6 +810,7 @@ private:
T *allocate_helper(int minCapacity, int maxCapacity) {
if (freeList != nullptr) {
VALGRIND_MAKE_MEM_DEFINED(freeList, sizeof(T));
freeListSize -= sizeof(T) + freeList->partialKeyCapacity;
assume(freeList->partialKeyCapacity >= 0);
assume(minCapacity >= 0);
@@ -800,6 +819,11 @@ private:
freeList->partialKeyCapacity <= maxCapacity) {
auto *result = freeList;
freeList = (T *)freeList->parent;
VALGRIND_MAKE_MEM_UNDEFINED(result,
sizeof(T) + result->partialKeyCapacity);
VALGRIND_MAKE_MEM_DEFINED(&result->partialKeyCapacity,
sizeof(result->partialKeyCapacity));
VALGRIND_MAKE_MEM_DEFINED(&result->type, sizeof(result->type));
return result;
} else {
auto *p = freeList;
@@ -943,8 +967,7 @@ private:
NodeAllocator<Node256> node256;
};
int getNodeIndex(Node3 *self, uint8_t index) {
Node3 *n = (Node3 *)self;
int getNodeIndex(Node3 *n, uint8_t index) {
assume(n->numChildren >= 1);
assume(n->numChildren <= 3);
for (int i = 0; i < n->numChildren; ++i) {
@@ -1257,33 +1280,32 @@ TaggedNodePointer getChild(Node *self, uint8_t index) {
struct ChildAndMaxVersion {
TaggedNodePointer child;
InternalVersionT maxVersion;
static ChildAndMaxVersion empty() {
ChildAndMaxVersion result;
result.child = nullptr;
return result;
}
};
ChildAndMaxVersion getChildAndMaxVersion(Node0 *, uint8_t) { return {}; }
ChildAndMaxVersion getChildAndMaxVersion(Node3 *self, uint8_t index) {
int i = getNodeIndex(self, index);
if (i < 0) {
ChildAndMaxVersion result;
result.child = nullptr;
return result;
return ChildAndMaxVersion::empty();
}
return {self->children[i], self->childMaxVersion[i]};
}
ChildAndMaxVersion getChildAndMaxVersion(Node16 *self, uint8_t index) {
int i = getNodeIndex(self, index);
if (i < 0) {
ChildAndMaxVersion result;
result.child = nullptr;
return result;
return ChildAndMaxVersion::empty();
}
return {self->children[i], self->childMaxVersion[i]};
}
ChildAndMaxVersion getChildAndMaxVersion(Node48 *self, uint8_t index) {
int i = self->index[index];
if (i < 0) {
ChildAndMaxVersion result;
result.child = nullptr;
return result;
return ChildAndMaxVersion::empty();
}
return {self->children[i], self->childMaxVersion[i]};
}
@@ -1366,17 +1388,11 @@ TaggedNodePointer getChildGeq(Node16 *self, int child) {
TaggedNodePointer getChildGeq(Node48 *self, int child) {
int c = self->bitSet.firstSetGeq(child);
if (c < 0) {
return nullptr;
}
return self->children[self->index[c]];
return c < 0 ? nullptr : self->children[self->index[c]];
}
TaggedNodePointer getChildGeq(Node256 *self, int child) {
int c = self->bitSet.firstSetGeq(child);
if (c < 0) {
return nullptr;
}
return self->children[c];
return c < 0 ? nullptr : self->children[c];
}
TaggedNodePointer getChildGeq(Node *self, int child) {
@@ -1396,22 +1412,25 @@ TaggedNodePointer getChildGeq(Node *self, int child) {
}
}
Node *getFirstChild(Node0 *) { return nullptr; }
Node *getFirstChild(Node3 *self) {
return self->numChildren == 0 ? nullptr : self->children[0];
TaggedNodePointer getFirstChild(Node0 *) { return nullptr; }
TaggedNodePointer getFirstChild(Node3 *self) {
// Improves scan performance
__builtin_prefetch(self->children[1]);
return self->children[0];
}
Node *getFirstChild(Node16 *self) {
return self->numChildren == 0 ? nullptr : self->children[0];
TaggedNodePointer getFirstChild(Node16 *self) {
// Improves scan performance
__builtin_prefetch(self->children[1]);
return self->children[0];
}
Node *getFirstChild(Node48 *self) {
int index = self->index[self->bitSet.firstSetGeq(0)];
return index < 0 ? nullptr : self->children[index];
TaggedNodePointer getFirstChild(Node48 *self) {
return self->children[self->index[self->bitSet.firstSetGeq(0)]];
}
Node *getFirstChild(Node256 *self) {
TaggedNodePointer getFirstChild(Node256 *self) {
return self->children[self->bitSet.firstSetGeq(0)];
}
Node *getFirstChild(Node *self) {
TaggedNodePointer getFirstChild(Node *self) {
// Only require that the node-specific overloads are covered
// GCOVR_EXCL_START
switch (self->getType()) {
@@ -1431,46 +1450,6 @@ Node *getFirstChild(Node *self) {
// GCOVR_EXCL_STOP
}
// Precondition: self has a child
TaggedNodePointer getFirstChildExists(Node3 *self) {
assert(self->numChildren > 0);
return self->children[0];
}
// Precondition: self has a child
TaggedNodePointer getFirstChildExists(Node16 *self) {
assert(self->numChildren > 0);
return self->children[0];
}
// Precondition: self has a child
TaggedNodePointer getFirstChildExists(Node48 *self) {
return self->children[self->index[self->bitSet.firstSetGeq(0)]];
}
// Precondition: self has a child
TaggedNodePointer getFirstChildExists(Node256 *self) {
return self->children[self->bitSet.firstSetGeq(0)];
}
// Precondition: self has a child
TaggedNodePointer getFirstChildExists(Node *self) {
// Only require that the node-specific overloads are covered
// GCOVR_EXCL_START
switch (self->getType()) {
case Type_Node0:
__builtin_unreachable();
case Type_Node3:
return getFirstChildExists(static_cast<Node3 *>(self));
case Type_Node16:
return getFirstChildExists(static_cast<Node16 *>(self));
case Type_Node48:
return getFirstChildExists(static_cast<Node48 *>(self));
case Type_Node256:
return getFirstChildExists(static_cast<Node256 *>(self));
default:
__builtin_unreachable();
}
// GCOVR_EXCL_STOP
}
// self must not be the root
void maybeDecreaseCapacity(Node *&self, WriteContext *writeContext,
ConflictSet::Impl *impl);
@@ -1724,7 +1703,7 @@ TaggedNodePointer &getOrCreateChild(TaggedNodePointer &self, TrivialSpan &key,
}
Node *nextPhysical(Node *node) {
auto nextChild = getFirstChild(node);
Node *nextChild = getFirstChild(node);
if (nextChild != nullptr) {
return nextChild;
}
@@ -1742,7 +1721,7 @@ Node *nextPhysical(Node *node) {
}
Node *nextLogical(Node *node) {
auto nextChild = getFirstChild(node);
Node *nextChild = getFirstChild(node);
if (nextChild != nullptr) {
node = nextChild;
goto downLeftSpine;
@@ -1760,7 +1739,7 @@ Node *nextLogical(Node *node) {
}
}
downLeftSpine:
for (; !node->entryPresent; node = getFirstChildExists(node)) {
for (; !node->entryPresent; node = getFirstChild(node)) {
}
return node;
}
@@ -1930,7 +1909,7 @@ void mergeWithChild(TaggedNodePointer &self, WriteContext *writeContext,
child->parentsIndex = self->parentsIndex;
// Max versions are stored in the parent, so we need to update it now
// that we have a new parent. Safe we call since the root never has a partial
// that we have a new parent. Safe to call since the root never has a partial
// key.
setMaxVersion(child, std::max(childMaxVersion, writeContext->zero));
@@ -2808,7 +2787,7 @@ bool checkRangeStartsWith(NodeT *nTyped, TrivialSpan key, int begin, int end,
__builtin_unreachable(); // GCOVR_EXCL_LINE
downLeftSpine:
for (; !n->entryPresent; n = getFirstChildExists(n)) {
for (; !n->entryPresent; n = getFirstChild(n)) {
}
return n->entry.rangeVersion <= readVersion;
}
@@ -3183,6 +3162,12 @@ Node *firstGeqPhysical(Node *n, const TrivialSpan key) {
#define PRESERVE_NONE
#endif
#if __has_attribute(musttail) && __has_attribute(preserve_none)
constexpr bool kEnableInterleaved = true;
#else
constexpr bool kEnableInterleaved = false;
#endif
namespace check {
typedef PRESERVE_NONE void (*Continuation)(struct Job *, struct Context *);
@@ -3259,7 +3244,7 @@ PRESERVE_NONE void down_left_spine(Job *job, Context *context) {
job->setResult(n->entry.rangeVersion <= job->readVersion);
MUSTTAIL return complete(job, context);
}
auto child = getFirstChildExists(n);
auto child = getFirstChild(n);
job->n = child;
__builtin_prefetch(job->n);
job->continuation = downLeftSpineTable[child.getType()];
@@ -3354,7 +3339,7 @@ template <class NodeT> void iter(Job *job, Context *context) {
job->setResult(n->entry.pointVersion <= job->readVersion);
MUSTTAIL return complete(job, context);
}
auto c = getFirstChildExists(n);
auto c = getFirstChild(n);
job->n = c;
job->continuation = downLeftSpineTable[c.getType()];
__builtin_prefetch(job->n);
@@ -3878,7 +3863,7 @@ void left_side_down_left_spine(Job *job, Context *context) {
}
MUSTTAIL return done_left_side_iter(job, context);
}
auto c = getFirstChildExists(n);
auto c = getFirstChild(n);
job->n = c;
job->continuation = leftSideDownLeftSpineTable[c.getType()];
__builtin_prefetch(job->n);
@@ -4541,7 +4526,7 @@ bool checkPointRead(Node *n, const TrivialSpan key,
if (n->entryPresent) {
return n->entry.pointVersion <= readVersion;
}
n = getFirstChildExists(n);
n = getFirstChild(n);
goto downLeftSpine;
}
@@ -4595,7 +4580,7 @@ bool checkPointRead(Node *n, const TrivialSpan key,
}
}
downLeftSpine:
for (; !n->entryPresent; n = getFirstChildExists(n)) {
for (; !n->entryPresent; n = getFirstChild(n)) {
}
return n->entry.rangeVersion <= readVersion;
}
@@ -4670,7 +4655,7 @@ bool checkPrefixRead(Node *n, const TrivialSpan key,
}
}
downLeftSpine:
for (; !n->entryPresent; n = getFirstChildExists(n)) {
for (; !n->entryPresent; n = getFirstChild(n)) {
}
return n->entry.rangeVersion <= readVersion;
}
@@ -4757,7 +4742,7 @@ bool checkRangeLeftSide(Node *n, TrivialSpan key, int prefixLen,
}
}
downLeftSpine:
for (; !n->entryPresent; n = getFirstChildExists(n)) {
for (; !n->entryPresent; n = getFirstChild(n)) {
}
return n->entry.rangeVersion <= readVersion;
}
@@ -4846,14 +4831,12 @@ backtrack:
searchPathLen -= 1 + n->partialKeyLen;
n = n->parent;
} else {
searchPathLen -= n->partialKeyLen;
n = next;
searchPathLen += n->partialKeyLen;
goto downLeftSpine;
}
}
downLeftSpine:
for (; !n->entryPresent; n = getFirstChildExists(n)) {
for (; !n->entryPresent; n = getFirstChild(n)) {
}
return n->entry.rangeVersion <= readVersion;
}
@@ -4969,51 +4952,50 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
check::Context context;
context.readContext.impl = this;
#if __has_attribute(musttail)
if (count == 1) {
useSequential(reads, result, count, context);
} else {
constexpr int kConcurrent = 16;
check::Job inProgress[kConcurrent];
context.count = count;
context.oldestVersionFullPrecision = oldestVersionFullPrecision;
context.root = root;
context.queries = reads;
context.results = result;
int64_t started = std::min(kConcurrent, count);
context.started = started;
for (int i = 0; i < started; i++) {
inProgress[i].init(reads + i, result + i, root,
oldestVersionFullPrecision);
}
for (int i = 0; i < started - 1; i++) {
inProgress[i].next = inProgress + i + 1;
}
for (int i = 1; i < started; i++) {
inProgress[i].prev = inProgress + i - 1;
}
inProgress[0].prev = inProgress + started - 1;
inProgress[started - 1].next = inProgress;
if constexpr (kEnableInterleaved) {
if (count == 1) {
useSequential(reads, result, count, context);
} else {
constexpr int kConcurrent = 16;
check::Job inProgress[kConcurrent];
context.count = count;
context.oldestVersionFullPrecision = oldestVersionFullPrecision;
context.root = root;
context.queries = reads;
context.results = result;
int64_t started = std::min(kConcurrent, count);
context.started = started;
for (int i = 0; i < started; i++) {
inProgress[i].init(reads + i, result + i, root,
oldestVersionFullPrecision);
}
for (int i = 0; i < started - 1; i++) {
inProgress[i].next = inProgress + i + 1;
}
for (int i = 1; i < started; i++) {
inProgress[i].prev = inProgress + i - 1;
}
inProgress[0].prev = inProgress + started - 1;
inProgress[started - 1].next = inProgress;
// Kick off the sequence of tail calls that finally returns once all jobs
// are done
inProgress->continuation(inProgress, &context);
// Kick off the sequence of tail calls that finally returns once all
// jobs are done
inProgress->continuation(inProgress, &context);
#ifndef NDEBUG
Arena arena;
auto *results2 = new (arena) Result[count];
check::Context context2;
context2.readContext.impl = this;
useSequential(reads, results2, count, context2);
assert(memcmp(result, results2, count) == 0);
assert(context.readContext == context2.readContext);
Arena arena;
auto *results2 = new (arena) Result[count];
check::Context context2;
context2.readContext.impl = this;
useSequential(reads, results2, count, context2);
assert(memcmp(result, results2, count) == 0);
assert(context.readContext == context2.readContext);
#endif
}
} else {
useSequential(reads, result, count, context);
}
#else
useSequential(reads, result, count, context);
#endif
for (int i = 0; i < count; ++i) {
assert(reads[i].readVersion >= 0);
assert(reads[i].readVersion <= newestVersionFullPrecision);
@@ -5186,11 +5168,6 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
assert(allPointWrites || sorted);
#endif
#if __has_attribute(musttail)
constexpr bool kEnableInterleaved = true;
#else
constexpr bool kEnableInterleaved = false;
#endif
if (kEnableInterleaved && count > 1) {
interleavedWrites(writes, count, InternalVersionT(writeVersion));
} else {
@@ -5330,6 +5307,11 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
gc_iterations_total.add(set_oldest_iterations_accum);
if (n == nullptr) {
removalKey = {};
if (removalBufferSize > kMaxRemovalBufferSize) {
safe_free(removalBuffer, removalBufferSize);
removalBufferSize = kMinRemovalBufferSize;
removalBuffer = (uint8_t *)safe_malloc(removalBufferSize);
}
oldestExtantVersion = oldestVersionAtGcBegin;
oldest_extant_version.set(oldestExtantVersion);
oldestVersionAtGcBegin = oldestVersionFullPrecision;
@@ -5340,12 +5322,47 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
oldestExtantVersion, oldestVersionAtGcBegin);
#endif
} else {
removalKeyArena = Arena();
removalKey = getSearchPath(removalKeyArena, n);
// Store the current search path to resume the scan later
saveRemovalKey(n);
}
return fuel;
}
void saveRemovalKey(Node *n) {
uint8_t *cursor = removalBuffer + removalBufferSize;
int size = 0;
auto reserve = [&](int delta) {
if (size + delta > removalBufferSize) [[unlikely]] {
int newBufSize = std::max(removalBufferSize * 2, size + delta);
uint8_t *newBuf = (uint8_t *)safe_malloc(newBufSize);
memcpy(newBuf + newBufSize - size, cursor, size);
safe_free(removalBuffer, removalBufferSize);
removalBuffer = newBuf;
removalBufferSize = newBufSize;
cursor = newBuf + newBufSize - size;
}
};
for (;;) {
auto partialKey = TrivialSpan{n->partialKey(), n->partialKeyLen};
reserve(partialKey.size());
size += partialKey.size();
cursor -= partialKey.size();
memcpy(cursor, partialKey.data(), partialKey.size());
if (n->parent == nullptr) {
break;
}
reserve(1);
++size;
--cursor;
*cursor = n->parentsIndex;
n = n->parent;
}
removalKey = {cursor, size};
}
void setOldestVersion(int64_t newOldestVersion) {
assert(newOldestVersion >= 0);
assert(newOldestVersion <= newestVersionFullPrecision);
@@ -5396,7 +5413,7 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
writeContext.~WriteContext();
new (&writeContext) WriteContext();
removalKeyArena = Arena{};
// Leave removalBuffer as is
removalKey = {};
keyUpdates = 10;
@@ -5428,11 +5445,16 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
~Impl() {
eraseTree(root, &writeContext);
safe_free(metrics, metricsCount * sizeof(metrics[0]));
safe_free(removalBuffer, removalBufferSize);
}
WriteContext writeContext;
Arena removalKeyArena;
static constexpr int kMinRemovalBufferSize = 1 << 10;
// Eventually downsize if larger than this value
static constexpr int kMaxRemovalBufferSize = 1 << 16;
uint8_t *removalBuffer = (uint8_t *)safe_malloc(kMinRemovalBufferSize);
int removalBufferSize = kMinRemovalBufferSize;
TrivialSpan removalKey;
int64_t keyUpdates;
@@ -5587,7 +5609,7 @@ Node *firstGeqLogical(Node *n, const TrivialSpan key) {
if (n->entryPresent) {
return n;
}
n = getFirstChildExists(n);
n = getFirstChild(n);
goto downLeftSpine;
}
@@ -5632,7 +5654,7 @@ Node *firstGeqLogical(Node *n, const TrivialSpan key) {
}
}
downLeftSpine:
for (; !n->entryPresent; n = getFirstChildExists(n)) {
for (; !n->entryPresent; n = getFirstChild(n)) {
}
return n;
}
@@ -5858,13 +5880,13 @@ void checkVersionsGeqOldestExtant(Node *n,
case Type_Node0: {
} break;
case Type_Node3: {
auto *self = static_cast<Node3 *>(n);
[[maybe_unused]] auto *self = static_cast<Node3 *>(n);
for (int i = 0; i < 3; ++i) {
assert(self->childMaxVersion[i] >= oldestExtantVersion);
}
} break;
case Type_Node16: {
auto *self = static_cast<Node16 *>(n);
[[maybe_unused]] auto *self = static_cast<Node16 *>(n);
for (int i = 0; i < 16; ++i) {
assert(self->childMaxVersion[i] >= oldestExtantVersion);
}
@@ -5874,7 +5896,7 @@ void checkVersionsGeqOldestExtant(Node *n,
for (int i = 0; i < 48; ++i) {
assert(self->childMaxVersion[i] >= oldestExtantVersion);
}
for (auto m : self->maxOfMax) {
for ([[maybe_unused]] auto m : self->maxOfMax) {
assert(m >= oldestExtantVersion);
}
} break;
@@ -5883,7 +5905,7 @@ void checkVersionsGeqOldestExtant(Node *n,
for (int i = 0; i < 256; ++i) {
assert(self->childMaxVersion[i] >= oldestExtantVersion);
}
for (auto m : self->maxOfMax) {
for ([[maybe_unused]] auto m : self->maxOfMax) {
assert(m >= oldestExtantVersion);
}
} break;

View File

@@ -13,12 +13,14 @@ RUN TZ=America/Los_Angeles DEBIAN_FRONTEND=noninteractive apt-get install -y \
ccache \
cmake \
curl \
devscripts \
g++-aarch64-linux-gnu \
gcovr \
git \
gnupg \
libc6-dbg \
lsb-release \
mold \
ninja-build \
pre-commit \
python3-requests \

View File

@@ -18,7 +18,6 @@ using namespace weaselab;
#include <span>
#include <string>
#include <thread>
#include <unordered_set>
#include <utility>
#include <callgrind.h>
@@ -368,23 +367,6 @@ template <class T, class C = std::less<T>> auto set(Arena &arena) {
return Set<T, C>(ArenaAlloc<T>(&arena));
}
template <class T> struct MyHash;
template <class T> struct MyHash<T *> {
size_t operator()(const T *t) const noexcept {
size_t result;
memcpy(&result, &t, sizeof(result));
return result;
}
};
template <class T>
using HashSet =
std::unordered_set<T, MyHash<T>, std::equal_to<T>, ArenaAlloc<T>>;
template <class T> auto hashSet(Arena &arena) {
return HashSet<T>(ArenaAlloc<T>(&arena));
}
template <class T, class U>
bool operator==(const ArenaAlloc<T> &lhs, const ArenaAlloc<U> &rhs) {
return lhs.arena == rhs.arena;

4
Jenkinsfile vendored
View File

@@ -91,7 +91,7 @@ pipeline {
minio bucket: 'jenkins', credentialsId: 'jenkins-minio', excludes: '', host: 'minio.weaselab.dev', includes: 'build/*.deb,build/*.rpm,paper/*.pdf', targetFolder: '${JOB_NAME}/${BUILD_NUMBER}/${STAGE_NAME}/'
}
}
stage('Release [gcc]') {
stage('gcc') {
agent {
dockerfile {
args '-v /home/jenkins/ccache:/ccache'
@@ -99,7 +99,7 @@ pipeline {
}
}
steps {
CleanBuildAndTest("-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_CXX_FLAGS=-DNVALGRIND")
CleanBuildAndTest("-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++")
recordIssues(tools: [gcc()])
}
}

View File

@@ -4,7 +4,14 @@ Intended as an alternative to FoundationDB's skip list.
Hardware for all benchmarks is an AMD Ryzen 9 7900 with (2x32GB) 5600MT/s CL28-34-34-89 1.35V RAM.
Compiler is `Ubuntu clang version 20.0.0 (++20241029082144+7544d3af0e28-1~exp1~20241029082307.506)`.
```
$ clang++ --version
Ubuntu clang version 20.0.0 (++20241119082716+7e85cb8a8a9d-1~exp1~20241119082825.551)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-20/bin
```
# Microbenchmark
@@ -12,44 +19,45 @@ Compiler is `Ubuntu clang version 20.0.0 (++20241029082144+7544d3af0e28-1~exp1~2
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 159.65 | 6,263,576.52 | 1.6% | 2,972.36 | 820.37 | 3.623 | 504.59 | 0.0% | 0.01 | `point reads`
| 156.32 | 6,397,320.65 | 0.7% | 2,913.62 | 806.87 | 3.611 | 490.19 | 0.0% | 0.01 | `prefix reads`
| 229.18 | 4,363,293.65 | 1.2% | 3,541.05 | 1,219.75 | 2.903 | 629.33 | 0.0% | 0.01 | `range reads`
| 363.37 | 2,752,026.30 | 0.3% | 5,273.63 | 1,951.54 | 2.702 | 851.66 | 1.7% | 0.01 | `point writes`
| 364.99 | 2,739,787.02 | 0.3% | 5,250.92 | 1,958.54 | 2.681 | 839.24 | 1.7% | 0.01 | `prefix writes`
| 242.26 | 4,127,796.58 | 2.9% | 3,117.33 | 1,304.41 | 2.390 | 541.07 | 2.8% | 0.02 | `range writes`
| 562.48 | 1,777,855.27 | 0.8% | 7,305.21 | 3,034.34 | 2.408 | 1,329.30 | 1.3% | 0.01 | `monotonic increasing point writes`
| 122,688.57 | 8,150.72 | 0.7% | 798,766.00 | 666,842.00 | 1.198 | 144,584.50 | 0.1% | 0.01 | `worst case for radix tree`
| 41.71 | 23,976,459.34 | 1.7% | 885.00 | 219.17 | 4.038 | 132.00 | 0.0% | 0.01 | `create and destroy`
| 169.16 | 5,911,582.44 | 0.0% | 3,014.03 | 855.12 | 3.525 | 504.59 | 0.0% | 2.02 | `point reads`
| 167.17 | 5,981,796.19 | 0.0% | 2,954.16 | 845.14 | 3.495 | 490.17 | 0.0% | 2.00 | `prefix reads`
| 250.44 | 3,992,954.35 | 0.1% | 3,592.41 | 1,265.18 | 2.839 | 629.31 | 0.0% | 2.99 | `range reads`
| 467.10 | 2,140,846.36 | 0.0% | 4,450.57 | 2,488.36 | 1.789 | 707.92 | 2.1% | 5.62 | `point writes`
| 465.18 | 2,149,723.11 | 0.2% | 4,410.22 | 2,474.92 | 1.782 | 694.74 | 2.1% | 5.55 | `prefix writes`
| 297.45 | 3,361,954.05 | 0.1% | 2,315.38 | 1,581.64 | 1.464 | 396.69 | 3.3% | 3.57 | `range writes`
| 476.56 | 2,098,370.82 | 1.0% | 6,999.33 | 2,492.26 | 2.808 | 1,251.74 | 1.3% | 0.06 | `monotonic increasing point writes`
| 129,455.00 | 7,724.69 | 1.0% | 807,446.67 | 698,559.40 | 1.156 | 144,584.60 | 0.8% | 0.01 | `worst case for radix tree`
| 44.67 | 22,384,996.63 | 0.5% | 902.00 | 235.18 | 3.835 | 132.00 | 0.0% | 0.01 | `create and destroy`
## Radix tree (this implementation)
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 12.63 | 79,186,868.18 | 1.4% | 241.61 | 64.76 | 3.731 | 31.64 | 0.8% | 0.01 | `point reads`
| 14.48 | 69,078,073.40 | 0.3% | 292.42 | 74.69 | 3.915 | 41.49 | 0.5% | 0.01 | `prefix reads`
| 34.37 | 29,094,694.11 | 0.2% | 759.53 | 179.77 | 4.225 | 100.38 | 0.2% | 0.01 | `range reads`
| 19.34 | 51,713,896.36 | 0.7% | 369.70 | 101.81 | 3.631 | 47.88 | 0.6% | 0.01 | `point writes`
| 39.16 | 25,538,968.61 | 0.2% | 653.16 | 206.77 | 3.159 | 89.62 | 0.8% | 0.01 | `prefix writes`
| 40.58 | 24,642,681.12 | 4.7% | 718.44 | 216.44 | 3.319 | 99.28 | 0.6% | 0.01 | `range writes`
| 78.77 | 12,694,520.69 | 3.8% | 1,395.55 | 421.73 | 3.309 | 249.81 | 0.1% | 0.01 | `monotonic increasing point writes`
| 287,760.50 | 3,475.11 | 0.5% | 3,929,266.50 | 1,550,225.50 | 2.535 | 639,064.00 | 0.0% | 0.01 | `worst case for radix tree`
| 104.76 | 9,545,250.65 | 3.1% | 2,000.00 | 552.82 | 3.618 | 342.00 | 0.0% | 0.01 | `create and destroy`
| 14.11 | 70,857,435.19 | 0.1% | 247.13 | 71.03 | 3.479 | 32.64 | 0.8% | 0.17 | `point reads`
| 15.63 | 63,997,306.79 | 0.0% | 299.99 | 78.59 | 3.817 | 42.50 | 0.4% | 0.19 | `prefix reads`
| 36.24 | 27,590,266.59 | 0.1% | 782.70 | 182.21 | 4.296 | 106.65 | 0.2% | 0.43 | `range reads`
| 22.72 | 44,004,627.40 | 0.1% | 376.04 | 114.33 | 3.289 | 49.97 | 0.8% | 0.27 | `point writes`
| 40.83 | 24,494,110.04 | 0.0% | 666.07 | 205.35 | 3.244 | 101.33 | 0.3% | 0.49 | `prefix writes`
| 43.45 | 23,016,324.00 | 0.0% | 732.33 | 218.41 | 3.353 | 111.64 | 0.1% | 0.53 | `range writes`
| 81.46 | 12,276,650.63 | 3.6% | 1,458.85 | 411.52 | 3.545 | 280.42 | 0.1% | 0.01 | `monotonic increasing point writes`
| 314,217.00 | 3,182.51 | 1.2% | 4,043,063.50 | 1,593,715.00 | 2.537 | 714,828.00 | 0.1% | 0.01 | `worst case for radix tree`
| 106.79 | 9,364,602.60 | 0.5% | 2,046.00 | 539.75 | 3.791 | 329.00 | 0.0% | 0.01 | `create and destroy`
# "Real data" test
Point queries only, best of three runs. Gc ratio is the ratio of time spent doing garbage collection to time spent adding writes or doing garbage collection. Lower is better.
Point queries only. Gc ratio is the ratio of time spent doing garbage collection to time spent adding writes or doing garbage collection. Lower is better.
## skip list
```
Check: 4.39702 seconds, 370.83 MB/s, Add: 4.50025 seconds, 124.583 MB/s, Gc ratio: 29.1333%, Peak idle memory: 5.51852e+06
Check: 4.62434 seconds, 364.633 MB/s, Add: 3.90399 seconds, 147.371 MB/s, Gc ratio: 33.6898%, Peak idle memory: 5.61007e+06
```
## radix tree
```
Check: 0.987757 seconds, 1650.76 MB/s, Add: 1.24815 seconds, 449.186 MB/s, Gc ratio: 41.4675%, Peak idle memory: 2.02872e+06
Check: 0.956689 seconds, 1762.52 MB/s, Add: 1.35744 seconds, 423.84 MB/s, Gc ratio: 35.0946%, Peak idle memory: 2.32922e+06
```
## hash table
@@ -57,5 +65,6 @@ Check: 0.987757 seconds, 1650.76 MB/s, Add: 1.24815 seconds, 449.186 MB/s, Gc ra
(The hash table implementation doesn't work on range queries, and its purpose is to provide an idea of how fast point queries can be)
```
Check: 0.84256 seconds, 1935.23 MB/s, Add: 0.697204 seconds, 804.146 MB/s, Gc ratio: 35.4091%
Check: 0.799863 seconds, 2108.09 MB/s, Add: 0.667736 seconds, 861.621 MB/s, Gc ratio: 35.0666%, Peak idle memory: 0
```

View File

@@ -5,7 +5,7 @@
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <string_view>
#include <span>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
@@ -64,7 +64,7 @@ int main(int argc, const char **argv) {
auto *const mapOriginal = begin;
const auto sizeOriginal = size;
using StringView = std::basic_string_view<uint8_t>;
using StringView = std::span<const uint8_t>;
StringView write;
std::vector<StringView> reads;
@@ -78,9 +78,9 @@ int main(int argc, const char **argv) {
end = (uint8_t *)memchr(begin, '\n', size);
if (line.size() > 0 && line[0] == 'P') {
write = line.substr(2, line.size());
write = line.subspan(2, line.size());
} else if (line.size() > 0 && line[0] == 'L') {
reads.push_back(line.substr(2, line.size()));
reads.push_back(line.subspan(2, line.size()));
} else if (line.empty()) {
{
readRanges.resize(reads.size());
@@ -133,10 +133,10 @@ int main(int argc, const char **argv) {
int metricsCount;
cs.getMetricsV1(&metrics, &metricsCount);
for (int i = 0; i < metricsCount; ++i) {
printf("# HELP %s %s\n", metrics[i].name, metrics[i].help);
printf("# TYPE %s %s\n", metrics[i].name,
metrics[i].type == metrics[i].Counter ? "counter" : "gauge");
printf("%s %g\n", metrics[i].name, metrics[i].getValue());
fprintf(stderr, "# HELP %s %s\n", metrics[i].name, metrics[i].help);
fprintf(stderr, "# TYPE %s %s\n", metrics[i].name,
metrics[i].type == metrics[i].Counter ? "counter" : "gauge");
fprintf(stderr, "%s %g\n", metrics[i].name, metrics[i].getValue());
}
printf("Check: %g seconds, %g MB/s, Add: %g seconds, %g MB/s, Gc ratio: "

View File

@@ -23,6 +23,97 @@
#include "Internal.h"
#include "third_party/nadeau.h"
constexpr int kCacheLine = 64; // TODO mac m1 is 128
template <class T> struct TxQueue {
explicit TxQueue(int lgSlotCount)
: slotCount(1 << lgSlotCount), slotCountMask(slotCount - 1),
slots(new T[slotCount]) {
// Otherwise we can't tell the difference between full and empty.
assert(!(slotCountMask & 0x80000000));
}
/// Call from producer thread, after ensuring consumer is no longer accessing
/// it somehow
~TxQueue() { delete[] slots; }
/// Must be called from the producer thread
void push(T t) {
if (wouldBlock()) {
// Wait for pops to change and try again
consumer.pops.wait(producer.lastPopRead, std::memory_order_relaxed);
producer.lastPopRead = consumer.pops.load(std::memory_order_acquire);
}
slots[producer.pushesNonAtomic++ & slotCountMask] = std::move(t);
// seq_cst so that the notify can't be ordered before the store
producer.pushes.store(producer.pushesNonAtomic, std::memory_order_seq_cst);
// We have to notify every time, since we don't know if this is the last
// push ever
producer.pushes.notify_one();
}
/// Must be called from the producer thread
uint32_t outstanding() {
return producer.pushesNonAtomic -
consumer.pops.load(std::memory_order_relaxed);
}
/// Returns true if a call to push might block. Must be called from the
/// producer thread.
bool wouldBlock() {
// See if we can determine that overflow won't happen entirely from state
// local to the producer
if (producer.pushesNonAtomic - producer.lastPopRead == slotCount - 1) {
// Re-read pops with memory order
producer.lastPopRead = consumer.pops.load(std::memory_order_acquire);
return producer.pushesNonAtomic - producer.lastPopRead == slotCount - 1;
}
return false;
}
/// Valid until the next pop, or until this queue is destroyed.
T *pop() {
// See if we can determine that there's an entry we can pop entirely from
// state local to the consumer
if (consumer.lastPushRead - consumer.popsNonAtomic == 0) {
// Re-read pushes with memory order and try again
consumer.lastPushRead = producer.pushes.load(std::memory_order_acquire);
if (consumer.lastPushRead - consumer.popsNonAtomic == 0) {
// Wait for pushes to change and try again
producer.pushes.wait(consumer.lastPushRead, std::memory_order_relaxed);
consumer.lastPushRead = producer.pushes.load(std::memory_order_acquire);
}
}
auto result = &slots[consumer.popsNonAtomic++ & slotCountMask];
// We only have to write pops with memory order if we've run out of items.
// We know that we'll eventually run out.
if (consumer.lastPushRead - consumer.popsNonAtomic == 0) {
// seq_cst so that the notify can't be ordered before the store
consumer.pops.store(consumer.popsNonAtomic, std::memory_order_seq_cst);
consumer.pops.notify_one();
}
return result;
}
private:
const uint32_t slotCount;
const uint32_t slotCountMask;
T *slots;
struct alignas(kCacheLine) ProducerState {
std::atomic<uint32_t> pushes{0};
uint32_t pushesNonAtomic{0};
uint32_t lastPopRead{0};
};
struct alignas(kCacheLine) ConsumerState {
std::atomic<uint32_t> pops{0};
uint32_t popsNonAtomic{0};
uint32_t lastPushRead{0};
};
ProducerState producer;
ConsumerState consumer;
};
std::atomic<int64_t> transactions;
int64_t safeUnaryMinus(int64_t x) {
@@ -47,13 +138,17 @@ void tupleAppend(std::string &output, int64_t value) {
void tupleAppend(std::string &output, std::string_view value) {
output.push_back('\x02');
for (auto c : value) {
if (c == '\x00') {
output.push_back('\x00');
output.push_back('\xff');
} else {
output.push_back(c);
if (memchr(value.data(), '\x00', value.size()) != nullptr) {
for (auto c : value) {
if (c == '\x00') {
output.push_back('\x00');
output.push_back('\xff');
} else {
output.push_back(c);
}
}
} else {
output.insert(output.end(), value.begin(), value.end());
}
output.push_back('\x00');
}
@@ -64,35 +159,71 @@ template <class... Ts> std::string tupleKey(const Ts &...ts) {
return result;
}
constexpr int kWindowSize = 300000;
constexpr int kTotalKeyRange = 1'000'000'000;
constexpr int kWindowSize = 1'000'000;
constexpr int kNumReadKeysPerTx = 10;
constexpr int kNumWriteKeysPerTx = 5;
void workload(weaselab::ConflictSet *cs) {
int64_t version = kWindowSize;
constexpr int kNumWrites = 16;
for (;; transactions.fetch_add(1, std::memory_order_relaxed)) {
struct Transaction {
std::vector<std::string> keys;
std::vector<weaselab::ConflictSet::ReadRange> reads;
std::vector<weaselab::ConflictSet::WriteRange> writes;
int64_t version;
int64_t oldestVersion;
Transaction() = default;
explicit Transaction(int64_t version)
: version(version), oldestVersion(version - kWindowSize) {
std::vector<int64_t> keyIndices;
for (int i = 0; i < kNumWrites; ++i) {
keyIndices.push_back(rand() % 100'000'000);
for (int i = 0; i < std::max(kNumReadKeysPerTx, kNumWriteKeysPerTx); ++i) {
keyIndices.push_back(rand() % kTotalKeyRange);
}
std::sort(keyIndices.begin(), keyIndices.end());
std::vector<std::string> keys;
std::vector<weaselab::ConflictSet::WriteRange> writes;
constexpr std::string_view suffix = "this is a suffix";
for (int i = 0; i < kNumWrites; ++i) {
keys.push_back(tupleKey(0x100, i, keyIndices[i],
suffix.substr(0, rand() % suffix.size()),
rand()));
constexpr std::string_view fullString =
"this is a string, where a prefix of it is used as an element of the "
"tuple forming the key";
for (int i = 0; i < int(keyIndices.size()); ++i) {
keys.push_back(
tupleKey(0x100, keyIndices[i] / fullString.size(),
fullString.substr(0, keyIndices[i] % fullString.size())));
// printf("%s\n", printable(keys.back()).c_str());
}
for (int i = 0; i < kNumWrites; ++i) {
for (int i = 0; i < kNumWriteKeysPerTx; ++i) {
writes.push_back({{(const uint8_t *)keys[i].data(), int(keys[i].size())},
{nullptr, 0}});
}
cs->addWrites(writes.data(), writes.size(), version);
cs->setOldestVersion(version - kWindowSize);
++version;
reads.push_back({{(const uint8_t *)keys[0].data(), int(keys[0].size())},
{(const uint8_t *)keys[1].data(), int(keys[1].size())},
version - std::min(10, kWindowSize)});
static_assert(kNumReadKeysPerTx >= 3);
for (int i = 2; i < kNumReadKeysPerTx; ++i) {
reads.push_back({{(const uint8_t *)keys[i].data(), int(keys[i].size())},
{nullptr, 0},
version - kWindowSize});
}
}
}
Transaction(Transaction &&) = default;
Transaction &operator=(Transaction &&) = default;
Transaction(Transaction const &) = delete;
Transaction const &operator=(Transaction const &) = delete;
};
struct Resolver {
void resolve(const weaselab::ConflictSet::ReadRange *reads, int readCount,
const weaselab::ConflictSet::WriteRange *writes, int writeCount,
int64_t newVersion, int64_t newOldestVersion) {
results.resize(readCount);
cs.check(reads, results.data(), readCount);
cs.addWrites(writes, writeCount, newVersion);
cs.setOldestVersion(newOldestVersion);
}
ConflictSet cs{0};
private:
std::vector<weaselab::ConflictSet::Result> results;
};
// Adapted from getaddrinfo man page
int getListenFd(const char *node, const char *service) {
@@ -235,7 +366,8 @@ int main(int argc, char **argv) {
{
int listenFd = getListenFd(argv[1], argv[2]);
weaselab::ConflictSet cs{0};
Resolver resolver;
auto &cs = resolver.cs;
weaselab::ConflictSet::MetricsV1 *metrics;
int metricsCount;
cs.getMetricsV1(&metrics, &metricsCount);
@@ -284,7 +416,22 @@ int main(int argc, char **argv) {
}
#endif
auto w = std::thread{workload, &cs};
TxQueue<Transaction> queue{10};
auto workloadThread = std::thread{[&]() {
for (int64_t version = kWindowSize;;
++version, transactions.fetch_add(1, std::memory_order_relaxed)) {
queue.push(Transaction(version));
}
}};
auto resolverThread = std::thread{[&]() {
for (;;) {
auto tx = queue.pop();
resolver.resolve(tx->reads.data(), tx->reads.size(), tx->writes.data(),
tx->writes.size(), tx->version, tx->oldestVersion);
}
}};
for (;;) {
struct sockaddr_storage peer_addr = {};

View File

@@ -767,7 +767,9 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
false, true);
}
sortPoints(points);
if (!std::is_sorted(points.begin(), points.end())) {
sortPoints(points);
}
int activeWriteCount = 0;
std::vector<std::pair<StringRef, StringRef>> combinedWriteConflictRanges;
@@ -794,7 +796,6 @@ struct __attribute__((visibility("hidden"))) ConflictSet::Impl {
int temp[stripeSize];
int stripes = (stringCount + stripeSize - 1) / stripeSize;
StringRef values[stripeSize];
int64_t writeVersions[stripeSize / 2];
int ss = stringCount - (stripes - 1) * stripeSize;
int64_t entryDelta = 0;
for (int s = stripes - 1; s >= 0; s--) {

View File

@@ -1,3 +1,4 @@
___chkstk_darwin
___stack_chk_fail
___stack_chk_guard
__tlv_bootstrap
@@ -5,6 +6,7 @@ _abort
_bzero
_free
_malloc
_memcmp
_memcpy
_memmove
dyld_stub_binder

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Some files were not shown because too many files have changed in this diff Show More