
WeaselDB Development Guide

Table of Contents

  1. Project Overview
  2. Quick Start
  3. Architecture
  4. Development Guidelines
  5. Common Patterns
  6. Reference

Project Overview

WeaselDB is a high-performance write-side database component designed for systems where reading and writing are decoupled. The system focuses exclusively on handling transactional commits with optimistic concurrency control, while readers are expected to maintain their own queryable representations by subscribing to change streams.

Key Features

  • Ultra-fast arena allocation (~1ns vs ~20-270ns for malloc)
  • High-performance JSON parsing with streaming support and SIMD optimization
  • Multi-threaded networking using multiple epoll instances with unified I/O thread pool
  • Configurable epoll instances to eliminate kernel-level contention
  • Zero-copy design throughout the pipeline
  • Factory pattern safety ensuring correct object lifecycle management

Quick Start

Build System

Use CMake with C++20 and always use ninja (preferred over make):

mkdir -p build && cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
ninja

Testing & Development

Run all tests:

ninja test    # or ctest

Individual targets:

  • ./test_arena_allocator - Arena allocator unit tests
  • ./test_commit_request - JSON parsing and validation tests

Benchmarking:

  • ./bench_arena_allocator - Memory allocation performance
  • ./bench_commit_request - JSON parsing performance
  • ./bench_parser_comparison - Compare vs nlohmann::json and RapidJSON

Debug tools:

  • ./debug_arena - Analyze arena allocator behavior

Load Testing:

  • ./load_tester - A tool to generate load against the server for performance and stability analysis.

Dependencies

System requirements:

  • weaseljson - Must be installed system-wide (high-performance JSON parser)
  • gperf - System requirement for perfect hash generation

Auto-fetched:

  • simdutf - SIMD base64 encoding/decoding
  • toml11 - TOML configuration parsing
  • doctest - Testing framework
  • nanobench - Benchmarking library

Architecture

Core Components

Arena Allocator (src/arena_allocator.hpp)

Ultra-fast memory allocator optimized for request/response patterns:

  • ~1ns allocation time vs ~20-270ns for malloc
  • Lazy initialization with geometric block growth (doubling strategy)
  • Intrusive linked list design for minimal memory overhead
  • Memory-efficient reset that keeps the first block and frees others
  • STL-compatible interface via ArenaStlAllocator
  • O(1) amortized allocation with proper alignment handling
  • Move semantics for efficient transfers
  • Thread-safe per-connection usage via exclusive ownership model

Networking Layer

Server (src/server.{hpp,cpp}):

  • High-performance multi-threaded networking using multiple epoll instances with unified I/O thread pool
  • Configurable epoll instances to eliminate kernel-level epoll_ctl contention (default: 2, max: io_threads)
  • Round-robin thread-to-epoll assignment distributes I/O threads across epoll instances
  • Connection distribution keeps accepted connections on the same epoll instance; returned connections are redistributed via round-robin
  • Factory pattern construction via Server::create() ensures proper shared_ptr semantics
  • Safe shutdown mechanism with async-signal-safe shutdown() method
  • Connection ownership management with automatic cleanup on server destruction
  • Pluggable protocol handlers via ConnectionHandler interface
  • EPOLL_EXCLUSIVE on listen socket across all epoll instances prevents thundering herd

Connection (src/connection.{hpp,cpp}):

  • Efficient per-connection state management with arena-based memory allocation
  • Safe ownership transfer between server threads and protocol handlers
  • Automatic cleanup on connection closure or server shutdown
  • Handler interface isolation - only exposes necessary methods to protocol handlers
  • Protocol-specific data: user_data void* for custom handler data

ConnectionHandler Interface (src/connection_handler.hpp):

  • Abstract protocol interface decoupling networking from application logic
  • Ownership transfer support allowing handlers to take connections for async processing
  • Streaming data processing with partial message handling
  • Connection lifecycle hooks for initialization and cleanup

Thread Pipeline (src/ThreadPipeline.h)

A high-performance, multi-stage, lock-free pipeline for inter-thread communication.

  • Lock-Free Design: Uses a shared ring buffer with atomic counters for coordination, avoiding locks for maximum throughput.
  • Multi-Stage Processing: Allows items (like connections or data packets) to flow through a series of processing stages (e.g., from I/O threads to worker threads).
  • Batching Support: Enables efficient batch processing of items to reduce overhead.
  • RAII Guards: Utilizes RAII (StageGuard, ProducerGuard) to ensure thread-safe publishing and consumption of items in the pipeline, even in the presence of exceptions.

Parsing Layer

JSON Commit Request Parser (src/json_commit_request_parser.{hpp,cpp}):

  • High-performance JSON parser using weaseljson library
  • Streaming parser support for incremental parsing of network data
  • gperf-optimized token recognition for fast JSON key parsing
  • Base64 decoding using SIMD-accelerated simdutf
  • Comprehensive validation of transaction structure
  • Perfect hash table lookup for JSON keys using gperf
  • Zero hash collisions for known JSON tokens eliminate branching

Parser Interface (src/commit_request_parser.hpp):

  • Abstract base class for commit request parsers
  • Format-agnostic parsing interface supporting multiple serialization formats
  • Streaming and one-shot parsing modes
  • Standardized error handling across parser implementations

Data Model

Commit Request Data Model (src/commit_request.hpp):

  • Format-agnostic data structure for representing transactional commits
  • Arena-backed string storage with efficient memory management
  • Move-only semantics for optimal performance
  • Builder pattern for constructing commit requests
  • Zero-copy string views pointing to arena-allocated memory

Configuration & Optimization

Configuration System (src/config.{hpp,cpp}):

  • TOML-based configuration using toml11 library
  • Structured configuration with server, commit, and subscription sections
  • Default fallback values for all configuration options
  • Type-safe parsing with validation and bounds checking
  • See config.md for complete configuration documentation

JSON Token Optimization (src/json_tokens.gperf, src/json_token_enum.hpp):

  • Perfect hash table generated by gperf for O(1) JSON key lookup
  • Compile-time token enumeration for type-safe key identification
  • Minimal perfect hash reduces memory overhead and improves cache locality
  • Build-time code generation ensures optimal performance

Transaction Data Model

CommitRequest Structure

CommitRequest {
  - request_id: Optional unique identifier
  - leader_id: Expected leader for consistency
  - read_version: Snapshot version for preconditions
  - preconditions[]: Optimistic concurrency checks
    - point_read: Single key existence/content validation
    - range_read: Range-based consistency validation
  - operations[]: Ordered mutation operations
    - write: Set key-value pair
    - delete: Remove single key
    - range_delete: Remove key range
}
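For orientation, a request with these fields might be serialized roughly as follows. This is a hedged illustration only; the exact field names, nesting, and value encoding (e.g. whether keys and values are base64 strings, as the SIMD base64 decoding suggests) are defined by the parser, not this guide:

```json
{
  "request_id": "req-123",
  "leader_id": "leader-1",
  "read_version": 42,
  "preconditions": [
    { "point_read": { "key": "a2V5MQ==" } }
  ],
  "operations": [
    { "write": { "key": "a2V5MQ==", "value": "dmFsdWUx" } },
    { "delete": { "key": "a2V5Mg==" } }
  ]
}
```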

Memory Management Model

Connection Ownership Lifecycle

  1. Creation: Accept threads create connections, transfer to epoll as raw pointers
  2. Processing: Network threads claim ownership by wrapping in unique_ptr
  3. Handler Transfer: Handlers can take ownership for async processing via unique_ptr.release()
  4. Return Path: Handlers use Server::releaseBackToServer() to return connections
  5. Safety: All transfers use weak_ptr to server for safe cleanup
  6. Cleanup: RAII ensures proper resource cleanup in all scenarios

Arena Memory Lifecycle

  1. Request Processing: Handler uses conn->getArena() to allocate memory for parsing request data
  2. Response Generation: Handler uses arena for temporary response construction (headers, JSON, etc.)
  3. Response Queuing: Handler calls conn->appendMessage() which copies data to arena-backed message queue
  4. Response Writing: Server writes all queued messages to socket via writeBytes()

Note: Call conn->reset() periodically to reclaim arena memory. Best practice is to do so after all outgoing bytes have been written.

Threading Model and EPOLLONESHOT

EPOLLONESHOT Design Rationale: WeaselDB uses EPOLLONESHOT for all connection file descriptors to enable safe multi-threaded ownership transfer without complex synchronization:

Key Benefits:

  1. Automatic fd disarming - When epoll triggers an event, the fd is automatically removed from epoll monitoring
  2. Race-free ownership transfer - Handlers can safely take connection ownership and move to other threads
  3. Zero-coordination async processing - No manual synchronization needed between network threads and handler threads

Threading Flow:

  1. Event Trigger: Network thread gets epoll event → connection auto-disarmed via ONESHOT
  2. Safe Transfer: Handler can take ownership (std::move(conn_ptr)) with no epoll interference
  3. Async Processing: Connection processed on handler thread while epoll cannot trigger spurious events
  4. Return & Re-arm: Server::receiveConnectionBack() re-arms fd with epoll_ctl(EPOLL_CTL_MOD)

Performance Trade-off:

  • Cost: One epoll_ctl(MOD) syscall per connection return (~100-200ns)
  • Benefit: Eliminates complex thread synchronization and prevents race conditions
  • Alternative cost: Manual EPOLL_CTL_DEL/ADD + locking would be significantly higher

Without EPOLLONESHOT risks:

  • Multiple threads processing same fd simultaneously
  • Use-after-move when network thread accesses transferred connection
  • Complex synchronization between epoll events and ownership transfers

This design enables the async handler pattern where connections can be safely moved between threads for background processing while maintaining high performance and thread safety.

API Endpoints

The system implements a RESTful API:

  • GET /v1/version - Retrieve current committed version and leader
  • POST /v1/commit - Submit transactional operations
  • GET /v1/subscribe - Stream committed transactions
  • GET /v1/status - Check commit status by request_id
  • PUT /v1/retention/<policy_id> - Create or update a retention policy
  • GET /v1/retention/<policy_id> - Retrieve a retention policy by ID
  • GET /v1/retention/ - Retrieve all retention policies
  • DELETE /v1/retention/<policy_id> - Remove a retention policy
  • GET /metrics - Retrieve server metrics for monitoring

Design Principles

  1. Performance-first - Every component optimized for high throughput
  2. Scalable concurrency - Multiple epoll instances eliminate kernel contention
  3. Memory efficiency - Arena allocation eliminates fragmentation
  4. Zero-copy - Minimize data copying throughout pipeline
  5. Streaming-ready - Support incremental processing
  6. Type safety - Compile-time validation where possible
  7. Resource management - RAII and move semantics throughout

Future Integration Points

This write-side component is designed to integrate with:

  • Leader election systems for distributed consensus
  • Replication mechanisms for fault tolerance
  • Read-side systems that consume the transaction stream
  • Monitoring systems for operational visibility

Development Guidelines

Code Style & Conventions

  • C++ Style: Modern C++20 with RAII and move semantics
  • Memory Management: Prefer arena allocation over standard allocators
  • String Handling: Use std::string_view for zero-copy operations
  • Error Handling: Return error codes or use exceptions appropriately
  • Naming: snake_case for variables/functions, PascalCase for classes
  • Performance: Always consider allocation patterns and cache locality

Critical Implementation Rules

  • Server Creation: Always use Server::create() factory method - direct construction is impossible
  • Connection Creation: Only the Server can create connections - no public constructor or factory method
  • Connection Ownership: Use unique_ptr semantics for safe ownership transfer between components
  • Arena Allocator Pattern: Always use ArenaAllocator for temporary allocations within request processing
  • String View Usage: Prefer std::string_view over std::string when pointing to arena-allocated memory
  • Ownership Transfer: Use Server::releaseBackToServer() for returning connections to server from handlers
  • JSON Token Lookup: Use the gperf-generated perfect hash table in json_tokens.hpp for O(1) key recognition
  • Base64 Handling: Always use simdutf for base64 encoding/decoding for performance
  • Thread Safety: Connection ownership transfers are designed to be thread-safe with proper RAII cleanup

Project Structure

  • src/ - Core headers and implementation files
  • tests/ - doctest-based unit tests
  • benchmarks/ - nanobench performance tests
  • tools/ - Debugging and analysis utilities
  • build/ - CMake-generated files including json_tokens.cpp

Extension Points

Adding New Protocol Handlers

  1. Inherit from ConnectionHandler in src/connection_handler.hpp
  2. Implement on_data_arrived() with proper ownership semantics
  3. Use connection's arena allocator for temporary allocations: conn->getArena()
  4. Handle partial messages and streaming protocols appropriately
  5. Use Server::releaseBackToServer() if taking ownership for async processing
  6. Add corresponding test cases and integration tests
  7. Consider performance implications of ownership transfers

Adding New Parsers

  1. Inherit from CommitRequestParser in src/commit_request_parser.hpp
  2. Implement both streaming and one-shot parsing modes
  3. Use arena allocation for all temporary string storage
  4. Add corresponding test cases in tests/
  5. Add benchmark comparisons in benchmarks/

Performance Guidelines

  • Memory: Arena allocation eliminates fragmentation - use it for all request-scoped data
  • CPU: Perfect hashing and SIMD operations are critical paths - avoid alternatives
  • I/O: Streaming parser design supports incremental network data processing
  • Cache: String views avoid copying, keeping data cache-friendly

Configuration & Testing

  • Configuration: All configuration is TOML-based using config.toml (see config.md)
  • Testing Strategy: Run unit tests, benchmarks, and debug tools before submitting changes
  • Build System: CMake generates gperf hash tables at build time; always use ninja
  • Test Synchronization:
    • ABSOLUTELY NEVER use sleep(), std::this_thread::sleep_for(), or any timeout-based waiting in tests
    • NEVER use condition_variable.wait_for() or other timeout variants
    • Use deterministic synchronization only:
      • Blocking I/O (blocking read/write calls that naturally wait)
      • condition_variable.wait() with no timeout (waits indefinitely until condition is met)
      • std::latch, std::barrier, futures/promises for coordination
      • RAII guards and resource management for cleanup
    • Tests should either pass (correct) or hang forever (indicates real bug to investigate)
    • No timeouts, no flaky behavior, no false positives/negatives

Common Patterns

Factory Method Patterns

Server Creation

// Server must be created via factory method
auto server = Server::create(config, handler);

// Never create on stack or with make_shared (won't compile):
// Server server(config, handler);  // Compiler error - constructor private
// auto server = std::make_shared<Server>(config, handler);  // Compiler error

Connection Creation (Server-Only)

// Only Server can create connections: Connection::createForServer() is
// private, and Server is granted access as a friend. Inside a Server
// member function:
auto conn = Connection::createForServer(addr, fd, id, handler, weak_from_this());

// No public way to create connections - all these fail:
// auto conn = Connection::create(...);        // ERROR: no such method
// Connection conn(addr, fd, id, handler, server);  // ERROR: private constructor
// auto conn = std::make_unique<Connection>(...);   // ERROR: private constructor

ConnectionHandler Implementation Patterns

Simple Synchronous Handler

class HttpHandler : public ConnectionHandler {
public:
  void on_data_arrived(std::string_view data, std::unique_ptr<Connection>& conn_ptr) override {
    // Parse HTTP request using connection's arena
    ArenaAllocator& arena = conn_ptr->getArena();

    // Generate response
    conn_ptr->appendMessage("HTTP/1.1 200 OK\r\n\r\nHello World");

    // Server retains ownership
  }
};

Async Handler with Ownership Transfer

class AsyncHandler : public ConnectionHandler {
public:
  void on_data_arrived(std::string_view data, std::unique_ptr<Connection>& conn_ptr) override {
    // Take ownership for async processing
    auto connection = std::move(conn_ptr); // conn_ptr is now null

    // Copy anything still needed from `data` into the arena first; the
    // original buffer is not owned by this lambda.
    work_queue.push([connection = std::move(connection)]() mutable {
      // Process asynchronously
      connection->appendMessage("Async response");

      // Return ownership to server when done
      Server::releaseBackToServer(std::move(connection));
    });
  }
};

Batching Handler with User Data

class BatchingHandler : public ConnectionHandler {
public:
  void on_connection_established(Connection &conn) override {
    // Allocate some protocol-specific data and attach it to the connection
    conn.user_data = new MyProtocolData();
  }

  void on_connection_closed(Connection &conn) override {
    // Free the protocol-specific data
    delete static_cast<MyProtocolData*>(conn.user_data);
  }

  void on_data_arrived(std::string_view data,
                       std::unique_ptr<Connection> &conn_ptr) override {
    // Process data and maybe store some results in the user_data
    auto* proto_data = static_cast<MyProtocolData*>(conn_ptr->user_data);
    proto_data->process(data);
  }

  void on_post_batch(std::span<std::unique_ptr<Connection>> batch) override {
    // Process a batch of connections
    for (auto& conn_ptr : batch) {
      if (conn_ptr) {
        auto* proto_data = static_cast<MyProtocolData*>(conn_ptr->user_data);
        if (proto_data->is_ready()) {
          // This connection is ready for the next stage, move it to the pipeline
          pipeline_.push(std::move(conn_ptr));
        }
      }
    }
  }

private:
  MyProcessingPipeline pipeline_;
};

Streaming "yes" Handler

class YesHandler : public ConnectionHandler {
public:
  void on_connection_established(Connection &conn) override {
    // Write an initial "y\n"
    conn.appendMessage("y\n");
  }

  void on_write_progress(std::unique_ptr<Connection> &conn) override {
    if (conn->outgoingBytesQueued() == 0) {
      // Don't use an unbounded amount of memory
      conn->reset();
      // Write "y\n" repeatedly
      conn->appendMessage("y\n");
    }
  }
};

Memory Management Patterns

Arena-Based String Handling

// Preferred: Zero-copy string view with arena allocation
std::string_view process_json_key(const char* data, ArenaAllocator& arena);

// Avoid: Unnecessary string copies
std::string process_json_key(const char* data);

Safe Connection Ownership Transfer

// In handler - take ownership for background processing
Connection* raw_conn = conn_ptr.release();

// Process on worker thread
background_processor.submit([raw_conn]() {
  // Do work...
  raw_conn->appendMessage("Background result");

  // Return to server safely (handles server destruction)
  Server::releaseBackToServer(std::unique_ptr<Connection>(raw_conn));
});

Data Construction Patterns

Builder Pattern Usage

CommitRequest request = CommitRequestBuilder(arena)
    .request_id("example-id")
    .leader_id("leader-123")
    .read_version(42)
    .build();

Error Handling Pattern

enum class ParseResult { Success, InvalidJson, MissingField };
ParseResult parse_commit_request(const char* json, CommitRequest& out);

Reference

Build Targets

Test Executables:

  • test_arena_allocator - Arena allocator functionality tests
  • test_commit_request - JSON parsing and validation tests

Server Executable:

  • Main server binary (compiled from src/main.cpp)

Benchmark Executables:

  • bench_arena_allocator - Arena allocator performance benchmarks
  • bench_commit_request - JSON parsing performance benchmarks
  • bench_parser_comparison - Comparison benchmarks vs nlohmann::json and RapidJSON

Debug Tools:

  • debug_arena - Debug tool for arena allocator analysis

Performance Characteristics

Memory Allocation:

  • ~1ns allocation time vs ~20-270ns for standard allocators such as malloc
  • Bulk deallocation eliminates individual free() calls
  • Optimized geometric growth uses current block size for doubling strategy
  • Alignment-aware allocation prevents performance penalties

JSON Parsing:

  • Streaming parser handles large payloads efficiently
  • Incremental processing suitable for network protocols
  • Arena storage eliminates string allocation overhead
  • SIMD-accelerated base64 decoding using simdutf for maximum performance
  • Perfect hash table provides O(1) JSON key lookup via gperf
  • Zero hash collisions for known JSON tokens eliminate branching

Build Notes

  • Always use ninja - faster and more reliable than make
  • Build from project root: mkdir -p build && cd build && cmake .. -G Ninja && ninja
  • For specific targets: ninja <target_name> (e.g., ninja load_tester)
  • Always build with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON