# WeaselDB Development Guide

## Table of Contents

1. [Project Overview](#project-overview)
2. [Quick Start](#quick-start)
3. [Architecture](#architecture)
4. [Development Guidelines](#development-guidelines)
5. [Common Patterns](#common-patterns)
6. [Reference](#reference)

**See also:** [style.md](style.md) for comprehensive C++ coding standards and conventions.

---

## Project Overview

WeaselDB is a high-performance write-side database component designed for systems where reading and writing are decoupled. The system focuses exclusively on handling transactional commits with optimistic concurrency control, while readers are expected to maintain their own queryable representations by subscribing to change streams.

### Key Features

- **Ultra-fast arena allocation** (~1ns vs ~20-270ns for malloc)
- **High-performance JSON parsing** with streaming support and SIMD optimization
- **Multi-threaded networking** using multiple epoll instances with unified I/O thread pool
- **Configurable epoll instances** to eliminate kernel-level contention
- **Zero-copy design** throughout the pipeline
- **Factory pattern safety** ensuring correct object lifecycle management

---

## Quick Start

### Build System

Build with CMake (C++20) and **always use ninja** as the generator instead of make:

```bash
mkdir -p build && cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
ninja
```

### Testing & Development

**Run all tests:**
```bash
ninja test    # or ctest
```

**Individual targets:**
- `./test_arena_allocator` - Arena allocator unit tests
- `./test_commit_request` - JSON parsing and validation tests

**Benchmarking:**
- `./bench_arena_allocator` - Memory allocation performance
- `./bench_commit_request` - JSON parsing performance
- `./bench_parser_comparison` - Compare vs nlohmann::json and RapidJSON

**Debug tools:**
- `./debug_arena` - Analyze arena allocator behavior

**Load Testing:**
- `./load_tester` - A tool to generate load against the server for performance and stability analysis.

### Dependencies

**System requirements:**
- **weaseljson** - Must be installed system-wide (high-performance JSON parser)
- **gperf** - System requirement for perfect hash generation

**Auto-fetched:**
- **simdutf** - SIMD base64 encoding/decoding
- **toml11** - TOML configuration parsing
- **doctest** - Testing framework
- **nanobench** - Benchmarking library

---

## Architecture

### Core Components

#### **Arena Allocator** (`src/arena_allocator.hpp`)

Ultra-fast memory allocator optimized for request/response patterns:

- **~1ns allocation time** vs ~20-270ns for malloc
- **Lazy initialization** with geometric block growth (doubling strategy)
- **Intrusive linked list design** for minimal memory overhead
- **Memory-efficient reset** that keeps the first block and frees others
- **STL-compatible interface** via `ArenaStlAllocator`
- **O(1) amortized allocation** with proper alignment handling
- **Move semantics** for efficient transfers
- **Thread-safe per-connection** usage via exclusive ownership model
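
To make the bullets above concrete, here is a minimal sketch of a bump allocator with lazy initialization, geometric block growth, an intrusive block list, and a reset that keeps only the first block. `SketchArena` and all of its members are hypothetical names for illustration. This is not the real `ArenaAllocator` API, and it assumes power-of-two alignments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for ArenaAllocator; names and layout are assumptions.
class SketchArena {
  struct Block {            // intrusive header stored at the front of each block
    Block* prev;            // previously allocated (older) block
    std::size_t capacity;   // usable bytes after the header
    std::size_t used;       // bytes handed out so far
  };

public:
  explicit SketchArena(std::size_t initial = 1024)
      : initial_(initial < 64 ? 64 : initial) {}
  SketchArena(const SketchArena&) = delete;
  SketchArena& operator=(const SketchArena&) = delete;
  ~SketchArena() { free_until(nullptr); }

  // `align` must be a power of two.
  void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
    if (void* p = try_bump(size, align)) return p;  // fast path: pointer bump
    // Lazy init, or current block exhausted: double the current block size.
    std::size_t cap = current_ ? current_->capacity * 2 : initial_;
    while (cap < size + align) cap *= 2;            // leave room for padding
    auto* block = static_cast<Block*>(std::malloc(sizeof(Block) + cap));
    if (!block) throw std::bad_alloc();
    block->prev = current_;
    block->capacity = cap;
    block->used = 0;
    current_ = block;
    return try_bump(size, align);
  }

  // Keep the first (oldest) block for reuse and free every newer block.
  void reset() {
    if (!current_) return;
    Block* first = current_;
    while (first->prev) first = first->prev;
    free_until(first);
    first->used = 0;
  }

  std::size_t block_count() const {
    std::size_t n = 0;
    for (Block* b = current_; b; b = b->prev) ++n;
    return n;
  }

private:
  void* try_bump(std::size_t size, std::size_t align) {
    if (!current_) return nullptr;
    const auto base = reinterpret_cast<std::uintptr_t>(current_ + 1);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
    const std::uintptr_t p = (base + current_->used + mask) & ~mask;
    const std::size_t end = static_cast<std::size_t>(p - base) + size;
    if (end > current_->capacity) return nullptr;
    current_->used = end;
    return reinterpret_cast<void*>(p);
  }

  // Free blocks from newest to oldest until `keep` is the current block.
  void free_until(Block* keep) {
    while (current_ != keep) {
      Block* prev = current_->prev;
      std::free(current_);
      current_ = prev;
    }
  }

  std::size_t initial_;
  Block* current_ = nullptr;
};
```

Individual allocations are never freed; memory is reclaimed only by `reset()` or destruction, which is why the pattern suits request-scoped data.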

#### **Networking Layer**

**Server** (`src/server.{hpp,cpp}`):
- **High-performance multi-threaded networking** using multiple epoll instances with unified I/O thread pool
- **Configurable epoll instances** to eliminate kernel-level epoll_ctl contention (default: 2, max: io_threads)
- **Round-robin thread-to-epoll assignment** distributes I/O threads across epoll instances
- **Connection distribution** keeps accepted connections on the epoll instance that accepted them; connections returned from handlers are assigned round-robin
- **Factory pattern construction** via `Server::create()` ensures proper shared_ptr semantics
- **Safe shutdown mechanism** with async-signal-safe shutdown() method
- **Connection ownership management** with automatic cleanup on server destruction
- **Pluggable protocol handlers** via ConnectionHandler interface
- **EPOLL_EXCLUSIVE** on listen socket across all epoll instances prevents thundering herd
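
As a rough illustration of the epoll instance limits and the round-robin thread assignment described above, the mapping could look like the sketch below. Both helper names are hypothetical, and the real `Server` may validate its configuration differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helpers; the real Server's names and config fields may differ.

// Clamp the configured count to at least 1 and at most one per I/O thread.
inline std::size_t clamp_epoll_instances(std::size_t requested,
                                         std::size_t io_threads) {
  const std::size_t upper = std::max<std::size_t>(io_threads, 1);
  return std::clamp<std::size_t>(requested, 1, upper);
}

// Round-robin: I/O thread i services epoll instance i mod N.
inline std::size_t epoll_index_for_thread(std::size_t thread_index,
                                          std::size_t epoll_instances) {
  return thread_index % epoll_instances;
}
```

With 8 I/O threads and 2 epoll instances, even-numbered threads wait on instance 0 and odd-numbered threads on instance 1.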

**Connection** (`src/connection.{hpp,cpp}`):
- **Efficient per-connection state management** with arena-based memory allocation
- **Safe ownership transfer** between server threads and protocol handlers
- **Automatic cleanup** on connection closure or server shutdown
- **Handler interface isolation** - only exposes necessary methods to protocol handlers
- **Protocol-specific data:** `user_data` `void*` for custom handler data

**ConnectionHandler Interface** (`src/connection_handler.hpp`):
- **Abstract protocol interface** decoupling networking from application logic
- **Ownership transfer support** allowing handlers to take connections for async processing
- **Streaming data processing** with partial message handling
- **Connection lifecycle hooks** for initialization and cleanup
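
Partial message handling usually means buffering bytes until a complete message is available. The sketch below shows the idea with a newline-delimited protocol; `LineFramer` is a hypothetical helper, not part of WeaselDB, and it uses `std::string` rather than the connection arena to stay self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical framing helper for a newline-delimited protocol.
class LineFramer {
public:
  // Feed one chunk as it arrives from the socket; returns every message
  // completed by this chunk. Trailing partial data is kept for the next call.
  std::vector<std::string> feed(std::string_view chunk) {
    pending_.append(chunk.data(), chunk.size());
    std::vector<std::string> complete;
    std::size_t start = 0;
    std::size_t nl = pending_.find('\n', start);
    while (nl != std::string::npos) {
      complete.push_back(pending_.substr(start, nl - start));
      start = nl + 1;
      nl = pending_.find('\n', start);
    }
    pending_.erase(0, start);  // drop consumed bytes, keep the partial tail
    return complete;
  }

  std::size_t buffered() const { return pending_.size(); }

private:
  std::string pending_;
};
```

A handler would call `feed()` from `on_data_arrived()` and act only on the returned complete messages.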

#### **Thread Pipeline** (`src/ThreadPipeline.h`)

A high-performance, multi-stage, lock-free pipeline for inter-thread communication.

- **Lock-Free Design**: Uses a shared ring buffer with atomic counters for coordination, avoiding locks for maximum throughput.
- **Multi-Stage Processing**: Allows items (like connections or data packets) to flow through a series of processing stages (e.g., from I/O threads to worker threads).
- **Batching Support**: Enables efficient batch processing of items to reduce overhead.
- **RAII Guards**: Utilizes RAII (`StageGuard`, `ProducerGuard`) to ensure thread-safe publishing and consumption of items in the pipeline, even in the presence of exceptions.
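
The core idea of a ring buffer coordinated by atomic counters can be sketched with a single-producer, single-consumer queue. This is a deliberate simplification: `SpscRing` is a hypothetical name, it has one stage and no batching or guards, and it does not reflect the actual `ThreadPipeline` interface.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Simplified single-producer/single-consumer ring (hypothetical sketch).
template <typename T, std::size_t N>
class SpscRing {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
  // Called only by the producer thread.
  bool try_push(T value) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail - head_.load(std::memory_order_acquire) == N) return false;  // full
    slots_[tail & (N - 1)] = std::move(value);
    tail_.store(tail + 1, std::memory_order_release);  // publish the slot
    return true;
  }

  // Called only by the consumer thread.
  std::optional<T> try_pop() {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
    T value = std::move(slots_[head & (N - 1)]);
    head_.store(head + 1, std::memory_order_release);  // free the slot
    return value;
  }

private:
  std::array<T, N> slots_{};
  std::atomic<std::size_t> head_{0};  // next index to consume
  std::atomic<std::size_t> tail_{0};  // next index to produce
};
```

The release store on one counter pairs with the acquire load on the other side, so a slot's contents are visible before the consumer sees the updated index. A multi-stage pipeline would add one such counter per stage.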

#### **Parsing Layer**

**JSON Commit Request Parser** (`src/json_commit_request_parser.{hpp,cpp}`):
- **High-performance JSON parser** using `weaseljson` library
- **Streaming parser support** for incremental parsing of network data
- **gperf-optimized token recognition** for fast JSON key parsing
- **Base64 decoding** using SIMD-accelerated simdutf
- **Comprehensive validation** of transaction structure
- **Perfect hash table lookup** for JSON keys using gperf
- **Zero hash collisions** among known JSON tokens, so key lookup needs no collision-resolution branches

**Parser Interface** (`src/commit_request_parser.hpp`):
- **Abstract base class** for commit request parsers
- **Format-agnostic parsing interface** supporting multiple serialization formats
- **Streaming and one-shot parsing modes**
- **Standardized error handling** across parser implementations

#### **Data Model**

**Commit Request Data Model** (`src/commit_request.hpp`):
- **Format-agnostic data structure** for representing transactional commits
- **Arena-backed string storage** with efficient memory management
- **Move-only semantics** for optimal performance
- **Builder pattern** for constructing commit requests
- **Zero-copy string views** pointing to arena-allocated memory

#### **Configuration & Optimization**

**Configuration System** (`src/config.{hpp,cpp}`):
- **TOML-based configuration** using `toml11` library
- **Structured configuration** with server, commit, and subscription sections
- **Default fallback values** for all configuration options
- **Type-safe parsing** with validation and bounds checking
- See `config.md` for complete configuration documentation

**JSON Token Optimization** (`src/json_tokens.gperf`, `src/json_token_enum.hpp`):
- **Perfect hash table** generated by gperf for O(1) JSON key lookup
- **Compile-time token enumeration** for type-safe key identification
- **Minimal perfect hash** reduces memory overhead and improves cache locality
- **Build-time code generation** ensures optimal performance
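
To show what a perfect hash lookup buys, here is a hand-written sketch of the technique. It is not gperf output: the token set (taken from the `CommitRequest` field names), the `JsonToken` enumerators, and the hash function are all assumptions chosen so that the five keys land in distinct slots.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical token set; the real enum lives in json_token_enum.hpp.
enum class JsonToken { Unknown, RequestId, LeaderId, ReadVersion, Preconditions, Operations };

struct Slot {
  std::string_view key;
  JsonToken token;
};

// Hand-picked hash: (length + first byte) mod 16 is collision-free for the
// five keys below (slots 12, 5, 14, 13, 9).
inline std::size_t slot_for(std::string_view key) {
  if (key.empty()) return 0;
  return (key.size() + static_cast<unsigned char>(key.front())) % 16;
}

inline const std::array<Slot, 16>& token_table() {
  static const std::array<Slot, 16> table = [] {
    std::array<Slot, 16> t{};  // empty slots: empty key, JsonToken::Unknown
    const Slot known[] = {
        {"request_id", JsonToken::RequestId},
        {"leader_id", JsonToken::LeaderId},
        {"read_version", JsonToken::ReadVersion},
        {"preconditions", JsonToken::Preconditions},
        {"operations", JsonToken::Operations},
    };
    for (const Slot& s : known) t[slot_for(s.key)] = s;
    return t;
  }();
  return table;
}

// One hash, one table probe, one string comparison. No collision chain.
inline JsonToken lookup(std::string_view key) {
  const Slot& slot = token_table()[slot_for(key)];
  return slot.key == key ? slot.token : JsonToken::Unknown;
}
```

gperf automates the search for such a hash function and emits the table at build time, which is why adding a key means editing the `.gperf` file rather than the lookup code.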

### Transaction Data Model

#### CommitRequest Structure
```
CommitRequest {
  - request_id: Optional unique identifier
  - leader_id: Expected leader for consistency
  - read_version: Snapshot version for preconditions
  - preconditions[]: Optimistic concurrency checks
    - point_read: Single key existence/content validation
    - range_read: Range-based consistency validation
  - operations[]: Ordered mutation operations
    - write: Set key-value pair
    - delete: Remove single key
    - range_delete: Remove key range
}
```
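
The interplay of `read_version` and `preconditions` can be illustrated with a toy optimistic concurrency check: a commit is rejected if any key it read was modified after its snapshot version. Everything below (`SketchStore`, `SketchCommit`, the conflict rule, the version counter) is an assumption for illustration. The document does not specify how WeaselDB resolves commits, and range reads and deletes are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types; the real CommitRequest uses arena-backed string views.
struct SketchCommit {
  std::int64_t read_version = 0;                             // snapshot the client read at
  std::vector<std::string> point_reads;                      // keys the client depends on
  std::vector<std::pair<std::string, std::string>> writes;   // key-value mutations
};

class SketchStore {
public:
  // Applies the writes and returns true only if no precondition key was
  // modified after the commit's read_version.
  bool try_commit(const SketchCommit& commit) {
    for (const std::string& key : commit.point_reads) {
      auto it = last_modified_.find(key);
      if (it != last_modified_.end() && it->second > commit.read_version) {
        return false;  // conflict: someone wrote this key after the snapshot
      }
    }
    ++version_;
    for (const auto& [key, value] : commit.writes) {
      data_[key] = value;
      last_modified_[key] = version_;
    }
    return true;
  }

  std::int64_t version() const { return version_; }

private:
  std::int64_t version_ = 0;
  std::map<std::string, std::string> data_;
  std::map<std::string, std::int64_t> last_modified_;
};
```

A client whose commit is rejected would re-read at a newer version and retry, which is the usual shape of optimistic concurrency control.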

### Memory Management Model

#### Connection Ownership Lifecycle
1. **Creation**: Accept threads create connections, transfer to epoll as raw pointers
2. **Processing**: Network threads claim ownership by wrapping in unique_ptr
3. **Handler Transfer**: Handlers can take ownership for async processing via unique_ptr.release()
4. **Return Path**: Handlers use Server::releaseBackToServer() to return connections
5. **Safety**: Every return goes through a weak_ptr to the server, so a connection handed back after server destruction is cleaned up instead of touching a dead server
6. **Cleanup**: RAII ensures proper resource cleanup in all scenarios
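
The return path in steps 4 to 6 can be sketched as follows. `SketchServer`, `SketchConnection`, and `release_back` are hypothetical stand-ins, and the sketch is single-threaded; the real `Server::releaseBackToServer()` also has to re-arm the fd and synchronize with the I/O threads.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

class SketchServer;

// Each connection remembers its server without keeping it alive.
class SketchConnection {
public:
  explicit SketchConnection(std::weak_ptr<SketchServer> server)
      : server_(std::move(server)) {}
  std::weak_ptr<SketchServer> server() const { return server_; }

private:
  std::weak_ptr<SketchServer> server_;
};

class SketchServer : public std::enable_shared_from_this<SketchServer> {
public:
  static std::shared_ptr<SketchServer> create() {
    return std::shared_ptr<SketchServer>(new SketchServer());
  }

  // Hand a connection to a handler; the handler now owns it.
  std::unique_ptr<SketchConnection> hand_out() {
    return std::make_unique<SketchConnection>(weak_from_this());
  }

  // Returns true if the server was still alive and took the connection back.
  static bool release_back(std::unique_ptr<SketchConnection> conn) {
    if (!conn) return false;
    if (std::shared_ptr<SketchServer> server = conn->server().lock()) {
      server->returned_.push_back(std::move(conn));
      return true;
    }
    return false;  // server is gone: `conn` is destroyed here by unique_ptr
  }

  std::size_t returned() const { return returned_.size(); }

private:
  SketchServer() = default;
  std::vector<std::unique_ptr<SketchConnection>> returned_;
};
```

Because the connection holds only a `weak_ptr`, a handler that finishes late cannot extend the server's lifetime or dereference a destroyed server.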

#### Arena Memory Lifecycle
1. **Request Processing**: Handler uses `conn->getArena()` to allocate memory for parsing request data
2. **Response Generation**: Handler uses arena for temporary response construction (headers, JSON, etc.)
3. **Response Queuing**: Handler calls `conn->appendMessage()` which copies data to arena-backed message queue
4. **Response Writing**: Server writes all queued messages to socket via `writeBytes()`

> **Note**: Call `conn->reset()` periodically to reclaim arena memory. Best practice is after all outgoing bytes have been written.

#### Threading Model and EPOLLONESHOT

**EPOLLONESHOT Design Rationale:**
WeaselDB uses `EPOLLONESHOT` for all connection file descriptors to enable safe multi-threaded ownership transfer without complex synchronization:

**Key Benefits:**
1. **Automatic fd disarming** - Once epoll reports an event for an fd, the kernel disables further notifications for it until it is explicitly re-armed (the fd stays registered in the interest list)
2. **Race-free ownership transfer** - Handlers can safely take connection ownership and move to other threads
3. **Zero-coordination async processing** - No manual synchronization needed between network threads and handler threads

**Threading Flow:**
1. **Event Trigger**: Network thread gets epoll event → connection auto-disarmed via ONESHOT
2. **Safe Transfer**: Handler can take ownership (`std::move(conn_ptr)`) with no epoll interference
3. **Async Processing**: Connection processed on handler thread while epoll cannot trigger spurious events
4. **Return & Re-arm**: `Server::receiveConnectionBack()` re-arms fd with `epoll_ctl(EPOLL_CTL_MOD)`

**Performance Trade-off:**
- **Cost**: One `epoll_ctl(MOD)` syscall per connection return (~100-200ns)
- **Benefit**: Eliminates complex thread synchronization and prevents race conditions
- **Alternative cost**: Manual `EPOLL_CTL_DEL`/`ADD` + locking would be significantly higher

**Without EPOLLONESHOT risks:**
- Multiple threads processing same fd simultaneously
- Use-after-move when network thread accesses transferred connection
- Complex synchronization between epoll events and ownership transfers

This design enables the async handler pattern where connections can be safely moved between threads for background processing while maintaining high performance and thread safety.
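
The disarm and re-arm behavior can be observed directly with the Linux epoll API. The sketch below is Linux-only and uses an `eventfd` in place of a socket; the helper names (`ready_count`, `rearm`, `observe_oneshot`) are hypothetical and unrelated to WeaselDB's server code.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Non-blocking poll: how many events are ready right now (0 or 1).
inline int ready_count(int epfd) {
  epoll_event ev{};
  return epoll_wait(epfd, &ev, 1, /*timeout_ms=*/0);
}

// Re-arm a one-shot fd so the next readiness event is reported again.
inline int rearm(int epfd, int fd) {
  epoll_event ev{};
  ev.events = EPOLLIN | EPOLLONESHOT;
  ev.data.fd = fd;
  return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

struct OneShotObservation {
  int first;        // events reported on the first wait
  int second;       // events reported while the fd is disarmed
  int after_rearm;  // events reported after EPOLL_CTL_MOD
};

inline OneShotObservation observe_oneshot() {
  OneShotObservation obs{-1, -1, -1};
  const int epfd = epoll_create1(0);
  const int fd = eventfd(1, 0);  // counter starts at 1, so fd is readable at once
  if (epfd >= 0 && fd >= 0) {
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
      obs.first = ready_count(epfd);   // event delivered; fd is now disarmed
      obs.second = ready_count(epfd);  // still readable, but nothing is reported
      if (rearm(epfd, fd) == 0) {
        obs.after_rearm = ready_count(epfd);  // reported again after re-arming
      }
    }
  }
  if (fd >= 0) close(fd);
  if (epfd >= 0) close(epfd);
  return obs;
}
```

The window between the first wait and the re-arm is exactly where a handler can own the connection on another thread without any epoll thread seeing events for it.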

### API Endpoints

The system implements a RESTful API:

- **GET /v1/version** - Retrieve current committed version and leader
- **POST /v1/commit** - Submit transactional operations
- **GET /v1/subscribe** - Stream committed transactions
- **GET /v1/status** - Check commit status by request_id
- **PUT /v1/retention/<policy_id>** - Create or update a retention policy
- **GET /v1/retention/<policy_id>** - Retrieve a retention policy by ID
- **GET /v1/retention/** - Retrieve all retention policies
- **DELETE /v1/retention/<policy_id>** - Remove a retention policy
- **GET /metrics** - Retrieve server metrics for monitoring

### Design Principles

1. **Performance-first** - Every component optimized for high throughput
2. **Scalable concurrency** - Multiple epoll instances eliminate kernel contention
3. **Memory efficiency** - Arena allocation eliminates fragmentation
4. **Zero-copy** - Minimize data copying throughout pipeline
5. **Streaming-ready** - Support incremental processing
6. **Type safety** - Compile-time validation where possible
7. **Resource management** - RAII and move semantics throughout

### Future Integration Points

This write-side component is designed to integrate with:
- **Leader election** systems for distributed consensus
- **Replication** mechanisms for fault tolerance
- **Read-side systems** that consume the transaction stream
- **Monitoring** systems for operational visibility

---

## Development Guidelines

### Code Style & Conventions

- **C++ Style**: Modern C++20 with RAII and move semantics
- **Memory Management**: Prefer arena allocation over standard allocators
- **String Handling**: Use `std::string_view` for zero-copy operations
- **Error Handling**: Return error codes or use exceptions appropriately
- **Naming**: snake_case for variables/functions, PascalCase for classes
- **Performance**: Always consider allocation patterns and cache locality

### Critical Implementation Rules

- **Server Creation**: Always use `Server::create()` factory method - direct construction is impossible
- **Connection Creation**: Only the Server can create connections - no public constructor or factory method
- **Connection Ownership**: Use unique_ptr semantics for safe ownership transfer between components
- **Arena Allocator Pattern**: Always use `ArenaAllocator` for temporary allocations within request processing
- **String View Usage**: Prefer `std::string_view` over `std::string` when pointing to arena-allocated memory
- **Ownership Transfer**: Use `Server::releaseBackToServer()` for returning connections to server from handlers
- **JSON Token Lookup**: Use the gperf-generated perfect hash table in `json_tokens.hpp` for O(1) key recognition
- **Base64 Handling**: Always use simdutf for base64 encoding/decoding for performance
- **Thread Safety**: Connection ownership transfers are designed to be thread-safe with proper RAII cleanup

### Project Structure

- **`src/`** - Core headers and implementation files
- **`tests/`** - doctest-based unit tests
- **`benchmarks/`** - nanobench performance tests
- **`tools/`** - Debugging and analysis utilities
- **`build/`** - CMake-generated files including `json_tokens.cpp`

### Extension Points

#### Adding New Protocol Handlers
1. Inherit from `ConnectionHandler` in `src/connection_handler.hpp`
2. Implement `on_data_arrived()` with proper ownership semantics
3. Use connection's arena allocator for temporary allocations: `conn->getArena()`
4. Handle partial messages and streaming protocols appropriately
5. Use `Server::releaseBackToServer()` if taking ownership for async processing
6. Add corresponding test cases and integration tests
7. Consider performance implications of ownership transfers

#### Adding New Parsers
1. Inherit from `CommitRequestParser` in `src/commit_request_parser.hpp`
2. Implement both streaming and one-shot parsing modes
3. Use arena allocation for all temporary string storage
4. Add corresponding test cases in `tests/`
5. Add benchmark comparisons in `benchmarks/`

### Performance Guidelines

- **Memory**: Arena allocation eliminates fragmentation - use it for all request-scoped data
- **CPU**: Perfect hashing and SIMD operations are critical paths - avoid alternatives
- **I/O**: Streaming parser design supports incremental network data processing
- **Cache**: String views avoid copying, keeping data cache-friendly

### Configuration & Testing

- **Configuration**: All configuration is TOML-based using `config.toml` (see `config.md`)
- **Testing Strategy**: Run unit tests, benchmarks, and debug tools before submitting changes
- **Build System**: CMake generates gperf hash tables at build time; always use ninja
- **Test Synchronization**:
  - **ABSOLUTELY NEVER use sleep(), std::this_thread::sleep_for(), or any timeout-based waiting in tests**
  - **NEVER use condition_variable.wait_for() or other timeout variants**
  - Use deterministic synchronization only:
    - **Blocking I/O** (blocking read/write calls that naturally wait)
    - **condition_variable.wait()** with no timeout (waits indefinitely until condition is met)
    - **std::latch, std::barrier, futures/promises** for coordination
    - **RAII guards and resource management** for cleanup
  - Tests should either pass (correct) or hang forever (indicates real bug to investigate)
  - No timeouts, no flaky behavior, no false positives/negatives
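
A minimal sketch of timeout-free coordination, using `std::promise` and `std::future` from the list above. The function name and the value `42` are placeholders; a real test would wrap this in a doctest `TEST_CASE`.

```cpp
#include <cassert>
#include <future>
#include <thread>

// The test thread blocks on the future until the worker publishes a result.
// There is no sleep and no timeout: a worker that never signals makes the
// test hang, which points at a real bug.
inline int run_worker_and_wait() {
  std::promise<int> done;
  std::future<int> result = done.get_future();

  std::thread worker([&done] {
    // ... exercise the code under test here ...
    done.set_value(42);  // signal completion exactly once
  });

  const int value = result.get();  // blocks indefinitely until set_value()
  worker.join();
  return value;
}
```

The same shape works with `std::latch` when several threads must reach a point before the test proceeds.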

---

## Common Patterns

### Factory Method Patterns

#### Server Creation
```cpp
// Server must be created via factory method
auto server = Server::create(config, handler);

// Never create on stack or with make_shared (won't compile):
// Server server(config, handler);  // Compiler error - constructor private
// auto server = std::make_shared<Server>(config, handler);  // Compiler error
```
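
For readers unfamiliar with the idiom, the sketch below shows one common way to get these compile errors: a private constructor plus a static factory that calls `new` from inside the class. `FactoryOnly` and its `port` parameter are hypothetical, and the real `Server` may implement its factory differently.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

class FactoryOnly : public std::enable_shared_from_this<FactoryOnly> {
public:
  // std::make_shared cannot reach a private constructor, so the factory
  // calls `new` from inside the class and wraps the result itself.
  static std::shared_ptr<FactoryOnly> create(int port) {
    return std::shared_ptr<FactoryOnly>(new FactoryOnly(port));
  }

  // Copies would bypass the factory, so forbid them.
  FactoryOnly(const FactoryOnly&) = delete;
  FactoryOnly& operator=(const FactoryOnly&) = delete;

  int port() const { return port_; }

private:
  explicit FactoryOnly(int port) : port_(port) {}
  int port_;
};

// Outside code cannot construct the type directly, on the stack or otherwise.
static_assert(!std::is_constructible_v<FactoryOnly, int>,
              "constructor must stay private");
static_assert(!std::is_copy_constructible_v<FactoryOnly>,
              "copying must stay disabled");
```

Guaranteeing that every instance is owned by a `shared_ptr` is what makes `weak_from_this()` safe to call from member functions.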

#### Connection Creation (Server-Only)
```cpp
// Only Server can create connections (using private friend method).
// Inside a Server member function:
auto conn = Connection::createForServer(addr, fd, id, handler, weak_from_this());

// No public way to create connections - all these fail:
// auto conn = Connection::create(...);        // ERROR: no such method
// Connection conn(addr, fd, id, handler, server);  // ERROR: private constructor
// auto conn = std::make_unique<Connection>(...);   // ERROR: private constructor
```

### ConnectionHandler Implementation Patterns

#### Simple Synchronous Handler
```cpp
class HttpHandler : public ConnectionHandler {
public:
  void on_data_arrived(std::string_view data, std::unique_ptr<Connection>& conn_ptr) override {
    // Parse HTTP request using connection's arena
    ArenaAllocator& arena = conn_ptr->getArena();

    // Generate response
    conn_ptr->appendMessage("HTTP/1.1 200 OK\r\n\r\nHello World");

    // Server retains ownership
  }
};
```

#### Async Handler with Ownership Transfer
```cpp
class AsyncHandler : public ConnectionHandler {
public:
  void on_data_arrived(std::string_view data, std::unique_ptr<Connection>& conn_ptr) override {
    // Take ownership for async processing
    auto connection = std::move(conn_ptr); // conn_ptr is now null

    // Copy the payload: `data` is assumed to be valid only during this call
    work_queue.push([connection = std::move(connection),
                     payload = std::string(data)]() mutable {
      // Process payload asynchronously
      connection->appendMessage("Async response");

      // Return ownership to server when done
      Server::releaseBackToServer(std::move(connection));
    });
  }
};
```

#### Batching Handler with User Data
```cpp
class BatchingHandler : public ConnectionHandler {
public:
  void on_connection_established(Connection &conn) override {
    // Allocate some protocol-specific data and attach it to the connection
    conn.user_data = new MyProtocolData();
  }

  void on_connection_closed(Connection &conn) override {
    // Free the protocol-specific data
    delete static_cast<MyProtocolData*>(conn.user_data);
  }

  void on_data_arrived(std::string_view data,
                       std::unique_ptr<Connection> &conn_ptr) override {
    // Process data and maybe store some results in the user_data
    auto* proto_data = static_cast<MyProtocolData*>(conn_ptr->user_data);
    proto_data->process(data);
  }

  void on_post_batch(std::span<std::unique_ptr<Connection>> batch) override {
    // Process a batch of connections
    for (auto& conn_ptr : batch) {
      if (conn_ptr) {
        auto* proto_data = static_cast<MyProtocolData*>(conn_ptr->user_data);
        if (proto_data->is_ready()) {
          // This connection is ready for the next stage, move it to the pipeline
          pipeline_.push(std::move(conn_ptr));
        }
      }
    }
  }

private:
  MyProcessingPipeline pipeline_;
};
```

#### Streaming "yes" Handler
```cpp
class YesHandler : public ConnectionHandler {
public:
  void on_connection_established(Connection &conn) override {
    // Write an initial "y\n"
    conn.appendMessage("y\n");
  }

  void on_write_progress(std::unique_ptr<Connection> &conn) override {
    if (conn->outgoingBytesQueued() == 0) {
      // Don't use an unbounded amount of memory
      conn->reset();
      // Write "y\n" repeatedly
      conn->appendMessage("y\n");
    }
  }
};
```

### Memory Management Patterns

#### Arena-Based String Handling
```cpp
// Preferred: Zero-copy string view with arena allocation
std::string_view process_json_key(const char* data, ArenaAllocator& arena);

// Avoid: Unnecessary string copies
std::string process_json_key(const char* data);
```
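
One way to picture the preferred signature is to copy the bytes into an arena once and hand out views into it. The sketch below swaps in `std::pmr::monotonic_buffer_resource` from the standard library as a stand-in for `ArenaAllocator`, whose exact interface is not shown here; `copy_to_arena` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>
#include <memory_resource>
#include <string>
#include <string_view>

// Copies `text` into arena-owned memory and returns a view of the copy.
// The view stays valid until the arena is released or destroyed.
inline std::string_view copy_to_arena(std::pmr::memory_resource& arena,
                                      std::string_view text) {
  if (text.empty()) return {};
  char* dst = static_cast<char*>(arena.allocate(text.size(), alignof(char)));
  std::memcpy(dst, text.data(), text.size());
  return {dst, text.size()};
}
```

The returned view can outlive the buffer it was parsed from, but never the arena, so views should not be stored past the point where the arena is reset.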

#### Safe Connection Ownership Transfer
```cpp
// In handler - take ownership for background processing
Connection* raw_conn = conn_ptr.release();

// Process on worker thread
background_processor.submit([raw_conn]() {
  // Do work...
  raw_conn->appendMessage("Background result");

  // Return to server safely (handles server destruction)
  Server::releaseBackToServer(std::unique_ptr<Connection>(raw_conn));
});
```

### Data Construction Patterns

#### Builder Pattern Usage
```cpp
CommitRequest request = CommitRequestBuilder(arena)
    .request_id("example-id")
    .leader_id("leader-123")
    .read_version(42)
    .build();
```

#### Error Handling Pattern
```cpp
enum class ParseResult { Success, InvalidJson, MissingField };
ParseResult parse_commit_request(const char* json, CommitRequest& out);
```

---

## Reference

### Build Targets

**Server Executable:**
- Main server executable (compiled from `src/main.cpp`)

**Test Executables:**
- `test_arena_allocator` - Arena allocator functionality tests
- `test_commit_request` - JSON parsing and validation tests

**Benchmark Executables:**
- `bench_arena_allocator` - Arena allocator performance benchmarks
- `bench_commit_request` - JSON parsing performance benchmarks
- `bench_parser_comparison` - Comparison benchmarks vs nlohmann::json and RapidJSON

**Debug Tools:**
- `debug_arena` - Debug tool for arena allocator analysis

### Performance Characteristics

**Memory Allocation:**
- **~1ns allocation time** vs ~20-270ns for malloc
- **Bulk deallocation** eliminates individual free() calls
- **Optimized geometric growth** uses current block size for doubling strategy
- **Alignment-aware** allocation prevents performance penalties

**JSON Parsing:**
- **Streaming parser** handles large payloads efficiently
- **Incremental processing** suitable for network protocols
- **Arena storage** eliminates string allocation overhead
- **SIMD-accelerated base64 decoding** using simdutf for maximum performance
- **Perfect hash table** provides O(1) JSON key lookup via gperf
- **Zero hash collisions** among known JSON tokens, so key lookup needs no collision-resolution branches

### Build Notes

- **Always use ninja** - faster and more reliable than make
- Build from project root: `mkdir -p build && cd build && cmake .. -G Ninja && ninja`
- For specific targets: `ninja <target_name>` (e.g., `ninja load_tester`)
- Always build with `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON`