WeaselDB Design Overview
Project Summary
WeaselDB is a high-performance write-side database component designed for systems where reading and writing are decoupled. The system focuses exclusively on handling transactional commits with optimistic concurrency control, while readers are expected to maintain their own queryable representations by subscribing to change streams.
Architecture Overview
Core Components
1. Arena Allocator (src/arena_allocator.hpp)
- Ultra-fast memory allocator (~1ns per allocation vs ~20-270ns for malloc)
- Lazy initialization with geometric block growth (doubling strategy)
- Intrusive linked list design for minimal memory overhead
- Memory-efficient reset that keeps the first block and frees others
- STL-compatible interface via ArenaStlAllocator
Key features:
- O(1) amortized allocation
- Proper alignment handling for all types
- Move semantics for efficient transfers
- Requires trivially destructible types only
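The allocator's actual interface isn't shown in this document, so the following is a minimal self-contained sketch of the bump-allocation technique described above (intrusive block list, geometric growth, bulk reset). All names are illustrative, not the real WeaselDB API, and the sketch keeps one block on reset rather than specifically the first.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Bump ("arena") allocator sketch: blocks form an intrusive singly
// linked list, allocation just bumps an offset, and reset() frees all
// but one block. Illustrative only.
class Arena {
  struct Block {
    Block *next;
    size_t size;  // usable payload bytes after this header
    size_t used;  // payload bytes already handed out
    char *data() { return reinterpret_cast<char *>(this + 1); }
  };

  Block *head_;  // newest (largest) block

  static Block *new_block(size_t size) {
    auto *b = static_cast<Block *>(std::malloc(sizeof(Block) + size));
    if (!b) throw std::bad_alloc{};
    b->next = nullptr; b->size = size; b->used = 0;
    return b;
  }

public:
  explicit Arena(size_t first = 4096) : head_(new_block(first)) {}
  ~Arena() {
    for (Block *b = head_; b;) { Block *n = b->next; std::free(b); b = n; }
  }

  void *allocate(size_t n, size_t align = alignof(std::max_align_t)) {
    // Compute the aligned offset of the next allocation in a block.
    auto fit = [&](Block *b) -> size_t {
      uintptr_t base = reinterpret_cast<uintptr_t>(b->data());
      uintptr_t p = (base + b->used + align - 1) & ~uintptr_t(align - 1);
      return size_t(p - base);
    };
    size_t off = fit(head_);
    if (off + n > head_->size) {         // doesn't fit: grow geometrically
      size_t grown = head_->size * 2;    // doubling strategy
      size_t want = n + align;           // slack so alignment always fits
      Block *b = new_block(grown > want ? grown : want);
      b->next = head_;
      head_ = b;
      off = fit(head_);
    }
    head_->used = off + n;
    return head_->data() + off;
  }

  // Bulk deallocation: free all but one block and rewind it for reuse.
  void reset() {
    while (head_->next) {
      Block *dead = head_->next;
      head_->next = dead->next;
      std::free(dead);
    }
    head_->used = 0;
  }
};
```

Note that individual allocations are never freed; memory is reclaimed only in bulk via reset(), which is why only trivially destructible types are supported.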
2. Commit Request Data Model (src/commit_request.hpp)
- Format-agnostic data structure for representing transactional commits
- Arena-backed string storage with efficient memory management
- Move-only semantics for optimal performance
- Builder pattern for constructing commit requests
- Zero-copy string views pointing to arena-allocated memory
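To make the builder-plus-views idea concrete, here is a self-contained sketch. A std::deque stands in for the arena so the example compiles on its own (its elements have stable addresses, so the views stay valid, including across moves); all type and member names are illustrative, not the actual CommitRequest API.

```cpp
#include <deque>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct WriteOp {
  std::string_view key;
  std::string_view value;
};

// Move-only request holding only views into owned, address-stable storage.
class CommitRequestSketch {
public:
  CommitRequestSketch() = default;
  CommitRequestSketch(const CommitRequestSketch &) = delete;
  CommitRequestSketch &operator=(const CommitRequestSketch &) = delete;
  CommitRequestSketch(CommitRequestSketch &&) = default;
  CommitRequestSketch &operator=(CommitRequestSketch &&) = default;

  std::string_view request_id;
  std::vector<WriteOp> writes;

private:
  friend class CommitRequestBuilder;
  std::deque<std::string> storage_;  // arena stand-in; element addresses are stable
};

class CommitRequestBuilder {
  CommitRequestSketch req_;

  // Copy the bytes once into owned storage and return a view into it.
  std::string_view intern(std::string_view s) {
    req_.storage_.emplace_back(s);
    return req_.storage_.back();
  }

public:
  CommitRequestBuilder &set_request_id(std::string_view id) {
    req_.request_id = intern(id);
    return *this;
  }
  CommitRequestBuilder &add_write(std::string_view k, std::string_view v) {
    req_.writes.push_back({intern(k), intern(v)});
    return *this;
  }
  CommitRequestSketch build() { return std::move(req_); }
};
```

With a real arena behind intern(), the copy into storage is the only copy the data ever undergoes; everything downstream works on views.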
3. JSON Commit Request Parser (src/json_commit_request_parser.{hpp,cpp})
- High-performance JSON parser using the weaseljson library
- Streaming parser support for incremental parsing of network data
- gperf-optimized token recognition for fast JSON key parsing
- Base64 decoding using SIMD-accelerated simdutf
- Comprehensive validation of transaction structure
Parser capabilities:
- One-shot parsing for complete JSON
- Streaming parsing for network protocols
- Parse state management with error recovery
- Memory-efficient string views backed by arena storage
- Perfect hash table lookup for JSON keys using gperf
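The streaming mode's feed-until-complete shape can be illustrated with a tiny state machine that consumes network chunks incrementally and reports when one complete top-level JSON object has arrived. This is only a framing sketch, not the real parser (weaseljson builds the CommitRequest as it parses); the class name is invented.

```cpp
#include <string_view>

// Incremental framer: tracks string/escape state and brace depth across
// arbitrarily split chunks, as a network protocol delivers them.
class StreamingFramer {
  int depth_ = 0;
  bool in_string_ = false, escaped_ = false, complete_ = false;

public:
  // Feed one chunk; may be called many times per request.
  void feed(std::string_view chunk) {
    for (char c : chunk) {
      if (complete_) return;
      if (in_string_) {               // braces inside strings don't count
        if (escaped_) escaped_ = false;
        else if (c == '\\') escaped_ = true;
        else if (c == '"') in_string_ = false;
        continue;
      }
      if (c == '"') in_string_ = true;
      else if (c == '{') ++depth_;
      else if (c == '}' && --depth_ == 0) complete_ = true;
    }
  }
  bool complete() const { return complete_; }
};
```

A real streaming parser additionally tokenizes, validates, and populates the request as bytes arrive, so no buffering of the full payload is needed.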
4. Parser Interface (src/parser_interface.hpp)
- Abstract base class for commit request parsers
- Format-agnostic parsing interface supporting multiple serialization formats
- Streaming and one-shot parsing modes
- Standardized error handling across parser implementations
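An abstract interface with those properties might look like the sketch below: one-shot and streaming entry points plus uniform error reporting. The names and signatures are illustrative guesses, not the actual contents of src/parser_interface.hpp.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Standardized error shape shared by all parser implementations.
struct ParseError {
  std::string message;
  size_t offset;  // byte offset of the failure in the input
};

// Format-agnostic parser interface: a JSON, binary, or other
// implementation produces the same Request type either way.
template <typename Request>
class ParserInterface {
public:
  virtual ~ParserInterface() = default;

  // One-shot mode: the complete serialized request is available.
  virtual std::optional<Request> parse(std::string_view input) = 0;

  // Streaming mode: feed() chunks as they arrive off the network,
  // then finish() to validate and extract the result.
  virtual void feed(std::string_view chunk) = 0;
  virtual std::optional<Request> finish() = 0;

  // Null when the last operation succeeded.
  virtual const ParseError *error() const = 0;
};
```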
5. Configuration System (src/config.{hpp,cpp})
- TOML-based configuration using the toml11 library
- Structured configuration with server, commit, and subscription sections
- Default fallback values for all configuration options
- Type-safe parsing with validation
Configuration domains:
- Server: bind address, port, request size limits
- Commit: request ID validation, retention policies
- Subscription: buffer management, keepalive intervals
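A configuration file covering those domains might look like the following. The section names match the domains above, but the individual key names and defaults are illustrative guesses, not the actual schema from src/config.cpp.

```toml
# Illustrative shape only; key names are assumptions.
[server]
bind_address = "0.0.0.0"
port = 8080
max_request_bytes = 1048576    # request size limit

[commit]
validate_request_ids = true
retention_seconds = 3600       # how long commit metadata is retained

[subscription]
buffer_bytes = 262144          # per-subscriber buffer
keepalive_interval_ms = 15000
```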
6. JSON Token Optimization (src/json_tokens.gperf, src/json_token_enum.hpp)
- Perfect hash table generated by gperf for O(1) JSON key lookup
- Compile-time token enumeration for type-safe key identification
- Minimal perfect hash reduces memory overhead and improves cache locality
- Build-time code generation ensures optimal performance
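A gperf input for JSON keys typically pairs each keyword with its enum value via %struct-type. The fragment below is an illustrative sketch: the key names come from the data model in this document, but the struct, enum values, and function name are assumptions about src/json_tokens.gperf.

```
%{
#include "json_token_enum.hpp"
%}
%struct-type
%define lookup-function-name lookup_json_token
struct JsonTokenEntry { const char *name; JsonToken token; };
%%
request_id, JsonToken::RequestId
leader_id, JsonToken::LeaderId
read_version, JsonToken::ReadVersion
preconditions, JsonToken::Preconditions
operations, JsonToken::Operations
```

At build time, gperf turns this into a C++ lookup function whose hash is collision-free over the listed keys, so classifying a key is a single hash, one table probe, and one string compare.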
Data Model
Transaction Structure
CommitRequest {
- request_id: Optional unique identifier
- leader_id: Expected leader for consistency
- read_version: Snapshot version for preconditions
- preconditions[]: Optimistic concurrency checks
- point_read: Single key existence/content validation
- range_read: Range-based consistency validation
- operations[]: Ordered mutation operations
- write: Set key-value pair
- delete: Remove single key
- range_delete: Remove key range
}
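A commit request following this structure might be serialized as JSON like the example below. This is a hypothetical wire format: the field names mirror the data model above, the keys and values are base64-encoded (per the simdutf decoding noted earlier), and the exact schema may differ.

```json
{
  "request_id": "a1b2c3",
  "leader_id": "node-1",
  "read_version": 42,
  "preconditions": [
    { "point_read": { "key": "dXNlci8x", "expected_value": "YWxpY2U=" } }
  ],
  "operations": [
    { "write": { "key": "dXNlci8x", "value": "Ym9i" } },
    { "delete": { "key": "dXNlci8y" } }
  ]
}
```

Here "dXNlci8x" is the base64 encoding of the key "user/1"; the commit succeeds only if the precondition still holds at the current version.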
Memory Management
- Arena-based allocation ensures efficient bulk memory management
- String views eliminate unnecessary copying of JSON data
- Zero-copy design for binary data handling
- Automatic memory cleanup on transaction completion
API Design
The system implements a RESTful API with the following endpoints (two of them implied by the design):
- GET /v1/version: Retrieve current committed version and leader
- POST /v1/commit: Submit transactional operations
- GET /v1/subscribe: Stream committed transactions (implied)
- GET /v1/status: Check commit status by request_id (implied)
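An exchange against the commit endpoint might look like the following. This is purely illustrative: the headers, base64-encoded key/value ("aw==" and "dg==" encode "k" and "v"), and response body are assumptions, since this document does not specify the wire schema.

```http
POST /v1/commit HTTP/1.1
Content-Type: application/json

{"request_id":"a1b2c3","read_version":42,"operations":[{"write":{"key":"aw==","value":"dg=="}}]}

HTTP/1.1 200 OK
Content-Type: application/json

{"committed_version":43}
```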
Performance Characteristics
Memory Allocation
- ~1ns allocation time, vs ~20-270ns for standard allocators such as malloc
- Bulk deallocation eliminates individual free() calls
- Optimized geometric growth uses current block size for doubling strategy
- Alignment-aware allocation prevents performance penalties
JSON Parsing
- Streaming parser handles large payloads efficiently
- Incremental processing suitable for network protocols
- Arena storage eliminates string allocation overhead
- SIMD-accelerated base64 decoding using simdutf for maximum performance
- Perfect hash table provides O(1) JSON key lookup via gperf
- Zero hash collisions for known JSON tokens eliminate branching
Design Principles
- Performance-first: Every component optimized for high throughput
- Memory efficiency: Arena allocation eliminates fragmentation
- Zero-copy: Minimize data copying throughout pipeline
- Streaming-ready: Support incremental processing
- Type safety: Compile-time validation where possible
- Resource management: RAII and move semantics throughout
Testing & Benchmarking
The project includes comprehensive testing infrastructure:
- Unit tests using doctest framework
- Performance benchmarks using nanobench
- Memory allocation benchmarks for arena performance
- JSON parsing validation for correctness
Build targets:
- test_arena_allocator: Arena allocator functionality tests
- test_commit_request: JSON parsing and validation tests
- weaseldb: Main application demonstrating configuration and parsing
- bench_arena_allocator: Arena allocator performance benchmarks
- bench_commit_request: JSON parsing performance benchmarks
- bench_parser_comparison: Comparison benchmarks vs nlohmann::json and RapidJSON
- debug_arena: Debug tool for arena allocator analysis
Dependencies
- weaseljson: High-performance streaming JSON parser
- simdutf: SIMD-accelerated UTF-8 validation and base64 encoding/decoding
- toml11: TOML configuration file parsing
- doctest: Lightweight testing framework
- nanobench: Micro-benchmarking library
- gperf: Perfect hash function generator for JSON token optimization
- nlohmann::json: Reference JSON parser for benchmarking comparisons
- RapidJSON: High-performance JSON parser for benchmarking comparisons
Future Considerations
This write-side component is designed to integrate with:
- Leader election systems for distributed consensus
- Replication mechanisms for fault tolerance
- Read-side systems that consume the transaction stream
- Monitoring systems for operational visibility
The modular design allows each component to be optimized independently while maintaining clear interfaces for system integration.