WeaselDB Development Guide
Project Summary
WeaselDB is a high-performance write-side database component designed for systems where reading and writing are decoupled. The system focuses exclusively on handling transactional commits with optimistic concurrency control, while readers are expected to maintain their own queryable representations by subscribing to change streams.
Quick Start
Build System
- Use CMake with C++20 standard
- Primary build commands:
  mkdir -p build && cd build
  cmake .. -DCMAKE_BUILD_TYPE=Release
  ninja    # or: make -j$(nproc)
Testing and Development Workflow
- Run all tests: ninja test or ctest
- Individual test targets:
  - ./test_arena_allocator: Arena allocator unit tests
  - ./test_commit_request: JSON parsing and validation tests
- Benchmarking:
  - ./bench_arena_allocator: Memory allocation performance
  - ./bench_commit_request: JSON parsing performance
  - ./bench_parser_comparison: Compare against nlohmann::json and RapidJSON
- Debug tools:
  - ./debug_arena: Analyze arena allocator behavior
Code Style and Conventions
- C++ Style: Modern C++20 with RAII and move semantics
- Memory Management: Prefer arena allocation over standard allocators
- String Handling: Use std::string_view for zero-copy operations
- Error Handling: Return error codes or use exceptions appropriately
- Naming: snake_case for variables/functions, PascalCase for classes
- Performance: Always consider allocation patterns and cache locality
Dependencies and External Libraries
- weaseljson: Must be installed system-wide (high-performance JSON parser)
- simdutf: Fetched automatically (SIMD base64 encoding/decoding)
- toml11: Fetched automatically (TOML configuration parsing)
- doctest: Fetched automatically (testing framework)
- nanobench: Fetched automatically (benchmarking library)
- gperf: System requirement for perfect hash generation
Architecture Overview
Core Components
1. Arena Allocator (src/arena_allocator.hpp)
- Ultra-fast memory allocator (~1ns per allocation vs ~20-270ns for malloc)
- Lazy initialization with geometric block growth (doubling strategy)
- Intrusive linked list design for minimal memory overhead
- Memory-efficient reset that keeps the first block and frees others
- STL-compatible interface via ArenaStlAllocator
Key features:
- O(1) amortized allocation
- Proper alignment handling for all types
- Move semantics for efficient transfers
- Requires trivially destructible types only
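The properties listed above can be illustrated with a minimal bump-allocator sketch. This is not the code in src/arena_allocator.hpp: the class name SketchArena, its method names, and the 1024-byte default are assumptions chosen for illustration. It shows lazy initialization, doubling block growth, an intrusive block list, and a reset that keeps only the first block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical sketch; NOT the real class from src/arena_allocator.hpp.
class SketchArena {
    // Intrusive header: each malloc'd block starts with its own list node.
    struct Block {
        Block* prev;       // older block, or nullptr for the first one
        std::size_t size;  // usable bytes following the header
        std::size_t used;  // bytes already handed out
        char* data() { return reinterpret_cast<char*>(this + 1); }
    };

    Block* head_ = nullptr;  // newest block; stays nullptr until first use (lazy)
    std::size_t initial_size_;

    void add_block(std::size_t min_bytes) {
        // Geometric growth: double the current block size, or more if needed.
        std::size_t size = head_ ? head_->size * 2 : initial_size_;
        if (size < min_bytes) size = min_bytes;
        void* raw = std::malloc(sizeof(Block) + size);
        if (raw == nullptr) throw std::bad_alloc();
        head_ = new (raw) Block{head_, size, 0};
    }

public:
    explicit SketchArena(std::size_t initial_size = 1024) : initial_size_(initial_size) {}
    SketchArena(const SketchArena&) = delete;
    SketchArena& operator=(const SketchArena&) = delete;
    SketchArena(SketchArena&& other) noexcept
        : head_(std::exchange(other.head_, nullptr)), initial_size_(other.initial_size_) {}
    ~SketchArena() {
        while (head_) { Block* prev = head_->prev; std::free(head_); head_ = prev; }
    }

    // Bump-pointer allocation; align must be a power of two.
    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        if (head_) {
            auto base = reinterpret_cast<std::uintptr_t>(head_->data());
            std::uintptr_t p = (base + head_->used + align - 1) & ~(std::uintptr_t(align) - 1);
            if (p + bytes <= base + head_->size) {
                head_->used = (p - base) + bytes;
                return reinterpret_cast<void*>(p);
            }
        }
        add_block(bytes + align);  // large enough for the request plus padding
        return allocate(bytes, align);
    }

    // Destructors are never run, hence the trivially-destructible requirement.
    template <class T, class... Args>
    T* construct(Args&&... args) {
        static_assert(std::is_trivially_destructible_v<T>, "arena never runs destructors");
        return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }

    // Free every block except the first (oldest) one and rewind it.
    void reset() {
        if (!head_) return;
        while (head_->prev) { Block* prev = head_->prev; std::free(head_); head_ = prev; }
        head_->used = 0;
    }

    std::size_t block_count() const {
        std::size_t n = 0;
        for (Block* b = head_; b; b = b->prev) ++n;
        return n;
    }
};
```

In this sketch an allocation that does not fit simply abandons the rest of the current block; whether the real allocator does the same is not stated in this guide.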
2. Commit Request Data Model (src/commit_request.hpp)
- Format-agnostic data structure for representing transactional commits
- Arena-backed string storage with efficient memory management
- Move-only semantics for optimal performance
- Builder pattern for constructing commit requests
- Zero-copy string views pointing to arena-allocated memory
3. JSON Commit Request Parser (src/json_commit_request_parser.{hpp,cpp})
- High-performance JSON parser using the weaseljson library
- Streaming parser support for incremental parsing of network data
- gperf-optimized token recognition for fast JSON key parsing
- Base64 decoding using SIMD-accelerated simdutf
- Comprehensive validation of transaction structure
Parser capabilities:
- One-shot parsing for complete JSON
- Streaming parsing for network protocols
- Parse state management with error recovery
- Memory-efficient string views backed by arena storage
- Perfect hash table lookup for JSON keys using gperf
4. Parser Interface (src/commit_request_parser.hpp)
- Abstract base class for commit request parsers
- Format-agnostic parsing interface supporting multiple serialization formats
- Streaming and one-shot parsing modes
- Standardized error handling across parser implementations
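To make the two parsing modes concrete, here is a rough sketch of what such an abstract interface could look like. The names SketchParser, ParseStatus, SketchRequest, and the toy "id:<value>\n" format are all hypothetical; the actual declarations live in src/commit_request_parser.hpp and may differ. The sketch uses std::string for buffering only to stay self-contained, whereas the project itself prefers arena storage.

```cpp
#include <cassert>
#include <string>
#include <string_view>

enum class ParseStatus { Complete, Incomplete, Error };

// Stand-in for the real commit request type (hypothetical).
struct SketchRequest { std::string request_id; };

class SketchParser {
public:
    virtual ~SketchParser() = default;

    // Streaming mode: begin, feed chunks as they arrive, then finish.
    virtual void begin(SketchRequest& out) = 0;
    virtual ParseStatus feed(std::string_view chunk) = 0;
    virtual ParseStatus finish() = 0;
    virtual std::string_view error() const = 0;

    // One-shot mode expressed in terms of the streaming calls.
    ParseStatus parse(SketchRequest& out, std::string_view data) {
        begin(out);
        if (feed(data) == ParseStatus::Error) return ParseStatus::Error;
        return finish();
    }
};

// Toy implementation for a made-up line format: "id:<value>\n".
class LineParser final : public SketchParser {
    SketchRequest* out_ = nullptr;
    std::string buf_;
    std::string_view error_;

public:
    void begin(SketchRequest& out) override {
        out_ = &out;
        buf_.clear();
        error_ = {};
    }
    ParseStatus feed(std::string_view chunk) override {
        buf_.append(chunk);
        return buf_.find('\n') == std::string::npos ? ParseStatus::Incomplete
                                                    : ParseStatus::Complete;
    }
    ParseStatus finish() override {
        const auto nl = buf_.find('\n');
        if (nl == std::string::npos) { error_ = "truncated input"; return ParseStatus::Error; }
        if (buf_.rfind("id:", 0) != 0) { error_ = "missing id: prefix"; return ParseStatus::Error; }
        out_->request_id = buf_.substr(3, nl - 3);
        return ParseStatus::Complete;
    }
    std::string_view error() const override { return error_; }
};
```

Building the one-shot call on top of the streaming calls is one possible design; it keeps a new format's implementation to a single code path.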
5. Configuration System (src/config.{hpp,cpp})
- TOML-based configuration using the toml11 library
- Structured configuration with server, commit, and subscription sections
- Default fallback values for all configuration options
- Type-safe parsing with validation and bounds checking
- Comprehensive validation with meaningful error messages
See config.md for complete configuration documentation.
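The combination of default fallbacks and bounds checking can be sketched independently of toml11. The helper below is hypothetical (bounded_or_default is not a function from src/config.cpp); it assumes the caller has already looked up an integer option and passes std::nullopt when the key is absent.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the fallback when the option is missing,
// and throws with a descriptive message when it is out of bounds.
inline std::int64_t bounded_or_default(std::optional<std::int64_t> raw,
                                       std::int64_t fallback,
                                       std::int64_t lo,
                                       std::int64_t hi,
                                       const std::string& name) {
    if (!raw) return fallback;  // key not present in the TOML file
    if (*raw < lo || *raw > hi) {
        throw std::out_of_range(name + " must be in [" + std::to_string(lo) + ", " +
                                std::to_string(hi) + "], got " + std::to_string(*raw));
    }
    return *raw;
}
```

A caller would invoke this once per numeric option, so every option gets the same error wording.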
6. JSON Token Optimization (src/json_tokens.gperf, src/json_token_enum.hpp)
- Perfect hash table generated by gperf for O(1) JSON key lookup
- Compile-time token enumeration for type-safe key identification
- Minimal perfect hash reduces memory overhead and improves cache locality
- Build-time code generation ensures optimal performance
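The idea behind a gperf-style lookup can be shown with a hand-written perfect hash. This sketch is not gperf output and not the contents of src/json_tokens.gperf: the hash function (length plus first character, modulo 16), the SketchToken enum, and the choice of keys (taken from the field names in the data model) are assumptions for illustration. The hash selects exactly one slot, and a single string comparison confirms the match.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

enum class SketchToken { Unknown, RequestId, LeaderId, ReadVersion, Preconditions, Operations };

struct Slot {
    std::string_view key;
    SketchToken token;
};

// Hand-picked hash that happens to be collision-free for the five keys below.
// Precondition: s is non-empty.
constexpr std::size_t slot_of(std::string_view s) {
    return (s.size() + static_cast<unsigned char>(s[0])) % 16;
}

constexpr std::array<Slot, 16> build_table() {
    std::array<Slot, 16> table{};  // empty slots hold "" / Unknown
    constexpr Slot known[] = {
        {"request_id", SketchToken::RequestId},
        {"leader_id", SketchToken::LeaderId},
        {"read_version", SketchToken::ReadVersion},
        {"preconditions", SketchToken::Preconditions},
        {"operations", SketchToken::Operations},
    };
    for (const Slot& k : known) table[slot_of(k.key)] = k;
    return table;
}

inline constexpr std::array<Slot, 16> kTable = build_table();

// One hash, one comparison; no probing or chaining.
constexpr SketchToken lookup(std::string_view s) {
    if (s.empty()) return SketchToken::Unknown;
    const Slot& slot = kTable[slot_of(s)];
    return slot.key == s ? slot.token : SketchToken::Unknown;
}

// If two keys collided, one would overwrite the other and a check would fail.
static_assert(lookup("request_id") == SketchToken::RequestId);
static_assert(lookup("leader_id") == SketchToken::LeaderId);
static_assert(lookup("read_version") == SketchToken::ReadVersion);
static_assert(lookup("preconditions") == SketchToken::Preconditions);
static_assert(lookup("operations") == SketchToken::Operations);
```

gperf automates the search for such a hash function and emits the table at build time, which is why adding a key means regenerating rather than hand-editing.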
Data Model
Transaction Structure
CommitRequest {
- request_id: Optional unique identifier
- leader_id: Expected leader for consistency
- read_version: Snapshot version for preconditions
- preconditions[]: Optimistic concurrency checks
- point_read: Single key existence/content validation
- range_read: Range-based consistency validation
- operations[]: Ordered mutation operations
- write: Set key-value pair
- delete: Remove single key
- range_delete: Remove key range
}
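As a rough C++ rendering of the structure above, the following sketch uses plain structs. The type names, the field types (for example a 64-bit read_version), and the use of std::vector are assumptions; the real definitions in src/commit_request.hpp are arena-backed and move-only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical mirror of the structure above; field types are assumptions.
struct SketchPrecondition {
    enum class Kind { PointRead, RangeRead };
    Kind kind;
    std::string_view begin;  // key for PointRead, range start for RangeRead
    std::string_view end;    // range end; unused for PointRead
};

struct SketchOperation {
    enum class Kind { Write, Delete, RangeDelete };
    Kind kind;
    std::string_view key;    // key, or range start for RangeDelete
    std::string_view value;  // value for Write, range end for RangeDelete
};

struct SketchCommitRequest {
    std::optional<std::string_view> request_id;  // optional per the model
    std::string_view leader_id;
    std::uint64_t read_version = 0;
    std::vector<SketchPrecondition> preconditions;
    std::vector<SketchOperation> operations;  // order is significant
};

// Small helper to show traversal of the ordered operation list.
inline std::size_t count_ops(const SketchCommitRequest& req, SketchOperation::Kind kind) {
    std::size_t n = 0;
    for (const SketchOperation& op : req.operations) {
        if (op.kind == kind) ++n;
    }
    return n;
}
```

The string views here do not own their bytes, so whatever storage backs them (an arena in the real system) has to outlive the request.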
Memory Management
- Arena-based allocation ensures efficient bulk memory management
- String views eliminate unnecessary copying of JSON data
- Zero-copy design for binary data handling
- Automatic memory cleanup on transaction completion
API Design
The system implements a RESTful API with four endpoints, two of which are marked as implied:
- GET /v1/version: Retrieve current committed version and leader
- POST /v1/commit: Submit transactional operations
- GET /v1/subscribe: Stream committed transactions (implied)
- GET /v1/status: Check commit status by request_id (implied)
Performance Characteristics
Memory Allocation
- ~1ns allocation time vs ~20-270ns for malloc
- Bulk deallocation eliminates individual free() calls
- Optimized geometric growth uses current block size for doubling strategy
- Alignment-aware allocation prevents performance penalties
JSON Parsing
- Streaming parser handles large payloads efficiently
- Incremental processing suitable for network protocols
- Arena storage eliminates string allocation overhead
- SIMD-accelerated base64 decoding using simdutf for maximum performance
- Perfect hash table provides O(1) JSON key lookup via gperf
- Zero hash collisions for known JSON tokens, so key lookup needs no collision-resolution branches
Design Principles
- Performance-first: Every component optimized for high throughput
- Memory efficiency: Arena allocation eliminates fragmentation
- Zero-copy: Minimize data copying throughout pipeline
- Streaming-ready: Support incremental processing
- Type safety: Compile-time validation where possible
- Resource management: RAII and move semantics throughout
Testing & Benchmarking
The project includes comprehensive testing infrastructure:
- Unit tests using doctest framework
- Performance benchmarks using nanobench
- Memory allocation benchmarks for arena performance
- JSON parsing validation for correctness
Build targets:
- test_arena_allocator: Arena allocator functionality tests
- test_commit_request: JSON parsing and validation tests
- Main server executable (compiled from src/main.cpp)
- bench_arena_allocator: Arena allocator performance benchmarks
- bench_commit_request: JSON parsing performance benchmarks
- bench_parser_comparison: Comparison benchmarks vs nlohmann::json and RapidJSON
- debug_arena: Debug tool for arena allocator analysis
Dependencies
- weaseljson: High-performance streaming JSON parser
- simdutf: SIMD-accelerated UTF-8 validation and base64 encoding/decoding
- toml11: TOML configuration file parsing
- doctest: Lightweight testing framework
- nanobench: Micro-benchmarking library
- gperf: Perfect hash function generator for JSON token optimization
- nlohmann::json: Reference JSON parser for benchmarking comparisons
- RapidJSON: High-performance JSON parser for benchmarking comparisons
Future Considerations
This write-side component is designed to integrate with:
- Leader election systems for distributed consensus
- Replication mechanisms for fault tolerance
- Read-side systems that consume the transaction stream
- Monitoring systems for operational visibility
The modular design allows each component to be optimized independently while maintaining clear interfaces for system integration.
Development Guidelines
Important Implementation Details
- Arena Allocator Pattern: Always use ArenaAllocator for temporary allocations within request processing
- String View Usage: Prefer std::string_view over std::string when pointing to arena-allocated memory
- JSON Token Lookup: Use the gperf-generated perfect hash table in json_tokens.hpp for O(1) key recognition
- Base64 Handling: Always use simdutf for base64 encoding/decoding for performance
- Error Propagation: Use structured error types that can be efficiently returned up the call stack
File Organization
- Core Headers: src/ contains all primary implementation files
- Tests: tests/ contains doctest-based unit tests
- Benchmarks: benchmarks/ contains nanobench performance tests
- Tools: tools/ contains debugging and analysis utilities
- Build-Generated: build/ contains CMake-generated files including json_tokens.cpp
Adding New Parsers
- Inherit from CommitRequestParser in src/commit_request_parser.hpp
- Implement both streaming and one-shot parsing modes
- Use arena allocation for all temporary string storage
- Add corresponding test cases in tests/
- Add benchmark comparisons in benchmarks/
Performance Considerations
- Memory: Arena allocation eliminates fragmentation - use it for all request-scoped data
- CPU: Perfect hashing and SIMD operations are critical paths - avoid alternatives
- I/O: Streaming parser design supports incremental network data processing
- Cache: String views avoid copying, keeping data cache-friendly
Configuration Management
- All configuration is TOML-based using config.toml
- Comprehensive documentation available in config.md
- Type-safe parsing with validation and bounds checking
- Always validate configuration values and provide meaningful errors
Testing Strategy
- Unit tests validate individual component correctness
- Benchmarks ensure performance characteristics are maintained
- Debug tools help analyze memory usage patterns
- Always run both tests and benchmarks before submitting changes
Build System Details
- CMake generates gperf hash tables at build time
- Ninja is preferred over make for faster incremental builds
- Release builds include debug symbols for profiling
- All external dependencies except weaseljson are auto-fetched
Common Patterns
Arena-Based String Handling
// Preferred: Zero-copy string view
std::string_view process_json_key(const char* data, ArenaAllocator& arena);
// Avoid: Unnecessary string copies
std::string process_json_key(const char* data);
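One way the preferred signature could be satisfied is to copy the bytes into the arena once and hand back a view over that copy. In the sketch below, std::pmr::monotonic_buffer_resource from the standard library stands in for ArenaAllocator (whose real interface is not shown in this guide), and copy_to_arena is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>
#include <memory_resource>
#include <string_view>

// Stand-in for the project's ArenaAllocator (assumption for this sketch).
using SketchArena = std::pmr::monotonic_buffer_resource;

// Copies src into arena-owned memory and returns a view of the copy.
// The view stays valid for as long as the arena is alive.
inline std::string_view copy_to_arena(std::string_view src, SketchArena& arena) {
    if (src.empty()) return {};
    char* dst = static_cast<char*>(arena.allocate(src.size(), alignof(char)));
    std::memcpy(dst, src.data(), src.size());
    return {dst, src.size()};
}
```

Because the returned view points into the arena rather than the input buffer, the caller can reuse or discard the network buffer immediately after the call.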
Error Handling Pattern
enum class ParseResult { Success, InvalidJson, MissingField };
ParseResult parse_commit_request(const char* json, CommitRequest& out);
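The sketch below shows how a result enum of this kind might be propagated up a call chain. The helper names describe, toy_parse, and handle_request are hypothetical, and toy_parse is a deliberately naive stand-in (it only checks for braces and a "leader_id" key, treating that key as required purely for illustration); it is not the project's parser.

```cpp
#include <cassert>
#include <string_view>

enum class ParseResult { Success, InvalidJson, MissingField };

// Human-readable text for logs or error responses.
constexpr std::string_view describe(ParseResult r) {
    switch (r) {
        case ParseResult::Success: return "success";
        case ParseResult::InvalidJson: return "invalid JSON";
        case ParseResult::MissingField: return "missing required field";
    }
    return "unknown";
}

// Naive stand-in for a real parser; only inspects the outer shape of the input.
[[nodiscard]] constexpr ParseResult toy_parse(std::string_view json) {
    if (json.size() < 2 || json.front() != '{' || json.back() != '}') {
        return ParseResult::InvalidJson;
    }
    if (json.find("\"leader_id\"") == std::string_view::npos) {
        return ParseResult::MissingField;
    }
    return ParseResult::Success;
}

// The caller returns the first failure unchanged instead of throwing.
[[nodiscard]] constexpr ParseResult handle_request(std::string_view body) {
    if (ParseResult r = toy_parse(body); r != ParseResult::Success) return r;
    // ... commit processing would go here ...
    return ParseResult::Success;
}
```

Marking the functions [[nodiscard]] asks the compiler to warn when a caller ignores the returned code.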
Builder Pattern Usage
CommitRequest request = CommitRequestBuilder(arena)
.request_id("example-id")
.leader_id("leader-123")
.read_version(42)
.build();
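For readers curious how a chainable builder of this shape can be put together, here is a minimal sketch. SketchBuilder and SketchRequest are hypothetical names, std::pmr::monotonic_buffer_resource again stands in for ArenaAllocator, and only the three setters used above are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory_resource>
#include <string_view>
#include <utility>

// Stand-in for the project's ArenaAllocator (assumption for this sketch).
using SketchArena = std::pmr::monotonic_buffer_resource;

// Move-only request whose views point into arena memory.
struct SketchRequest {
    std::string_view request_id;
    std::string_view leader_id;
    std::uint64_t read_version = 0;

    SketchRequest() = default;
    SketchRequest(SketchRequest&&) = default;
    SketchRequest& operator=(SketchRequest&&) = default;
    SketchRequest(const SketchRequest&) = delete;
    SketchRequest& operator=(const SketchRequest&) = delete;
};

class SketchBuilder {
    SketchArena& arena_;
    SketchRequest req_;

    // Copy the bytes into the arena so the request does not depend on the caller's buffer.
    std::string_view store(std::string_view s) {
        if (s.empty()) return {};
        char* dst = static_cast<char*>(arena_.allocate(s.size(), alignof(char)));
        std::memcpy(dst, s.data(), s.size());
        return {dst, s.size()};
    }

public:
    explicit SketchBuilder(SketchArena& arena) : arena_(arena) {}

    // Each setter returns *this so calls can be chained.
    SketchBuilder& request_id(std::string_view s) { req_.request_id = store(s); return *this; }
    SketchBuilder& leader_id(std::string_view s) { req_.leader_id = store(s); return *this; }
    SketchBuilder& read_version(std::uint64_t v) { req_.read_version = v; return *this; }

    // Hands the finished request to the caller; the builder should not be reused afterwards.
    SketchRequest build() { return std::move(req_); }
};
```

The arena must outlive the built request, since the request only holds views into it.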