Organize design.md
481
design.md
@@ -1,150 +1,160 @@
|
|||||||
# WeaselDB Development Guide
|
# WeaselDB Development Guide
|
||||||
|
|
||||||
## Project Summary
|
## Table of Contents
|
||||||
|
|
||||||
|
1. [Project Overview](#project-overview)
|
||||||
|
2. [Quick Start](#quick-start)
|
||||||
|
3. [Architecture](#architecture)
|
||||||
|
4. [Development Guidelines](#development-guidelines)
|
||||||
|
5. [Common Patterns](#common-patterns)
|
||||||
|
6. [Reference](#reference)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Project Overview
|
||||||
|
|
||||||
WeaselDB is a high-performance write-side database component designed for systems where reading and writing are decoupled. The system focuses exclusively on handling transactional commits with optimistic concurrency control, while readers are expected to maintain their own queryable representations by subscribing to change streams.
|
WeaselDB is a high-performance write-side database component designed for systems where reading and writing are decoupled. The system focuses exclusively on handling transactional commits with optimistic concurrency control, while readers are expected to maintain their own queryable representations by subscribing to change streams.
|
||||||
|
|
||||||
|
### Key Features
|
||||||
|
|
||||||
|
- **Ultra-fast arena allocation** (~1ns vs ~20-270ns for malloc)
|
||||||
|
- **High-performance JSON parsing** with streaming support and SIMD optimization
|
||||||
|
- **Multi-threaded networking** using epoll with thread pools
|
||||||
|
- **Zero-copy design** throughout the pipeline
|
||||||
|
- **Factory pattern safety** ensuring correct object lifecycle management
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## Quick Start
|
## Quick Start
|
||||||
|
|
||||||
### Build System
|
### Build System
|
||||||
- Use CMake with C++20 standard
|
|
||||||
- **Always use ninja to build** (preferred over make)
|
|
||||||
- Primary build commands:
|
|
||||||
- `mkdir -p build && cd build`
|
|
||||||
- `cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release`
|
|
||||||
- `ninja`
|
|
||||||
|
|
||||||
### Testing and Development Workflow
|
Use CMake with C++20 and **always use ninja** (preferred over make):
|
||||||
- **Run all tests**: `ninja test` or `ctest`
|
|
||||||
- **Individual test targets**:
|
|
||||||
- `./test_arena_allocator` - Arena allocator unit tests
|
|
||||||
- `./test_commit_request` - JSON parsing and validation tests
|
|
||||||
- **Benchmarking**:
|
|
||||||
- `./bench_arena_allocator` - Memory allocation performance
|
|
||||||
- `./bench_commit_request` - JSON parsing performance
|
|
||||||
- `./bench_parser_comparison` - Compare against nlohmann::json and RapidJSON
|
|
||||||
- **Debug tools**: `./debug_arena` - Analyze arena allocator behavior
|
|
||||||
|
|
||||||
### Code Style and Conventions
|
```bash
|
||||||
- **C++ Style**: Modern C++20 with RAII and move semantics
|
mkdir -p build && cd build
|
||||||
- **Memory Management**: Prefer arena allocation over standard allocators
|
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
|
||||||
- **String Handling**: Use `std::string_view` for zero-copy operations
|
ninja
|
||||||
- **Error Handling**: Return error codes or use exceptions appropriately
|
```
|
||||||
- **Naming**: snake_case for variables/functions, PascalCase for classes
|
|
||||||
- **Performance**: Always consider allocation patterns and cache locality
|
|
||||||
|
|
||||||
### Dependencies and External Libraries
|
### Testing & Development
|
||||||
- **weaseljson**: Must be installed system-wide (high-performance JSON parser)
|
|
||||||
- **simdutf**: Fetched automatically (SIMD base64 encoding/decoding)
|
|
||||||
- **toml11**: Fetched automatically (TOML configuration parsing)
|
|
||||||
- **doctest**: Fetched automatically (testing framework)
|
|
||||||
- **nanobench**: Fetched automatically (benchmarking library)
|
|
||||||
- **gperf**: System requirement for perfect hash generation
|
|
||||||
|
|
||||||
## Architecture Overview
|
**Run all tests:**
|
||||||
|
```bash
|
||||||
|
ninja test # or ctest
|
||||||
|
```
|
||||||
|
|
||||||
|
**Individual targets:**
|
||||||
|
- `./test_arena_allocator` - Arena allocator unit tests
|
||||||
|
- `./test_commit_request` - JSON parsing and validation tests
|
||||||
|
|
||||||
|
**Benchmarking:**
|
||||||
|
- `./bench_arena_allocator` - Memory allocation performance
|
||||||
|
- `./bench_commit_request` - JSON parsing performance
|
||||||
|
- `./bench_parser_comparison` - Compare vs nlohmann::json and RapidJSON
|
||||||
|
|
||||||
|
**Debug tools:**
|
||||||
|
- `./debug_arena` - Analyze arena allocator behavior
|
||||||
|
|
||||||
|
### Dependencies
|
||||||
|
|
||||||
|
**System requirements:**
|
||||||
|
- **weaseljson** - Must be installed system-wide (high-performance JSON parser)
|
||||||
|
- **gperf** - System requirement for perfect hash generation
|
||||||
|
|
||||||
|
**Auto-fetched:**
|
||||||
|
- **simdutf** - SIMD base64 encoding/decoding
|
||||||
|
- **toml11** - TOML configuration parsing
|
||||||
|
- **doctest** - Testing framework
|
||||||
|
- **nanobench** - Benchmarking library
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
### Core Components
|
### Core Components
|
||||||
|
|
||||||
#### 1. **Arena Allocator** (`src/arena_allocator.hpp`)
|
#### **Arena Allocator** (`src/arena_allocator.hpp`)
|
||||||
- **Ultra-fast memory allocator** (~1ns per allocation vs ~20-270ns for malloc)
|
|
||||||
|
Ultra-fast memory allocator optimized for request/response patterns:
|
||||||
|
|
||||||
|
- **~1ns allocation time** vs ~20-270ns for malloc
|
||||||
- **Lazy initialization** with geometric block growth (doubling strategy)
|
- **Lazy initialization** with geometric block growth (doubling strategy)
|
||||||
- **Intrusive linked list design** for minimal memory overhead
|
- **Intrusive linked list design** for minimal memory overhead
|
||||||
- **Memory-efficient reset** that keeps the first block and frees others
|
- **Memory-efficient reset** that keeps the first block and frees others
|
||||||
- **STL-compatible interface** via `ArenaStlAllocator`
|
- **STL-compatible interface** via `ArenaStlAllocator`
|
||||||
|
- **O(1) amortized allocation** with proper alignment handling
|
||||||
|
- **Move semantics** for efficient transfers
|
||||||
|
- **Thread-safe per-connection** usage via exclusive ownership model
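
The properties listed above (lazy initialization, geometric block growth, an intrusive block list, and a reset that keeps the first block) can be sketched roughly as follows. This is a minimal illustration, not the code in `src/arena_allocator.hpp`: the class name `SketchArena`, its members, and the 1024-byte default are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical stand-in; NOT the real class from src/arena_allocator.hpp.
class SketchArena {
    // Intrusive list node: the header sits at the front of each malloc'd block.
    struct Block {
        Block* prev;           // previously allocated (older) block
        std::size_t capacity;  // usable bytes following the header
        std::size_t used;      // bytes handed out so far
    };

    Block* head_ = nullptr;    // newest block; nullptr until first allocation
    std::size_t initial_;      // size of the first block (assumed default below)

    // Try to carve `size` bytes out of the newest block at the given alignment.
    void* bump(std::size_t size, std::size_t align) {
        const auto base = reinterpret_cast<std::uintptr_t>(head_ + 1);
        const std::uintptr_t cur = base + head_->used;
        const std::uintptr_t aligned =
            (cur + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t end = static_cast<std::size_t>(aligned - base) + size;
        if (end > head_->capacity) return nullptr;
        head_->used = end;
        return reinterpret_cast<void*>(aligned);
    }

    // Geometric growth: each new block doubles the current block's capacity.
    void grow(std::size_t min_bytes) {
        std::size_t capacity = head_ ? head_->capacity * 2 : initial_;
        if (capacity < min_bytes) capacity = min_bytes;
        void* raw = std::malloc(sizeof(Block) + capacity);
        if (raw == nullptr) throw std::bad_alloc{};
        head_ = new (raw) Block{head_, capacity, 0};
    }

public:
    explicit SketchArena(std::size_t initial_bytes = 1024) : initial_(initial_bytes) {}
    SketchArena(const SketchArena&) = delete;
    SketchArena& operator=(const SketchArena&) = delete;
    SketchArena(SketchArena&& other) noexcept
        : head_(std::exchange(other.head_, nullptr)), initial_(other.initial_) {}

    ~SketchArena() {
        while (head_ != nullptr) {
            Block* prev = head_->prev;
            std::free(head_);
            head_ = prev;
        }
    }

    // Lazy: no memory is requested from malloc until the first call.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0 && "alignment must be a power of two");
        if (head_ != nullptr) {
            if (void* p = bump(size, align)) return p;
        }
        grow(size + align);  // size + align always leaves room for alignment padding
        return bump(size, align);
    }

    // Frees every block except the oldest one, then rewinds that block.
    void reset() {
        if (head_ == nullptr) return;
        while (head_->prev != nullptr) {
            Block* prev = head_->prev;
            std::free(head_);
            head_ = prev;
        }
        head_->used = 0;
    }
};
```

No destructors are run for objects placed in the arena, which is why this style of allocator is normally restricted to trivially destructible types. The sketch also omits overflow checks on `size + align`.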
|
||||||
|
|
||||||
Key features:
|
#### **Networking Layer**
|
||||||
- O(1) amortized allocation
|
|
||||||
- Proper alignment handling for all types
|
|
||||||
- Move semantics for efficient transfers
|
|
||||||
- Requires trivially destructible types only
|
|
||||||
- Thread-safe per-connection usage via exclusive ownership model
|
|
||||||
|
|
||||||
#### 2. **Commit Request Data Model** (`src/commit_request.hpp`)
|
**Server** (`src/server.{hpp,cpp}`):
|
||||||
|
- **High-performance multi-threaded networking** using epoll with thread pools
|
||||||
|
- **Factory pattern construction** via `Server::create()` ensures proper shared_ptr semantics
|
||||||
|
- **Safe shutdown mechanism** with async-signal-safe shutdown() method
|
||||||
|
- **Connection ownership management** with automatic cleanup on server destruction
|
||||||
|
- **Pluggable protocol handlers** via ConnectionHandler interface
|
||||||
|
- **Multi-threaded architecture:** separate accept and network thread pools
|
||||||
|
- **EPOLLEXCLUSIVE** load balancing across accept threads
|
||||||
|
|
||||||
|
**Connection** (`src/connection.{hpp,cpp}`):
|
||||||
|
- **Efficient per-connection state management** with arena-based memory allocation
|
||||||
|
- **Safe ownership transfer** between server threads and protocol handlers
|
||||||
|
- **Automatic cleanup** on connection closure or server shutdown
|
||||||
|
- **Handler interface isolation** - only exposes necessary methods to protocol handlers
|
||||||
|
- **Protocol-specific data:** `user_data` `void*` for custom handler data
|
||||||
|
|
||||||
|
**ConnectionHandler Interface** (`src/connection_handler.hpp`):
|
||||||
|
- **Abstract protocol interface** decoupling networking from application logic
|
||||||
|
- **Ownership transfer support** allowing handlers to take connections for async processing
|
||||||
|
- **Streaming data processing** with partial message handling
|
||||||
|
- **Connection lifecycle hooks** for initialization and cleanup
|
||||||
|
|
||||||
|
#### **Parsing Layer**
|
||||||
|
|
||||||
|
**JSON Commit Request Parser** (`src/json_commit_request_parser.{hpp,cpp}`):
|
||||||
|
- **High-performance JSON parser** using `weaseljson` library
|
||||||
|
- **Streaming parser support** for incremental parsing of network data
|
||||||
|
- **gperf-optimized token recognition** for fast JSON key parsing
|
||||||
|
- **Base64 decoding** using SIMD-accelerated simdutf
|
||||||
|
- **Comprehensive validation** of transaction structure
|
||||||
|
- **Perfect hash table lookup** for JSON keys using gperf
|
||||||
|
- **Zero hash collisions** for known JSON tokens eliminate collision-handling branches
|
||||||
|
|
||||||
|
**Parser Interface** (`src/commit_request_parser.hpp`):
|
||||||
|
- **Abstract base class** for commit request parsers
|
||||||
|
- **Format-agnostic parsing interface** supporting multiple serialization formats
|
||||||
|
- **Streaming and one-shot parsing modes**
|
||||||
|
- **Standardized error handling** across parser implementations
|
||||||
|
|
||||||
|
#### **Data Model**
|
||||||
|
|
||||||
|
**Commit Request Data Model** (`src/commit_request.hpp`):
|
||||||
- **Format-agnostic data structure** for representing transactional commits
|
- **Format-agnostic data structure** for representing transactional commits
|
||||||
- **Arena-backed string storage** with efficient memory management
|
- **Arena-backed string storage** with efficient memory management
|
||||||
- **Move-only semantics** for optimal performance
|
- **Move-only semantics** for optimal performance
|
||||||
- **Builder pattern** for constructing commit requests
|
- **Builder pattern** for constructing commit requests
|
||||||
- **Zero-copy string views** pointing to arena-allocated memory
|
- **Zero-copy string views** pointing to arena-allocated memory
|
||||||
|
|
||||||
#### 3. **JSON Commit Request Parser** (`src/json_commit_request_parser.{hpp,cpp}`)
|
#### **Configuration & Optimization**
|
||||||
- **High-performance JSON parser** using `weaseljson` library
|
|
||||||
- **Streaming parser support** for incremental parsing of network data
|
|
||||||
- **gperf-optimized token recognition** for fast JSON key parsing
|
|
||||||
- **Base64 decoding** using SIMD-accelerated simdutf
|
|
||||||
- **Comprehensive validation** of transaction structure
|
|
||||||
|
|
||||||
Parser capabilities:
|
**Configuration System** (`src/config.{hpp,cpp}`):
|
||||||
- One-shot parsing for complete JSON
|
|
||||||
- Streaming parsing for network protocols
|
|
||||||
- Parse state management with error recovery
|
|
||||||
- Memory-efficient string views backed by arena storage
|
|
||||||
- Perfect hash table lookup for JSON keys using gperf
|
|
||||||
|
|
||||||
#### 4. **Parser Interface** (`src/commit_request_parser.hpp`)
|
|
||||||
- **Abstract base class** for commit request parsers
|
|
||||||
- **Format-agnostic parsing interface** supporting multiple serialization formats
|
|
||||||
- **Streaming and one-shot parsing modes**
|
|
||||||
- **Standardized error handling** across parser implementations
|
|
||||||
|
|
||||||
#### 5. **Configuration System** (`src/config.{hpp,cpp}`)
|
|
||||||
- **TOML-based configuration** using `toml11` library
|
- **TOML-based configuration** using `toml11` library
|
||||||
- **Structured configuration** with server, commit, and subscription sections
|
- **Structured configuration** with server, commit, and subscription sections
|
||||||
- **Default fallback values** for all configuration options
|
- **Default fallback values** for all configuration options
|
||||||
- **Type-safe parsing** with validation and bounds checking
|
- **Type-safe parsing** with validation and bounds checking
|
||||||
- **Comprehensive validation** with meaningful error messages
|
- See `config.md` for complete configuration documentation
|
||||||
|
|
||||||
See `config.md` for complete configuration documentation.
|
**JSON Token Optimization** (`src/json_tokens.gperf`, `src/json_token_enum.hpp`):
|
||||||
|
|
||||||
#### 6. **JSON Token Optimization** (`src/json_tokens.gperf`, `src/json_token_enum.hpp`)
|
|
||||||
- **Perfect hash table** generated by gperf for O(1) JSON key lookup
|
- **Perfect hash table** generated by gperf for O(1) JSON key lookup
|
||||||
- **Compile-time token enumeration** for type-safe key identification
|
- **Compile-time token enumeration** for type-safe key identification
|
||||||
- **Minimal perfect hash** reduces memory overhead and improves cache locality
|
- **Minimal perfect hash** reduces memory overhead and improves cache locality
|
||||||
- **Build-time code generation** ensures optimal performance
|
- **Build-time code generation** ensures optimal performance
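
To illustrate what a perfect hash lookup over a fixed key set looks like, here is a small hand-written sketch. It is not gperf output and not the contents of `src/json_tokens.gperf`: the names `SketchToken`, `SketchEntry`, and `lookup_token`, and the choice of key length as the hash function, are assumptions for the example. The three keys are taken from field names that appear elsewhere in this document.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical token enum; the real one lives in src/json_token_enum.hpp.
enum class SketchToken { Unknown, RequestId, LeaderId, ReadVersion };

struct SketchEntry {
    std::string_view key;
    SketchToken token;
};

// For this particular key set the key length alone is collision-free,
// so the slot index is simply key.size() - 9.
inline constexpr std::array<SketchEntry, 4> kSketchTable{{
    {"leader_id", SketchToken::LeaderId},        // length 9
    {"request_id", SketchToken::RequestId},      // length 10
    {"", SketchToken::Unknown},                  // length 11: unused slot
    {"read_version", SketchToken::ReadVersion},  // length 12
}};

// One bounds check, one table load, one string compare.
constexpr SketchToken lookup_token(std::string_view key) {
    if (key.size() < 9 || key.size() > 12) return SketchToken::Unknown;
    const SketchEntry& entry = kSketchTable[key.size() - 9];
    return entry.key == key ? entry.token : SketchToken::Unknown;
}

// The lookup is usable at compile time, which also validates the table.
static_assert(lookup_token("leader_id") == SketchToken::LeaderId);
static_assert(lookup_token("request_id") == SketchToken::RequestId);
static_assert(lookup_token("read_version") == SketchToken::ReadVersion);
```

A generator such as gperf picks the hash function automatically (typically key length combined with selected character positions), so nobody has to maintain a table like this by hand as the key set grows.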
|
||||||
|
|
||||||
#### 7. **Server** (`src/server.{hpp,cpp}`)
|
### Transaction Data Model
|
||||||
- **High-performance multi-threaded networking** using epoll with thread pools
|
|
||||||
- **Factory pattern construction** via `Server::create()` ensures proper shared_ptr semantics
|
|
||||||
- **Safe shutdown mechanism** with async-signal-safe shutdown() method
|
|
||||||
- **Connection ownership management** with automatic cleanup on server destruction
|
|
||||||
- **Pluggable protocol handlers** via ConnectionHandler interface
|
|
||||||
|
|
||||||
Key features:
|
#### CommitRequest Structure
|
||||||
- Multi-threaded architecture: separate accept and network thread pools
|
|
||||||
- EPOLLEXCLUSIVE load balancing across accept threads
|
|
||||||
- Connection lifecycle safety with weak_ptr references
|
|
||||||
- Graceful shutdown with proper resource cleanup
|
|
||||||
- RAII-based connection management with unique_ptr ownership
|
|
||||||
|
|
||||||
#### 8. **Connection** (`src/connection.{hpp,cpp}`)
|
|
||||||
- **Efficient per-connection state management** with arena-based memory allocation
|
|
||||||
- **Safe ownership transfer** between server threads and protocol handlers
|
|
||||||
- **Automatic cleanup** on connection closure or server shutdown
|
|
||||||
- **Handler interface isolation** - only exposes necessary methods to protocol handlers
|
|
||||||
|
|
||||||
Key features:
|
|
||||||
- Arena allocator per connection for efficient memory management
|
|
||||||
- Weak reference to server for safe cleanup after server destruction
|
|
||||||
- Private networking details accessible only to Server via friend relationship
|
|
||||||
- Public handler interface: appendMessage(), closeAfterSend(), getArena(), getId()
|
|
||||||
- Thread-safe ownership transfer with Server::releaseBackToServer()
|
|
||||||
- **Protocol-specific data**: `user_data` `void*` for custom handler data.
|
|
||||||
|
|
||||||
#### 9. **ConnectionHandler Interface** (`src/connection_handler.hpp`)
|
|
||||||
- **Abstract protocol interface** decoupling networking from application logic
|
|
||||||
- **Ownership transfer support** allowing handlers to take connections for async processing
|
|
||||||
- **Streaming data processing** with partial message handling
|
|
||||||
- **Connection lifecycle hooks** for initialization and cleanup
|
|
||||||
|
|
||||||
Key features:
|
|
||||||
- on_data_arrived()/on_write_progress() with unique_ptr<Connection>& for ownership transfer
|
|
||||||
- on_connection_established/closed() hooks for protocol state management
|
|
||||||
- on_post_batch(): Hook for batch processing of connections after I/O
|
|
||||||
- Zero-copy data processing with arena allocator integration
|
|
||||||
- Thread-safe ownership transfer via Server::releaseBackToServer()
|
|
||||||
|
|
||||||
### Data Model
|
|
||||||
|
|
||||||
#### Transaction Structure
|
|
||||||
```
|
```
|
||||||
CommitRequest {
|
CommitRequest {
|
||||||
- request_id: Optional unique identifier
|
- request_id: Optional unique identifier
|
||||||
@@ -160,15 +170,9 @@ CommitRequest {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Memory Management
|
### Memory Management Model
|
||||||
- **Arena-based allocation** ensures efficient bulk memory management per connection
|
|
||||||
- **String views** eliminate unnecessary copying of JSON data
|
|
||||||
- **Zero-copy design** for binary data handling
|
|
||||||
- **RAII-based connection lifecycle** with automatic cleanup on destruction
|
|
||||||
- **Safe ownership transfer** between server threads and protocol handlers
|
|
||||||
- **Weak reference safety** prevents crashes when connections outlive server
|
|
||||||
|
|
||||||
Connection Ownership Model:
|
#### Connection Ownership Lifecycle
|
||||||
1. **Creation**: Accept threads create connections, transfer to epoll as raw pointers
|
1. **Creation**: Accept threads create connections, transfer to epoll as raw pointers
|
||||||
2. **Processing**: Network threads claim ownership by wrapping in unique_ptr
|
2. **Processing**: Network threads claim ownership by wrapping in unique_ptr
|
||||||
3. **Handler Transfer**: Handlers can take ownership for async processing via unique_ptr.release()
|
3. **Handler Transfer**: Handlers can take ownership for async processing via unique_ptr.release()
|
||||||
@@ -176,84 +180,38 @@ Connection Ownership Model:
|
|||||||
5. **Safety**: All transfers use weak_ptr to server for safe cleanup
|
5. **Safety**: All transfers use weak_ptr to server for safe cleanup
|
||||||
6. **Cleanup**: RAII ensures proper resource cleanup in all scenarios
|
6. **Cleanup**: RAII ensures proper resource cleanup in all scenarios
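
Steps 1 and 2 of the lifecycle above (handing a raw pointer to epoll, then reclaiming it into a `unique_ptr`) might look roughly like this. It is a sketch under stated assumptions: `SketchConnection`, `park_in_epoll`, and `claim_from_epoll` are hypothetical names, and the `void*` stands in for the pointer an epoll event would carry.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real Connection class.
struct SketchConnection {
    static inline int live = 0;  // counts live objects, for demonstration only
    int fd;

    explicit SketchConnection(int fd_) : fd(fd_) { ++live; }
    ~SketchConnection() { --live; }
    SketchConnection(const SketchConnection&) = delete;
    SketchConnection& operator=(const SketchConnection&) = delete;
};

// Step 1 (accept thread): give up ownership and hand epoll a raw pointer.
// Until someone reclaims it, nothing owns the object.
inline void* park_in_epoll(std::unique_ptr<SketchConnection> conn) {
    assert(conn != nullptr);
    return conn.release();
}

// Step 2 (network thread): re-wrap the raw pointer so RAII applies again.
// Each parked pointer must be claimed exactly once.
inline std::unique_ptr<SketchConnection> claim_from_epoll(void* cookie) {
    return std::unique_ptr<SketchConnection>(static_cast<SketchConnection*>(cookie));
}
```

The window between `release()` and the matching claim is the only point where the object is unowned, so a design like this needs every parked pointer to be reclaimed on all paths, including shutdown.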
|
||||||
|
|
||||||
Arena Memory Lifecycle:
|
#### Arena Memory Lifecycle
|
||||||
1. **Request Processing**: Handler uses `conn->getArena()` to allocate memory for parsing request data
|
1. **Request Processing**: Handler uses `conn->getArena()` to allocate memory for parsing request data
|
||||||
2. **Response Generation**: Handler uses arena for temporary response construction (headers, JSON, etc.)
|
2. **Response Generation**: Handler uses arena for temporary response construction (headers, JSON, etc.)
|
||||||
3. **Response Queuing**: Handler calls `conn->appendMessage()` which copies data to arena-backed message queue
|
3. **Response Queuing**: Handler calls `conn->appendMessage()` which copies data to arena-backed message queue
|
||||||
4. **Response Writing**: Server writes all queued messages to socket via `writeBytes()`
|
4. **Response Writing**: Server writes all queued messages to socket via `writeBytes()`
|
||||||
|
|
||||||
The user is responsible for calling `conn->reset()` periodically to reclaim the
|
> **Note**: Call `conn->reset()` periodically to reclaim arena memory. Best practice is after all outgoing bytes have been written.
|
||||||
arena memory. A good time to do this for a request/response protocol would be
|
|
||||||
after all outgoing bytes have been written.
|
|
||||||
|
|
||||||
### API Design
|
### API Endpoints
|
||||||
|
|
||||||
The system implements a RESTful API with the following core endpoints:
|
The system implements a RESTful API:
|
||||||
|
|
||||||
1. **GET /v1/version**: Retrieve current committed version and leader
|
- **GET /v1/version** - Retrieve current committed version and leader
|
||||||
2. **POST /v1/commit**: Submit transactional operations
|
- **POST /v1/commit** - Submit transactional operations
|
||||||
3. **GET /v1/subscribe**: Stream committed transactions
|
- **GET /v1/subscribe** - Stream committed transactions
|
||||||
4. **GET /v1/status**: Check commit status by request_id
|
- **GET /v1/status** - Check commit status by request_id
|
||||||
5. **PUT /v1/retention/<policy_id>**: Creates or updates a retention policy
|
- **PUT /v1/retention/<policy_id>** - Creates or updates a retention policy
|
||||||
6. **GET /v1/retention/<policy_id>**: Retrieves a retention policy by ID
|
- **GET /v1/retention/<policy_id>** - Retrieves a retention policy by ID
|
||||||
7. **GET /v1/retention/**: Retrieves all retention policies
|
- **GET /v1/retention/** - Retrieves all retention policies
|
||||||
8. **DELETE /v1/retention/<policy_id>**: Removes a retention policy
|
- **DELETE /v1/retention/<policy_id>** - Removes a retention policy
|
||||||
9. **GET /metrics**: Retrieves server metrics for monitoring
|
- **GET /metrics** - Retrieves server metrics for monitoring
|
||||||
|
|
||||||
### Performance Characteristics
|
|
||||||
|
|
||||||
#### Memory Allocation
|
|
||||||
- **~1ns allocation time** vs standard allocators
|
|
||||||
- **Bulk deallocation** eliminates individual free() calls
|
|
||||||
- **Optimized geometric growth** uses current block size for doubling strategy
|
|
||||||
- **Alignment-aware** allocation prevents performance penalties
|
|
||||||
|
|
||||||
#### JSON Parsing
|
|
||||||
- **Streaming parser** handles large payloads efficiently
|
|
||||||
- **Incremental processing** suitable for network protocols
|
|
||||||
- **Arena storage** eliminates string allocation overhead
|
|
||||||
- **SIMD-accelerated base64 decoding** using simdutf for maximum performance
|
|
||||||
- **Perfect hash table** provides O(1) JSON key lookup via gperf
|
|
||||||
- **Zero hash collisions** for known JSON tokens eliminate collision-handling branches
|
|
||||||
|
|
||||||
### Design Principles
|
### Design Principles
|
||||||
|
|
||||||
1. **Performance-first**: Every component optimized for high throughput
|
1. **Performance-first** - Every component optimized for high throughput
|
||||||
2. **Memory efficiency**: Arena allocation eliminates fragmentation
|
2. **Memory efficiency** - Arena allocation eliminates fragmentation
|
||||||
3. **Zero-copy**: Minimize data copying throughout pipeline
|
3. **Zero-copy** - Minimize data copying throughout pipeline
|
||||||
4. **Streaming-ready**: Support incremental processing
|
4. **Streaming-ready** - Support incremental processing
|
||||||
5. **Type safety**: Compile-time validation where possible
|
5. **Type safety** - Compile-time validation where possible
|
||||||
6. **Resource management**: RAII and move semantics throughout
|
6. **Resource management** - RAII and move semantics throughout
|
||||||
|
|
||||||
### Testing & Benchmarking
|
### Future Integration Points
|
||||||
|
|
||||||
The project includes comprehensive testing infrastructure:
|
|
||||||
- **Unit tests** using doctest framework
|
|
||||||
- **Performance benchmarks** using nanobench
|
|
||||||
- **Memory allocation benchmarks** for arena performance
|
|
||||||
- **JSON parsing validation** for correctness
|
|
||||||
|
|
||||||
Build targets:
|
|
||||||
- `test_arena_allocator`: Arena allocator functionality tests
|
|
||||||
- `test_commit_request`: JSON parsing and validation tests
|
|
||||||
- Main server executable (compiled from `src/main.cpp`)
|
|
||||||
- `bench_arena_allocator`: Arena allocator performance benchmarks
|
|
||||||
- `bench_commit_request`: JSON parsing performance benchmarks
|
|
||||||
- `bench_parser_comparison`: Comparison benchmarks vs nlohmann::json and RapidJSON
|
|
||||||
- `debug_arena`: Debug tool for arena allocator analysis
|
|
||||||
|
|
||||||
### Dependencies
|
|
||||||
|
|
||||||
- **weaseljson**: High-performance streaming JSON parser
|
|
||||||
- **simdutf**: SIMD-accelerated UTF-8 validation and base64 encoding/decoding
|
|
||||||
- **toml11**: TOML configuration file parsing
|
|
||||||
- **doctest**: Lightweight testing framework
|
|
||||||
- **nanobench**: Micro-benchmarking library
|
|
||||||
- **gperf**: Perfect hash function generator for JSON token optimization
|
|
||||||
- **nlohmann::json**: Reference JSON parser for benchmarking comparisons
|
|
||||||
- **RapidJSON**: High-performance JSON parser for benchmarking comparisons
|
|
||||||
|
|
||||||
### Future Considerations
|
|
||||||
|
|
||||||
This write-side component is designed to integrate with:
|
This write-side component is designed to integrate with:
|
||||||
- **Leader election** systems for distributed consensus
|
- **Leader election** systems for distributed consensus
|
||||||
@@ -261,11 +219,21 @@ This write-side component is designed to integrate with:
|
|||||||
- **Read-side systems** that consume the transaction stream
|
- **Read-side systems** that consume the transaction stream
|
||||||
- **Monitoring** systems for operational visibility
|
- **Monitoring** systems for operational visibility
|
||||||
|
|
||||||
The modular design allows each component to be optimized independently while maintaining clear interfaces for system integration.
|
---
|
||||||
|
|
||||||
## Development Guidelines
|
## Development Guidelines
|
||||||
|
|
||||||
### Important Implementation Details
|
### Code Style & Conventions
|
||||||
|
|
||||||
|
- **C++ Style**: Modern C++20 with RAII and move semantics
|
||||||
|
- **Memory Management**: Prefer arena allocation over standard allocators
|
||||||
|
- **String Handling**: Use `std::string_view` for zero-copy operations
|
||||||
|
- **Error Handling**: Return error codes or use exceptions appropriately
|
||||||
|
- **Naming**: snake_case for variables/functions, PascalCase for classes
|
||||||
|
- **Performance**: Always consider allocation patterns and cache locality
|
||||||
|
|
||||||
|
### Critical Implementation Rules
|
||||||
|
|
||||||
- **Server Creation**: Always use `Server::create()` factory method - direct construction is impossible
|
- **Server Creation**: Always use `Server::create()` factory method - direct construction is impossible
|
||||||
- **Connection Creation**: Only the Server can create connections - no public constructor or factory method
|
- **Connection Creation**: Only the Server can create connections - no public constructor or factory method
|
||||||
- **Connection Ownership**: Use unique_ptr semantics for safe ownership transfer between components
|
- **Connection Ownership**: Use unique_ptr semantics for safe ownership transfer between components
|
||||||
@@ -274,55 +242,48 @@ The modular design allows each component to be optimized independently while mai
|
|||||||
- **Ownership Transfer**: Use `Server::releaseBackToServer()` for returning connections to server from handlers
|
- **Ownership Transfer**: Use `Server::releaseBackToServer()` for returning connections to server from handlers
|
||||||
- **JSON Token Lookup**: Use the gperf-generated perfect hash table in `json_tokens.hpp` for O(1) key recognition
|
- **JSON Token Lookup**: Use the gperf-generated perfect hash table in `json_tokens.hpp` for O(1) key recognition
|
||||||
- **Base64 Handling**: Always use simdutf for base64 encoding/decoding for performance
|
- **Base64 Handling**: Always use simdutf for base64 encoding/decoding for performance
|
||||||
- **Error Propagation**: Use structured error types that can be efficiently returned up the call stack
|
|
||||||
- **Thread Safety**: Connection ownership transfers are designed to be thread-safe with proper RAII cleanup
|
- **Thread Safety**: Connection ownership transfers are designed to be thread-safe with proper RAII cleanup
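
The "factory method only" rule for server creation can be sketched as below. This is an illustration of the general pattern, not the actual `Server` class: `SketchServer`, its `port` parameter, and `weak_ref()` are assumptions introduced for the example.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stand-in for the real Server in src/server.hpp.
class SketchServer : public std::enable_shared_from_this<SketchServer> {
public:
    // The only way to obtain an instance, so every instance is shared_ptr-owned.
    static std::shared_ptr<SketchServer> create(int port) {
        // std::make_shared cannot call a private constructor, hence plain new.
        return std::shared_ptr<SketchServer>(new SketchServer(port));
    }

    SketchServer(const SketchServer&) = delete;
    SketchServer& operator=(const SketchServer&) = delete;

    // Safe only because create() guarantees shared_ptr ownership.
    std::weak_ptr<SketchServer> weak_ref() { return weak_from_this(); }

    int port() const { return port_; }

private:
    explicit SketchServer(int port) : port_(port) {}
    int port_;
};

// Direct construction from outside the class does not compile.
static_assert(!std::is_constructible_v<SketchServer, int>);
```

Because every instance is guaranteed to live inside a `shared_ptr`, other objects can hold a `weak_ptr` to the server and detect its destruction instead of dereferencing a dangling pointer.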
|
||||||
|
|
||||||
### File Organization
|
### Project Structure
|
||||||
- **Core Headers**: `src/` contains all primary implementation files
|
|
||||||
- **Tests**: `tests/` contains doctest-based unit tests
|
|
||||||
- **Benchmarks**: `benchmarks/` contains nanobench performance tests
|
|
||||||
- **Tools**: `tools/` contains debugging and analysis utilities
|
|
||||||
- **Build-Generated**: `build/` contains CMake-generated files including `json_tokens.cpp`
|
|
||||||
|
|
||||||
### Adding New Protocol Handlers
|
- **`src/`** - Core headers and implementation files
|
||||||
- Inherit from `ConnectionHandler` in `src/connection_handler.hpp`
|
- **`tests/`** - doctest-based unit tests
|
||||||
- Implement `on_data_arrived()` with proper ownership semantics
|
- **`benchmarks/`** - nanobench performance tests
|
||||||
- Use connection's arena allocator for temporary allocations: `conn->getArena()`
|
- **`tools/`** - Debugging and analysis utilities
|
||||||
- Handle partial messages and streaming protocols appropriately
|
- **`build/`** - CMake-generated files including `json_tokens.cpp`
|
||||||
- Use `Server::releaseBackToServer()` if taking ownership for async processing
|
|
||||||
- Add corresponding test cases and integration tests
|
|
||||||
- Consider performance implications of ownership transfers
|
|
||||||
|
|
||||||
### Adding New Parsers
|
### Extension Points
|
||||||
- Inherit from `CommitRequestParser` in `src/commit_request_parser.hpp`
|
|
||||||
- Implement both streaming and one-shot parsing modes
|
#### Adding New Protocol Handlers
|
||||||
- Use arena allocation for all temporary string storage
|
1. Inherit from `ConnectionHandler` in `src/connection_handler.hpp`
|
||||||
- Add corresponding test cases in `tests/`
|
2. Implement `on_data_arrived()` with proper ownership semantics
|
||||||
- Add benchmark comparisons in `benchmarks/`
|
3. Use connection's arena allocator for temporary allocations: `conn->getArena()`
|
||||||
|
4. Handle partial messages and streaming protocols appropriately
|
||||||
|
5. Use `Server::releaseBackToServer()` if taking ownership for async processing
|
||||||
|
6. Add corresponding test cases and integration tests
|
||||||
|
7. Consider performance implications of ownership transfers
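
As a rough illustration of steps 1, 2, and 4, the following sketch implements a newline-delimited echo handler that copes with partial messages. Everything here is a simplified stand-in: `SketchConnection`, `SketchHandler`, the `pending` buffer, and the exact `on_data_arrived()` signature are assumptions, and the real interface in `src/connection_handler.hpp` may differ. Arena allocation is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for Connection; only what the sketch needs.
struct SketchConnection {
    std::string pending;              // bytes of a not-yet-complete message
    std::vector<std::string> outbox;  // stand-in for the outgoing message queue

    // Copies the bytes, so the caller's buffer may be reused afterwards.
    void appendMessage(std::string_view msg) { outbox.emplace_back(msg); }
};

// Hypothetical stand-in for the ConnectionHandler interface.
struct SketchHandler {
    virtual ~SketchHandler() = default;
    // The reference to unique_ptr lets a handler take ownership if it needs to.
    virtual void on_data_arrived(std::string_view data,
                                 std::unique_ptr<SketchConnection>& conn) = 0;
};

// Echoes each complete line; keeps any trailing partial line for the next call.
struct LineEchoHandler final : SketchHandler {
    void on_data_arrived(std::string_view data,
                         std::unique_ptr<SketchConnection>& conn) override {
        assert(conn != nullptr && "handler received a null connection");
        conn->pending.append(data);

        std::size_t start = 0;
        for (;;) {
            const std::size_t newline = conn->pending.find('\n', start);
            if (newline == std::string::npos) break;
            const std::string_view line =
                std::string_view(conn->pending).substr(start, newline - start + 1);
            conn->appendMessage(line);
            start = newline + 1;
        }
        conn->pending.erase(0, start);  // drop the bytes that were consumed
    }
};
```

The handler never assumes that one call delivers one message: network reads can split or merge messages arbitrarily, so leftover bytes are carried over in per-connection state.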
|
||||||
|
|
||||||
|
#### Adding New Parsers
|
||||||
|
1. Inherit from `CommitRequestParser` in `src/commit_request_parser.hpp`
|
||||||
|
2. Implement both streaming and one-shot parsing modes
|
||||||
|
3. Use arena allocation for all temporary string storage
|
||||||
|
4. Add corresponding test cases in `tests/`
|
||||||
|
5. Add benchmark comparisons in `benchmarks/`
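
One way to satisfy step 2 is to implement the streaming mode once and build the one-shot mode on top of it. The sketch below shows that shape with a deliberately tiny format (a decimal number). The names `SketchParser`, `SketchStatus`, `begin`/`feed`/`finish`/`parse`, and `DecimalParser` are all hypothetical; the real `CommitRequestParser` interface is not shown in this document and may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

enum class SketchStatus { Complete, Incomplete, Error };

// Hypothetical stand-in for the abstract parser base class.
class SketchParser {
public:
    virtual ~SketchParser() = default;

    // Streaming mode: begin(), then feed() any number of chunks, then finish().
    virtual void begin() = 0;
    virtual SketchStatus feed(std::string_view chunk) = 0;
    virtual SketchStatus finish() = 0;

    // One-shot mode, expressed in terms of the streaming calls.
    SketchStatus parse(std::string_view whole) {
        begin();
        if (feed(whole) == SketchStatus::Error) return SketchStatus::Error;
        return finish();
    }
};

// Toy format: the whole input is one unsigned decimal number.
// Overflow is not checked; this is only meant to show the two modes.
class DecimalParser final : public SketchParser {
public:
    void begin() override {
        value_ = 0;
        digits_ = 0;
        failed_ = false;
    }

    SketchStatus feed(std::string_view chunk) override {
        if (failed_) return SketchStatus::Error;
        for (const char c : chunk) {
            if (c < '0' || c > '9') {
                failed_ = true;
                return SketchStatus::Error;
            }
            value_ = value_ * 10 + static_cast<std::uint64_t>(c - '0');
            ++digits_;
        }
        return SketchStatus::Incomplete;  // more digits may still arrive
    }

    SketchStatus finish() override {
        assert(digits_ >= 0);
        return (!failed_ && digits_ > 0) ? SketchStatus::Complete : SketchStatus::Error;
    }

    std::uint64_t value() const { return value_; }

private:
    std::uint64_t value_ = 0;
    int digits_ = 0;
    bool failed_ = false;
};
```

Sharing one state machine between both modes means tests written against the one-shot path also exercise the streaming code, and chunk boundaries cannot change the result.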
|
||||||
|
|
||||||
|
### Performance Guidelines
|
||||||
|
|
||||||
### Performance Considerations
|
|
||||||
- **Memory**: Arena allocation eliminates fragmentation - use it for all request-scoped data
|
- **Memory**: Arena allocation eliminates fragmentation - use it for all request-scoped data
|
||||||
- **CPU**: Perfect hashing and SIMD operations are critical paths - avoid alternatives
|
- **CPU**: Perfect hashing and SIMD operations are critical paths - avoid alternatives
|
||||||
- **I/O**: Streaming parser design supports incremental network data processing
|
- **I/O**: Streaming parser design supports incremental network data processing
|
||||||
- **Cache**: String views avoid copying, keeping data cache-friendly
|
- **Cache**: String views avoid copying, keeping data cache-friendly
|
||||||
|
|
||||||
### Configuration Management
|
### Configuration & Testing
|
||||||
- All configuration is TOML-based using `config.toml`
|
|
||||||
- Comprehensive documentation available in `config.md`
|
|
||||||
- Type-safe parsing with validation and bounds checking
|
|
||||||
- Always validate configuration values and provide meaningful errors
|
|
||||||
|
|
||||||
### Testing Strategy
|
- **Configuration**: All configuration is TOML-based using `config.toml` (see `config.md`)
|
||||||
- **Unit tests** validate individual component correctness
|
- **Testing Strategy**: Run unit tests, benchmarks, and debug tools before submitting changes
|
||||||
- **Benchmarks** ensure performance characteristics are maintained
|
- **Build System**: CMake generates gperf hash tables at build time; always use ninja
|
||||||
- **Debug tools** help analyze memory usage patterns
|
|
||||||
- Always run both tests and benchmarks before submitting changes
|
|
||||||
|
|
||||||
### Build System Details
|
---
|
||||||
- CMake generates gperf hash tables at build time
|
|
||||||
- **Always use ninja** - preferred over make for faster incremental builds
|
|
||||||
- Release builds include debug symbols for profiling
|
|
||||||
- All external dependencies except weaseljson are auto-fetched
|
|
||||||
|
|
||||||
## Common Patterns
|
## Common Patterns
|
||||||
|
|
||||||
@@ -448,7 +409,9 @@ public:
|
|||||||
};
|
};
|
||||||
```
|
```
|
||||||
|
|
||||||
### Arena-Based String Handling
|
### Memory Management Patterns
|
||||||
|
|
||||||
|
#### Arena-Based String Handling
|
||||||
```cpp
|
```cpp
|
||||||
// Preferred: Zero-copy string view with arena allocation
|
// Preferred: Zero-copy string view with arena allocation
|
||||||
std::string_view process_json_key(const char* data, ArenaAllocator& arena);
|
std::string_view process_json_key(const char* data, ArenaAllocator& arena);
|
||||||
@@ -457,22 +420,7 @@ std::string_view process_json_key(const char* data, ArenaAllocator& arena);
|
|||||||
std::string process_json_key(const char* data);
|
std::string process_json_key(const char* data);
|
||||||
```
|
```
|
||||||
|
|
||||||
### Error Handling Pattern
|
#### Safe Connection Ownership Transfer
|
||||||
```cpp
|
|
||||||
enum class ParseResult { Success, InvalidJson, MissingField };
|
|
||||||
ParseResult parse_commit_request(const char* json, CommitRequest& out);
|
|
||||||
```
|
|
||||||
|
|
||||||
### Builder Pattern Usage
|
|
||||||
```cpp
|
|
||||||
CommitRequest request = CommitRequestBuilder(arena)
|
|
||||||
.request_id("example-id")
|
|
||||||
.leader_id("leader-123")
|
|
||||||
.read_version(42)
|
|
||||||
.build();
|
|
||||||
```
|
|
||||||
|
|
||||||
### Safe Connection Ownership Transfer
|
|
||||||
```cpp
|
```cpp
|
||||||
// In handler - take ownership for background processing
|
// In handler - take ownership for background processing
|
||||||
Connection* raw_conn = conn_ptr.release();
|
Connection* raw_conn = conn_ptr.release();
|
||||||
@@ -487,8 +435,61 @@ background_processor.submit([raw_conn]() {
|
|||||||
});
|
});
|
||||||
```
|
```
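
The hunk above elides the body of the submitted lambda, so the following is only a minimal sketch of one way the hand-off could be completed, not the code from this commit. `Connection`, `TaskQueue`, and `hand_off` are hypothetical stand-ins. The sketch assumes the processor stores tasks as `std::function<void()>`, which requires copyable callables; that is one plausible reason to capture a raw pointer instead of moving the `std::unique_ptr` into the lambda.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real connection type.
struct Connection {
    std::string response;
};

// Hypothetical stand-in for background_processor: stores tasks and runs them later.
struct TaskQueue {
    std::vector<std::function<void()>> tasks;

    void submit(std::function<void()> task) { tasks.push_back(std::move(task)); }

    void run_all() {
        for (auto& task : tasks) task();
        tasks.clear();
    }
};

// Handler side: give up ownership, then let the task re-establish it.
void hand_off(std::unique_ptr<Connection> conn_ptr, TaskQueue& queue,
              std::unique_ptr<Connection>& returned) {
    Connection* raw_conn = conn_ptr.release();  // handler no longer owns the connection
    queue.submit([raw_conn, &returned]() {
        // Wrap the raw pointer first so every exit path below frees or forwards it.
        std::unique_ptr<Connection> owned(raw_conn);
        owned->response = "processed";
        returned = std::move(owned);  // hand ownership back to the caller's slot
    });
}
```

One caveat of this shape: if a submitted task is dropped without running, the released connection leaks, so the queue must guarantee that every task executes exactly once.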
|
||||||
|
|
||||||
## Development Notes
|
### Data Construction Patterns
|
||||||
|
|
||||||
- **Always use ninja to build** - faster and more reliable than make
|
#### Builder Pattern Usage
|
||||||
|
```cpp
|
||||||
|
CommitRequest request = CommitRequestBuilder(arena)
|
||||||
|
.request_id("example-id")
|
||||||
|
.leader_id("leader-123")
|
||||||
|
.read_version(42)
|
||||||
|
.build();
|
||||||
|
```
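
For readers without the source at hand, a builder of this shape could look roughly like the sketch below. All names prefixed with `Sketch` are hypothetical: the field set is inferred only from the three setters in the snippet above, and the arena is replaced by a simple owning container, so the real `CommitRequestBuilder` and `ArenaAllocator` will differ.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>

// Stand-in for the real arena (assumption): owns copies of strings so that the
// string_views stored in the request stay valid for the arena's lifetime.
struct SketchArena {
    std::deque<std::string> storage;  // deque never relocates existing elements

    std::string_view copy(std::string_view s) {
        storage.emplace_back(s);
        return storage.back();
    }
};

// Hypothetical request type holding non-owning views into the arena.
struct SketchCommitRequest {
    std::string_view request_id;
    std::string_view leader_id;
    std::uint64_t read_version = 0;
};

class SketchCommitRequestBuilder {
public:
    explicit SketchCommitRequestBuilder(SketchArena& arena) : arena_(arena) {}

    // Each setter returns *this so calls can be chained.
    SketchCommitRequestBuilder& request_id(std::string_view id) {
        req_.request_id = arena_.copy(id);
        return *this;
    }
    SketchCommitRequestBuilder& leader_id(std::string_view id) {
        req_.leader_id = arena_.copy(id);
        return *this;
    }
    SketchCommitRequestBuilder& read_version(std::uint64_t version) {
        req_.read_version = version;
        return *this;
    }

    SketchCommitRequest build() const { return req_; }

private:
    SketchArena& arena_;
    SketchCommitRequest req_;
};
```

The built request only borrows from the arena, so it must not outlive it.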
|
||||||
|
|
||||||
|
#### Error Handling Pattern
|
||||||
|
```cpp
|
||||||
|
enum class ParseResult { Success, InvalidJson, MissingField };
|
||||||
|
ParseResult parse_commit_request(const char* json, CommitRequest& out);
|
||||||
|
```
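
A caller of this pattern typically switches on the result code rather than catching exceptions. The sketch below illustrates that shape under stated assumptions: `describe`, `CommitRequestSketch`, and `parse_commit_request_stub` are hypothetical, the stub is a toy check and not a JSON parser, and treating `request_id` as a required field is an assumption made only for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

enum class ParseResult { Success, InvalidJson, MissingField };

// Hypothetical helper: turn a result code into a message for logs or responses.
constexpr std::string_view describe(ParseResult result) {
    switch (result) {
        case ParseResult::Success:      return "ok";
        case ParseResult::InvalidJson:  return "invalid JSON";
        case ParseResult::MissingField: return "missing required field";
    }
    return "unknown";
}

// Hypothetical output type; the real CommitRequest is not shown in this section.
struct CommitRequestSketch {
    bool populated = false;
};

// Toy stand-in, NOT a real parser: it only checks for surrounding braces and
// for the presence of a "request_id" key (assumed to be required).
ParseResult parse_commit_request_stub(const char* json, CommitRequestSketch& out) {
    if (json == nullptr) return ParseResult::InvalidJson;
    const std::size_t len = std::strlen(json);
    if (len < 2 || json[0] != '{' || json[len - 1] != '}') {
        return ParseResult::InvalidJson;
    }
    if (std::strstr(json, "\"request_id\"") == nullptr) {
        return ParseResult::MissingField;
    }
    out.populated = true;  // only touched on success
    return ParseResult::Success;
}
```

Returning an enum keeps the failure path allocation-free and forces the caller to decide what to do with each outcome.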
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Reference
|
||||||
|
|
||||||
|
### Build Targets
|
||||||
|
|
||||||
|
**Test Executables:**
|
||||||
|
- `test_arena_allocator` - Arena allocator functionality tests
|
||||||
|
- `test_commit_request` - JSON parsing and validation tests
|
||||||
|
- Main server executable (compiled from `src/main.cpp`)
|
||||||
|
|
||||||
|
**Benchmark Executables:**
|
||||||
|
- `bench_arena_allocator` - Arena allocator performance benchmarks
|
||||||
|
- `bench_commit_request` - JSON parsing performance benchmarks
|
||||||
|
- `bench_parser_comparison` - Comparison benchmarks vs nlohmann::json and RapidJSON
|
||||||
|
|
||||||
|
**Debug Tools:**
|
||||||
|
- `debug_arena` - Debug tool for arena allocator analysis
|
||||||
|
|
||||||
|
### Performance Characteristics
|
||||||
|
|
||||||
|
**Memory Allocation:**
|
||||||
|
- **~1ns allocation time** vs standard allocators
|
||||||
|
- **Bulk deallocation** eliminates individual free() calls
|
||||||
|
- **Optimized geometric growth** uses current block size for doubling strategy
|
||||||
|
- **Alignment-aware** allocation prevents performance penalties
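
The four properties above can be illustrated with a small bump allocator. This is a sketch under stated assumptions, not the project's `ArenaAllocator`: the class name `BumpArena`, its default block size, and the exact growth rule (each new block is double the previous one, or larger if the request needs it) are all illustrative choices.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

class BumpArena {
public:
    explicit BumpArena(std::size_t first_block = 1024)
        : next_block_size_(first_block != 0 ? first_block : 1024) {}
    BumpArena(const BumpArena&) = delete;
    BumpArena& operator=(const BumpArena&) = delete;

    // Bulk deallocation: one free() per block, none per allocation.
    ~BumpArena() {
        for (void* block : blocks_) std::free(block);
    }

    // `align` must be a power of two. Returns nullptr if malloc fails.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::uintptr_t p = align_up(cur_, align);
        if (cur_ == 0 || p + size > end_) {
            // Geometric growth: keep doubling until the request fits.
            std::size_t block = next_block_size_;
            while (block < size + align) block *= 2;
            void* mem = std::malloc(block);
            if (mem == nullptr) return nullptr;
            blocks_.push_back(mem);
            cur_ = reinterpret_cast<std::uintptr_t>(mem);
            end_ = cur_ + block;
            next_block_size_ = block * 2;
            p = align_up(cur_, align);
        }
        cur_ = p + size;  // the fast path is an add and a compare
        return reinterpret_cast<void*>(p);
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    static std::uintptr_t align_up(std::uintptr_t value, std::size_t align) {
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        return (value + mask) & ~mask;
    }

    std::vector<void*> blocks_;
    std::uintptr_t cur_ = 0;
    std::uintptr_t end_ = 0;
    std::size_t next_block_size_;
};
```

Because individual allocations are never freed, this design only suits data that shares one lifetime, such as everything belonging to a single request.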
|
||||||
|
|
||||||
|
**JSON Parsing:**
|
||||||
|
- **Streaming parser** handles large payloads efficiently
|
||||||
|
- **Incremental processing** suitable for network protocols
|
||||||
|
- **Arena storage** eliminates string allocation overhead
|
||||||
|
- **SIMD-accelerated base64 decoding** using simdutf for maximum performance
|
||||||
|
- **Perfect hash table** provides O(1) JSON key lookup via gperf
|
||||||
|
- **Zero hash collisions** for known JSON tokens eliminates branching
|
||||||
|
|
||||||
|
### Build Notes
|
||||||
|
|
||||||
|
- **Always use ninja** - faster and more reliable than make
|
||||||
- Build from project root: `mkdir -p build && cd build && cmake .. -G Ninja && ninja`
|
- Build from project root: `mkdir -p build && cd build && cmake .. -G Ninja && ninja`
|
||||||
- For specific targets: `ninja <target_name>` (e.g., `ninja load_tester`)
|
- For specific targets: `ninja <target_name>` (e.g., `ninja load_tester`)
|
||||||
|
- Always build with `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON`
|
||||||