diff --git a/design.md b/design.md
index 86f3dd6..30de3ba 100644
--- a/design.md
+++ b/design.md
@@ -21,11 +21,18 @@ Key features:
 - Move semantics for efficient transfers
 - Requires trivially destructible types only
 
-#### 2. **Commit Request Parser** (`src/commit_request.{hpp,cpp}`)
+#### 2. **Commit Request Data Model** (`src/commit_request.hpp`)
+- **Format-agnostic data structure** for representing transactional commits
+- **Arena-backed string storage** with efficient memory management
+- **Move-only semantics** for optimal performance
+- **Builder pattern** for constructing commit requests
+- **Zero-copy string views** pointing to arena-allocated memory
+
+#### 3. **JSON Commit Request Parser** (`src/json_commit_request_parser.{hpp,cpp}`)
 - **High-performance JSON parser** using `weaseljson` library
 - **Streaming parser support** for incremental parsing of network data
-- **Arena-based string storage** for zero-copy string handling
-- **Base64 decoding** for binary key/value data
+- **gperf-optimized token recognition** for fast JSON key parsing
+- **Base64 decoding** using SIMD-accelerated simdutf
 - **Comprehensive validation** of transaction structure
 
 Parser capabilities:
@@ -33,8 +40,15 @@ Parser capabilities:
 - Streaming parsing for network protocols
 - Parse state management with error recovery
 - Memory-efficient string views backed by arena storage
+- Perfect hash table lookup for JSON keys using gperf
 
-#### 3. **Configuration System** (`src/config.{hpp,cpp}`)
+#### 4. **Parser Interface** (`src/parser_interface.hpp`)
+- **Abstract base class** for commit request parsers
+- **Format-agnostic parsing interface** supporting multiple serialization formats
+- **Streaming and one-shot parsing modes**
+- **Standardized error handling** across parser implementations
+
+#### 5. **Configuration System** (`src/config.{hpp,cpp}`)
 - **TOML-based configuration** using `toml11` library
 - **Structured configuration** with server, commit, and subscription sections
 - **Default fallback values** for all configuration options
@@ -45,6 +59,12 @@ Configuration domains:
 - **Commit**: request ID validation, retention policies
 - **Subscription**: buffer management, keepalive intervals
 
+#### 6. **JSON Token Optimization** (`src/json_tokens.gperf`, `src/json_token_enum.hpp`)
+- **Perfect hash table** generated by gperf for O(1) JSON key lookup
+- **Compile-time token enumeration** for type-safe key identification
+- **Minimal perfect hash** reduces memory overhead and improves cache locality
+- **Build-time code generation** ensures optimal performance
+
 ### Data Model
 
 #### Transaction Structure
@@ -91,6 +111,8 @@ The system implements a RESTful API with three core endpoints:
 - **Incremental processing** suitable for network protocols
 - **Arena storage** eliminates string allocation overhead
 - **SIMD-accelerated base64 decoding** using simdutf for maximum performance
+- **Perfect hash table** provides O(1) JSON key lookup via gperf
+- **Zero hash collisions** for known JSON tokens eliminates branching
 
 ### Design Principles
 
@@ -113,7 +135,10 @@ Build targets:
 - `test_arena_allocator`: Arena allocator functionality tests
 - `test_commit_request`: JSON parsing and validation tests
 - `weaseldb`: Main application demonstrating configuration and parsing
-- Various benchmark executables for performance testing
+- `bench_arena_allocator`: Arena allocator performance benchmarks
+- `bench_commit_request`: JSON parsing performance benchmarks
+- `bench_parser_comparison`: Comparison benchmarks vs nlohmann::json and RapidJSON
+- `debug_arena`: Debug tool for arena allocator analysis
 
 ### Dependencies
 
@@ -122,6 +147,9 @@ Build targets:
 - **toml11**: TOML configuration file parsing
 - **doctest**: Lightweight testing framework
 - **nanobench**: Micro-benchmarking library
+- **gperf**: Perfect hash function generator for JSON token optimization
+- **nlohmann::json**: Reference JSON parser for benchmarking comparisons
+- **RapidJSON**: High-performance JSON parser for benchmarking comparisons
 
 ### Future Considerations
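
Note on the new "Parser Interface" section in this diff: it describes an abstract base class offering both streaming and one-shot parsing with standardized error handling, but the diff does not show the header itself. The following is only a minimal sketch of how such an interface could be shaped. Every name here (`ParseStatus`, `CommitRequestParserSketch`, `BraceBalanceParser`, `feed`, `finish`) is hypothetical and not taken from `src/parser_interface.hpp`, and the toy implementation merely checks brace balance as a stand-in for a real format parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical status codes; the real interface may report errors differently.
enum class ParseStatus { Complete, Incomplete, Error };

// Hypothetical abstract base class: one contract for every serialization format.
class CommitRequestParserSketch {
public:
  virtual ~CommitRequestParserSketch() = default;

  // Streaming mode: reset state, feed chunks as they arrive, then finish.
  virtual void begin() = 0;
  virtual ParseStatus feed(std::string_view chunk) = 0;
  virtual ParseStatus finish() = 0;

  // Shared error reporting across implementations.
  virtual std::string_view error_message() const = 0;

  // One-shot mode, expressed in terms of the streaming calls.
  ParseStatus parse(std::string_view data) {
    begin();
    if (feed(data) == ParseStatus::Error) return ParseStatus::Error;
    return finish();
  }
};

// Toy implementation: only tracks '{' / '}' nesting (it ignores string
// literals), so it is not a JSON parser. It exists to exercise the interface.
class BraceBalanceParser final : public CommitRequestParserSketch {
public:
  void begin() override {
    depth_ = 0;
    seen_any_ = false;
    error_.clear();
  }

  ParseStatus feed(std::string_view chunk) override {
    if (!error_.empty()) return ParseStatus::Error;
    for (char c : chunk) {
      if (c == '{') {
        ++depth_;
        seen_any_ = true;
      } else if (c == '}') {
        if (depth_ == 0) {
          error_ = "unmatched '}'";
          return ParseStatus::Error;
        }
        --depth_;
      }
    }
    return (seen_any_ && depth_ == 0) ? ParseStatus::Complete
                                      : ParseStatus::Incomplete;
  }

  ParseStatus finish() override {
    if (!error_.empty()) return ParseStatus::Error;
    if (!seen_any_ || depth_ != 0) {
      error_ = "unexpected end of input";
      return ParseStatus::Error;
    }
    return ParseStatus::Complete;
  }

  std::string_view error_message() const override { return error_; }

private:
  std::size_t depth_ = 0;   // current brace nesting level
  bool seen_any_ = false;   // whether any '{' has been seen since begin()
  std::string error_;       // empty when no error has occurred
};
```

Defining the one-shot `parse` on top of `begin`/`feed`/`finish` is one possible design choice: each format-specific parser then implements only the streaming path, and callers that already hold a full buffer get the same validation and error messages as callers reading from a socket.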