# WeaselDB Design Overview

## Project Summary

WeaselDB is a high-performance write-side database component designed for systems where reading and writing are decoupled. The system focuses exclusively on handling transactional commits with optimistic concurrency control, while readers are expected to maintain their own queryable representations by subscribing to change streams.

## Architecture Overview

### Core Components

#### 1. **Arena Allocator** (`src/arena_allocator.hpp`)
- **Ultra-fast memory allocator** (~1ns per allocation vs ~20-270ns for malloc)
- **Lazy initialization** with geometric block growth (doubling strategy)
- **Intrusive linked list design** for minimal memory overhead
- **Memory-efficient reset** that keeps the first block and frees others
- **STL-compatible interface** via `ArenaStlAllocator`

Key features:
- O(1) amortized allocation
- Proper alignment handling for all types
- Move semantics for efficient transfers
- Requires trivially destructible types only

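
The allocator properties listed above can be illustrated with a minimal bump-allocator sketch. This is not the code in `src/arena_allocator.hpp`: the class name `SketchArena`, its members, and the 1024-byte default are assumptions chosen for illustration. It shows lazy initialization, doubling block growth, an intrusive block list, address alignment, and a reset that keeps only the first block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical sketch: names and layout are assumptions, not WeaselDB's code.
class SketchArena {
    // Intrusive header stored at the front of each malloc'd block, so the
    // block list needs no separate container.
    struct Block {
        Block* prev;       // older (smaller) block, or nullptr for the first
        std::size_t size;  // usable bytes after the header
        std::size_t used;  // bytes handed out so far
        unsigned char* data() { return reinterpret_cast<unsigned char*>(this + 1); }
    };

    Block* current_ = nullptr;  // lazy: nothing is allocated until first use
    std::size_t initial_size_;

    void add_block(std::size_t min_bytes) {
        // Geometric growth: double the current block size until the request fits.
        std::size_t size = current_ ? current_->size * 2 : initial_size_;
        while (size < min_bytes) size *= 2;
        void* raw = std::malloc(sizeof(Block) + size);
        if (raw == nullptr) throw std::bad_alloc();
        current_ = new (raw) Block{current_, size, 0};
    }

public:
    explicit SketchArena(std::size_t initial_size = 1024) : initial_size_(initial_size) {
        assert(initial_size > 0);
    }
    SketchArena(const SketchArena&) = delete;
    SketchArena& operator=(const SketchArena&) = delete;
    SketchArena(SketchArena&& other) noexcept
        : current_(std::exchange(other.current_, nullptr)),
          initial_size_(other.initial_size_) {}
    ~SketchArena() {
        while (current_ != nullptr) {
            Block* older = current_->prev;
            std::free(current_);
            current_ = older;
        }
    }

    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        if (current_ != nullptr) {
            // Round the next free address up to the requested alignment.
            auto base = reinterpret_cast<std::uintptr_t>(current_->data());
            std::uintptr_t p =
                (base + current_->used + align - 1) & ~(std::uintptr_t(align) - 1);
            std::size_t offset = p - base;
            if (offset + bytes <= current_->size) {
                current_->used = offset + bytes;
                return reinterpret_cast<void*>(p);
            }
        }
        add_block(bytes + align);  // guarantees the retry below succeeds
        return allocate(bytes, align);
    }

    template <class T, class... Args>
    T* construct(Args&&... args) {
        static_assert(std::is_trivially_destructible_v<T>,
                      "the arena never runs destructors");
        return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }

    // Free every block except the first one and mark it empty.
    void reset() {
        while (current_ != nullptr && current_->prev != nullptr) {
            Block* older = current_->prev;
            std::free(current_);
            current_ = older;
        }
        if (current_ != nullptr) current_->used = 0;
    }

    std::size_t block_count() const {
        std::size_t n = 0;
        for (const Block* b = current_; b != nullptr; b = b->prev) ++n;
        return n;
    }
};
```

In this sketch the common path of `allocate` is a few arithmetic operations and one bounds check, which is where the low per-allocation cost of an arena comes from. The `static_assert` in `construct` reflects the trivially-destructible requirement: memory is released in bulk and no destructors run.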
#### 2. **Commit Request Data Model** (`src/commit_request.hpp`)
- **Format-agnostic data structure** for representing transactional commits
- **Arena-backed string storage** with efficient memory management
- **Move-only semantics** for optimal performance
- **Builder pattern** for constructing commit requests
- **Zero-copy string views** pointing to arena-allocated memory

#### 3. **JSON Commit Request Parser** (`src/json_commit_request_parser.{hpp,cpp}`)
- **High-performance JSON parser** using `weaseljson` library
- **Streaming parser support** for incremental parsing of network data
- **gperf-optimized token recognition** for fast JSON key parsing
- **Base64 decoding** using SIMD-accelerated simdutf
- **Comprehensive validation** of transaction structure

Parser capabilities:
- One-shot parsing for complete JSON
- Streaming parsing for network protocols
- Parse state management with error recovery
- Memory-efficient string views backed by arena storage
- Perfect hash table lookup for JSON keys using gperf

#### 4. **Parser Interface** (`src/parser_interface.hpp`)
- **Abstract base class** for commit request parsers
- **Format-agnostic parsing interface** supporting multiple serialization formats
- **Streaming and one-shot parsing modes**
- **Standardized error handling** across parser implementations

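
One way such an interface could look is sketched below. The names `CommitRequestParserSketch`, `ParseStatus`, `begin`, `feed`, `finish`, and `error` are assumptions for illustration and are not taken from `src/parser_interface.hpp`. The toy `BraceBalanceParser` only checks brace nesting (it ignores braces inside strings) and stands in for a real format parser so that the two modes can be exercised.

```cpp
#include <cassert>
#include <string>
#include <string_view>

enum class ParseStatus { Incomplete, Complete, Error };

// Hypothetical interface: method names are assumptions for illustration.
class CommitRequestParserSketch {
public:
    virtual ~CommitRequestParserSketch() = default;

    // Streaming mode: begin(), then feed() chunks as they arrive, then finish().
    virtual void begin() = 0;
    virtual ParseStatus feed(std::string_view chunk) = 0;
    virtual ParseStatus finish() = 0;

    // Uniform error reporting: empty when no error has occurred.
    virtual std::string_view error() const = 0;

    // One-shot mode, expressed in terms of the streaming primitives.
    ParseStatus parse(std::string_view whole) {
        begin();
        if (feed(whole) == ParseStatus::Error) return ParseStatus::Error;
        return finish();
    }
};

// Toy implementation: accepts input whose '{' and '}' are balanced.
class BraceBalanceParser final : public CommitRequestParserSketch {
    int depth_ = 0;
    bool seen_brace_ = false;
    std::string error_;

public:
    void begin() override {
        depth_ = 0;
        seen_brace_ = false;
        error_.clear();
    }

    ParseStatus feed(std::string_view chunk) override {
        if (!error_.empty()) return ParseStatus::Error;  // errors are sticky
        for (char c : chunk) {
            if (c == '{') {
                ++depth_;
                seen_brace_ = true;
            } else if (c == '}' && --depth_ < 0) {
                error_ = "unbalanced closing brace";
                return ParseStatus::Error;
            }
        }
        return (seen_brace_ && depth_ == 0) ? ParseStatus::Complete
                                            : ParseStatus::Incomplete;
    }

    ParseStatus finish() override {
        if (!error_.empty()) return ParseStatus::Error;
        if (seen_brace_ && depth_ == 0) return ParseStatus::Complete;
        error_ = "unexpected end of input";
        return ParseStatus::Error;
    }

    std::string_view error() const override { return error_; }
};
```

Building the one-shot `parse` on top of `feed` and `finish` means a new serialization format only has to implement the streaming path; whether the real interface is layered this way is not stated in this document.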
#### 5. **Configuration System** (`src/config.{hpp,cpp}`)
- **TOML-based configuration** using `toml11` library
- **Structured configuration** with server, commit, and subscription sections
- **Default fallback values** for all configuration options
- **Type-safe parsing** with validation

Configuration domains:
- **Server**: bind address, port, request size limits
- **Commit**: request ID validation, retention policies
- **Subscription**: buffer management, keepalive intervals

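
The default-fallback and validation pattern can be sketched without any TOML dependency. Everything below is hypothetical: the struct `ServerConfigSketch`, its field names and default values, and the `RawSection` map (which stands in for an already-parsed `[server]` table) are assumptions, and the real code uses `toml11` rather than a string map.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical server section; field names and defaults are assumptions.
struct ServerConfigSketch {
    std::string bind_address = "127.0.0.1";
    std::uint16_t port = 8080;
    std::uint64_t max_request_size_bytes = 1 << 20;
};

// Stand-in for a parsed configuration table: key -> raw text value.
using RawSection = std::map<std::string, std::string>;

// Parse a whole string as an unsigned integer no larger than `max`.
inline std::optional<std::uint64_t> parse_uint(const std::string& text,
                                               std::uint64_t max) {
    std::uint64_t value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last || value > max) return std::nullopt;
    return value;
}

// Start from defaults, override only the keys that are present, and reject
// the whole section if any present value fails validation.
inline std::optional<ServerConfigSketch> load_server(const RawSection& raw) {
    ServerConfigSketch cfg;
    if (auto it = raw.find("bind_address"); it != raw.end()) {
        cfg.bind_address = it->second;
    }
    if (auto it = raw.find("port"); it != raw.end()) {
        auto v = parse_uint(it->second, 65535);
        if (!v || *v == 0) return std::nullopt;
        cfg.port = static_cast<std::uint16_t>(*v);
    }
    if (auto it = raw.find("max_request_size_bytes"); it != raw.end()) {
        auto v = parse_uint(it->second, UINT64_MAX);
        if (!v || *v == 0) return std::nullopt;
        cfg.max_request_size_bytes = *v;
    }
    return cfg;
}
```

Keeping the defaults in the struct's member initializers means a missing key never needs special handling at the call site; only present-but-invalid values produce an error.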
#### 6. **JSON Token Optimization** (`src/json_tokens.gperf`, `src/json_token_enum.hpp`)
- **Perfect hash table** generated by gperf for O(1) JSON key lookup
- **Compile-time token enumeration** for type-safe key identification
- **Minimal perfect hash** reduces memory overhead and improves cache locality
- **Build-time code generation** ensures optimal performance

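
The idea behind a perfect hash lookup can be shown with a hand-written sketch. This is not gperf output and not the contents of `src/json_token_enum.hpp`: the enum `JsonKey`, the hash function (length plus first character, modulo 16), and the choice of keys (borrowed from the field names in the data model) are assumptions. The table here is perfect for these five keys but not minimal, since it uses 16 slots.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical token enum; the real one is generated at build time.
enum class JsonKey { Unknown, RequestId, LeaderId, ReadVersion, Preconditions, Operations };

struct KeyEntry {
    std::string_view name;
    JsonKey key;
};

inline constexpr std::size_t kSlots = 16;

// Hand-picked hash that happens to be collision-free for the keys below.
constexpr std::size_t key_hash(std::string_view s) {
    return s.empty() ? 0 : (s.size() + static_cast<unsigned char>(s.front())) % kSlots;
}

inline constexpr std::array<KeyEntry, 5> kKeys{{
    {"request_id", JsonKey::RequestId},
    {"leader_id", JsonKey::LeaderId},
    {"read_version", JsonKey::ReadVersion},
    {"preconditions", JsonKey::Preconditions},
    {"operations", JsonKey::Operations},
}};

// Compile-time proof that no two known keys share a slot.
constexpr bool is_collision_free() {
    bool used[kSlots] = {};
    for (const auto& e : kKeys) {
        if (used[key_hash(e.name)]) return false;
        used[key_hash(e.name)] = true;
    }
    return true;
}
static_assert(is_collision_free(), "hash is not perfect for the key set");

constexpr std::array<KeyEntry, kSlots> build_table() {
    std::array<KeyEntry, kSlots> table{};  // empty slots hold JsonKey::Unknown
    for (const auto& e : kKeys) table[key_hash(e.name)] = e;
    return table;
}

inline constexpr auto kTable = build_table();

// One hash, one slot read, one string comparison: no probing or chaining.
constexpr JsonKey lookup_key(std::string_view s) {
    const KeyEntry& e = kTable[key_hash(s)];
    return e.name == s ? e.key : JsonKey::Unknown;
}
```

Because every known key owns its slot, a lookup never walks a collision chain; the final comparison is still required to reject unknown keys that hash to an occupied slot.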
### Data Model

#### Transaction Structure
```
CommitRequest {
  - request_id: Optional unique identifier
  - leader_id: Expected leader for consistency
  - read_version: Snapshot version for preconditions
  - preconditions[]: Optimistic concurrency checks
    - point_read: Single key existence/content validation
    - range_read: Range-based consistency validation
  - operations[]: Ordered mutation operations
    - write: Set key-value pair
    - delete: Remove single key
    - range_delete: Remove key range
}
```

#### Memory Management
- **Arena-based allocation** ensures efficient bulk memory management
- **String views** eliminate unnecessary copying of JSON data
- **Zero-copy design** for binary data handling
- **Automatic memory cleanup** on transaction completion

### API Design

The system implements a RESTful API with four endpoints:

1. **GET /v1/version**: Retrieve current committed version and leader
2. **POST /v1/commit**: Submit transactional operations
3. **GET /v1/subscribe**: Stream committed transactions (implied)
4. **GET /v1/status**: Check commit status by request_id (implied)

### Performance Characteristics

#### Memory Allocation
- **~1ns allocation time** vs ~20-270ns for malloc
- **Bulk deallocation** eliminates individual free() calls
- **Geometric growth**: each new block doubles the size of the current block
- **Alignment-aware** allocation avoids misaligned-access penalties

#### JSON Parsing
- **Streaming parser** handles large payloads efficiently
- **Incremental processing** suitable for network protocols
- **Arena storage** eliminates per-string allocation overhead
- **SIMD-accelerated base64 decoding** using simdutf
- **Perfect hash table** provides O(1) JSON key lookup via gperf
- **Zero hash collisions** for known JSON tokens, so lookups need no collision-resolution path

### Design Principles

1. **Performance-first**: Every component optimized for high throughput
2. **Memory efficiency**: Arena allocation eliminates fragmentation
3. **Zero-copy**: Minimize data copying throughout pipeline
4. **Streaming-ready**: Support incremental processing
5. **Type safety**: Compile-time validation where possible
6. **Resource management**: RAII and move semantics throughout

### Testing & Benchmarking

The project includes comprehensive testing infrastructure:
- **Unit tests** using doctest framework
- **Performance benchmarks** using nanobench
- **Memory allocation benchmarks** for arena performance
- **JSON parsing validation** for correctness

Build targets:
- `test_arena_allocator`: Arena allocator functionality tests
- `test_commit_request`: JSON parsing and validation tests
- `weaseldb`: Main application demonstrating configuration and parsing
- `bench_arena_allocator`: Arena allocator performance benchmarks
- `bench_commit_request`: JSON parsing performance benchmarks
- `bench_parser_comparison`: Comparison benchmarks vs nlohmann::json and RapidJSON
- `debug_arena`: Debug tool for arena allocator analysis

### Dependencies

- **weaseljson**: High-performance streaming JSON parser
- **simdutf**: SIMD-accelerated UTF-8 validation and base64 encoding/decoding
- **toml11**: TOML configuration file parsing
- **doctest**: Lightweight testing framework
- **nanobench**: Micro-benchmarking library
- **gperf**: Perfect hash function generator for JSON token optimization
- **nlohmann::json**: Reference JSON parser for benchmarking comparisons
- **RapidJSON**: High-performance JSON parser for benchmarking comparisons

### Future Considerations

This write-side component is designed to integrate with:
- **Leader election** systems for distributed consensus
- **Replication** mechanisms for fault tolerance
- **Read-side systems** that consume the transaction stream
- **Monitoring** systems for operational visibility

The modular design allows each component to be optimized independently while maintaining clear interfaces for system integration.