Add mdformat pre-commit hook

2025-09-12 11:24:16 -04:00
parent 9d48caca76
commit bf90b8856a
9 changed files with 286 additions and 120 deletions

design.md

@@ -3,15 +3,15 @@
## Table of Contents
1. [Project Overview](#project-overview)
2. [Quick Start](#quick-start)
3. [Architecture](#architecture)
4. [Development Guidelines](#development-guidelines)
5. [Common Patterns](#common-patterns)
6. [Reference](#reference)
1. [Quick Start](#quick-start)
1. [Architecture](#architecture)
1. [Development Guidelines](#development-guidelines)
1. [Common Patterns](#common-patterns)
1. [Reference](#reference)
**IMPORTANT:** Read [style.md](style.md) first - contains mandatory C++ coding standards, threading rules, and testing guidelines that must be followed for all code changes.
---
______________________________________________________________________
## Project Overview
@@ -26,7 +26,7 @@ WeaselDB is a high-performance write-side database component designed for system
- **Optimized memory management** with arena allocation and efficient copying
- **Factory pattern safety** ensuring correct object lifecycle management
---
______________________________________________________________________
## Quick Start
@@ -43,11 +43,13 @@ ninja
### Testing & Development
**Run all tests:**
```bash
ninja test # or ctest
```
**Individual targets:**
- `./test_arena` - Arena allocator unit tests
- `./test_commit_request` - JSON parsing and validation tests
- `./test_http_handler` - HTTP protocol handling tests
@@ -56,6 +58,7 @@ ninja test # or ctest
- `./test_server_connection_return` - Connection lifecycle tests
**Benchmarking:**
- `./bench_arena` - Memory allocation performance
- `./bench_commit_request` - JSON parsing performance
- `./bench_parser_comparison` - Compare vs nlohmann::json and RapidJSON
@@ -64,18 +67,22 @@ ninja test # or ctest
- `./bench_format_comparison` - String formatting performance
**Debug tools:**
- `./debug_arena` - Analyze arena allocator behavior
**Load Testing:**
- `./load_tester` - A tool to generate load against the server for performance and stability analysis.
### Dependencies
**System requirements:**
- **weaseljson** - Must be installed system-wide (high-performance JSON parser)
- **gperf** - System requirement for perfect hash generation
**Auto-fetched:**
- **simdutf** - SIMD base64 encoding/decoding
- **toml11** - TOML configuration parsing
- **doctest** - Testing framework
@@ -84,7 +91,7 @@ ninja test # or ctest
- **RapidJSON** - High-performance JSON library (used in benchmarks)
- **llhttp** - Fast HTTP parser
---
______________________________________________________________________
## Architecture
@@ -106,6 +113,7 @@ Ultra-fast memory allocator optimized for request/response patterns:
#### **Networking Layer**
**Server** (`src/server.{hpp,cpp}`):
- **High-performance multi-threaded networking** using multiple epoll instances with unified I/O thread pool
- **Configurable epoll instances** to eliminate kernel-level epoll_ctl contention (default: 2, max: io_threads)
- **Round-robin thread-to-epoll assignment** distributes I/O threads across epoll instances
@@ -117,6 +125,7 @@ Ultra-fast memory allocator optimized for request/response patterns:
- **EPOLL_EXCLUSIVE** on listen socket across all epoll instances prevents thundering herd
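The round-robin thread-to-epoll assignment listed above can be pictured as a modulo mapping. The following is a minimal sketch, not the actual `src/server.cpp` code: `clamp_epoll_instances` and `epoll_index_for_thread` are hypothetical names, and the clamping rule only mirrors the stated "default: 2, max: io_threads" limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: keep the configured number of epoll instances
// within [1, io_threads], mirroring the "max: io_threads" rule.
inline std::size_t clamp_epoll_instances(std::size_t requested,
                                         std::size_t io_threads) {
    assert(io_threads > 0);
    return std::clamp<std::size_t>(requested, 1, io_threads);
}

// Hypothetical helper: I/O thread i services epoll instance i % n,
// so threads are spread evenly across the instances.
inline std::size_t epoll_index_for_thread(std::size_t thread_index,
                                          std::size_t epoll_instances) {
    assert(epoll_instances > 0);
    return thread_index % epoll_instances;
}
```

With 8 I/O threads and 2 epoll instances, this mapping would place the even-numbered threads on instance 0 and the odd-numbered threads on instance 1.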
**Connection** (`src/connection.{hpp,cpp}`):
- **Efficient per-connection state management** with arena-based memory allocation
- **Safe ownership transfer** between server threads and protocol handlers
- **Automatic cleanup** on connection closure or server shutdown
@@ -124,6 +133,7 @@ Ultra-fast memory allocator optimized for request/response patterns:
- **Protocol-specific data:** `user_data` `void*` for custom handler data
**ConnectionHandler Interface** (`src/connection_handler.hpp`):
- **Abstract protocol interface** decoupling networking from application logic
- **Ownership transfer support** allowing handlers to take connections for async processing
- **Streaming data processing** with partial message handling
@@ -141,6 +151,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
#### **Parsing Layer**
**JSON Commit Request Parser** (`src/json_commit_request_parser.{hpp,cpp}`):
- **High-performance JSON parser** using `weaseljson` library
- **Streaming parser support** for incremental parsing of network data
- **gperf-optimized token recognition** for fast JSON key parsing
@@ -150,6 +161,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
- **Zero hash collisions** for known JSON tokens eliminates branching
**Parser Interface** (`src/commit_request_parser.hpp`):
- **Abstract base class** for commit request parsers
- **Format-agnostic parsing interface** supporting multiple serialization formats
- **Streaming and one-shot parsing modes**
@@ -158,6 +170,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
#### **Data Model**
**Commit Request Data Model** (`src/commit_request.hpp`):
- **Format-agnostic data structure** for representing transactional commits
- **Arena-backed string storage** with efficient memory management
- **Move-only semantics** for optimal performance
@@ -167,18 +180,21 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
#### **Metrics System** (`src/metric.{hpp,cpp}`)
**High-Performance Metrics Implementation:**
- **Thread-local counters/histograms** with single writer for performance
- **Global gauges** with lock-free atomic CAS operations for multi-writer scenarios
- **SIMD-optimized histogram bucket updates** using AVX instructions for high throughput
- **Arena allocator integration** for efficient memory management during rendering
**Threading Model:**
- **Counters**: Per-thread storage, single writer, atomic write in `Counter::inc()`, atomic read in render thread
- **Histograms**: Per-thread storage, single writer, per-histogram mutex serializes all access (observe and render)
- **Gauges**: Lock-free atomic operations using `std::bit_cast` for double precision
- **Thread cleanup**: Automatic accumulation of thread-local state into global state on destruction
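The gauge entry above describes a lock-free CAS loop over the bit pattern of a double. A minimal sketch of that idea follows, assuming a 64-bit `double`. `GaugeSketch` and its methods are hypothetical names, not the API in `src/metric.hpp`. The sketch also uses `std::memcpy` for the bit conversion so that it compiles before C++20; `std::bit_cast`, which the document names, performs the same conversion.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Bit-pattern conversions; std::bit_cast does the same in C++20.
inline std::uint64_t to_bits(double v) noexcept {
    std::uint64_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}
inline double from_bits(std::uint64_t b) noexcept {
    double v;
    std::memcpy(&v, &b, sizeof v);
    return v;
}

// Hypothetical multi-writer gauge: the double lives in an atomic
// 64-bit integer and is updated with a compare-and-swap loop.
class GaugeSketch {
public:
    GaugeSketch() { assert(bits_.is_lock_free()); }

    void set(double v) noexcept {
        bits_.store(to_bits(v), std::memory_order_relaxed);
    }

    void add(double delta) noexcept {
        std::uint64_t old_bits = bits_.load(std::memory_order_relaxed);
        // On failure, compare_exchange_weak reloads old_bits, so the
        // sum is recomputed from the latest value on every retry.
        while (!bits_.compare_exchange_weak(
                   old_bits, to_bits(from_bits(old_bits) + delta),
                   std::memory_order_relaxed)) {
        }
    }

    double value() const noexcept {
        return from_bits(bits_.load(std::memory_order_relaxed));
    }

private:
    std::atomic<std::uint64_t> bits_{0};  // all-zero bits encode 0.0
};
```

Relaxed ordering is used here on the assumption that a gauge value does not need to synchronize other memory; a real implementation may choose differently.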
**Prometheus Compatibility:**
- **Standard metric types** with proper label handling and validation
- **Bucket generation helpers** for linear/exponential histogram distributions
- **Callback-based metrics** for dynamic values
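The bucket generation helpers mentioned above might look roughly like the sketch below. The function names and parameter order are assumptions, modeled on the conventions of common Prometheus client libraries and not taken from `src/metric.hpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: `count` upper bounds starting at `start`,
// each `width` apart (start, start + width, start + 2*width, ...).
inline std::vector<double> linear_buckets(double start, double width,
                                          std::size_t count) {
    assert(count > 0 && width > 0.0);
    std::vector<double> bounds;
    bounds.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        bounds.push_back(start + width * static_cast<double>(i));
    }
    return bounds;
}

// Hypothetical helper: `count` upper bounds starting at `start`,
// each `factor` times the previous one.
inline std::vector<double> exponential_buckets(double start, double factor,
                                               std::size_t count) {
    assert(count > 0 && start > 0.0 && factor > 1.0);
    std::vector<double> bounds;
    bounds.reserve(count);
    double bound = start;
    for (std::size_t i = 0; i < count; ++i) {
        bounds.push_back(bound);
        bound *= factor;
    }
    return bounds;
}
```

Linear buckets suit values with a known, narrow range; exponential buckets cover several orders of magnitude (for example request latencies) with few buckets.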
@@ -187,6 +203,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
#### **Configuration & Optimization**
**Configuration System** (`src/config.{hpp,cpp}`):
- **TOML-based configuration** using `toml11` library
- **Structured configuration** with server, commit, and subscription sections
- **Default fallback values** for all configuration options
@@ -194,6 +211,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
- See `config.md` for complete configuration documentation
**JSON Token Optimization** (`src/json_tokens.gperf`, `src/json_token_enum.hpp`):
- **Perfect hash table** generated by gperf for O(1) JSON key lookup
- **Compile-time token enumeration** for type-safe key identification
- **Minimal perfect hash** reduces memory overhead and improves cache locality
@@ -202,6 +220,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
### Transaction Data Model
#### CommitRequest Structure
```
CommitRequest {
- request_id: Optional unique identifier
@@ -220,18 +239,20 @@ CommitRequest {
### Memory Management Model
#### Connection Ownership Lifecycle
1. **Creation**: Accept threads create connections, transfer to epoll as raw pointers
2. **Processing**: Network threads claim ownership by wrapping in unique_ptr
3. **Handler Transfer**: Handlers can take ownership for async processing via unique_ptr.release()
4. **Return Path**: Handlers use Server::release_back_to_server() to return connections
5. **Safety**: All transfers use weak_ptr to server for safe cleanup
6. **Cleanup**: RAII ensures proper resource cleanup in all scenarios
1. **Processing**: Network threads claim ownership by wrapping in unique_ptr
1. **Handler Transfer**: Handlers can take ownership for async processing via unique_ptr.release()
1. **Return Path**: Handlers use Server::release_back_to_server() to return connections
1. **Safety**: All transfers use weak_ptr to server for safe cleanup
1. **Cleanup**: RAII ensures proper resource cleanup in all scenarios
#### Arena Memory Lifecycle
1. **Request Processing**: Handler uses `conn->get_arena()` to allocate memory for parsing request data
2. **Response Generation**: Handler uses arena for temporary response construction (headers, JSON, etc.)
3. **Response Queuing**: Handler calls `conn->append_message()` which copies data to arena-backed message queue
4. **Response Writing**: Server writes all queued messages to socket via `writeBytes()`
1. **Response Generation**: Handler uses arena for temporary response construction (headers, JSON, etc.)
1. **Response Queuing**: Handler calls `conn->append_message()` which copies data to arena-backed message queue
1. **Response Writing**: Server writes all queued messages to socket via `writeBytes()`
> **Note**: Call `conn->reset()` periodically to reclaim arena memory. Best practice is after all outgoing bytes have been written.
@@ -241,22 +262,26 @@ CommitRequest {
WeaselDB uses `EPOLLONESHOT` for all connection file descriptors to enable safe multi-threaded ownership transfer without complex synchronization:
**Key Benefits:**
1. **Automatic fd disarming** - When epoll triggers an event, the fd is automatically removed from epoll monitoring
2. **Race-free ownership transfer** - Handlers can safely take connection ownership and move to other threads
3. **Zero-coordination async processing** - No manual synchronization needed between network threads and handler threads
1. **Race-free ownership transfer** - Handlers can safely take connection ownership and move to other threads
1. **Zero-coordination async processing** - No manual synchronization needed between network threads and handler threads
**Threading Flow:**
1. **Event Trigger**: Network thread gets epoll event → connection auto-disarmed via ONESHOT
2. **Safe Transfer**: Handler can take ownership (`std::move(conn_ptr)`) with no epoll interference
3. **Async Processing**: Connection processed on handler thread while epoll cannot trigger spurious events
4. **Return & Re-arm**: `Server::receiveConnectionBack()` re-arms fd with `epoll_ctl(EPOLL_CTL_MOD)`
1. **Safe Transfer**: Handler can take ownership (`std::move(conn_ptr)`) with no epoll interference
1. **Async Processing**: Connection processed on handler thread while epoll cannot trigger spurious events
1. **Return & Re-arm**: `Server::receiveConnectionBack()` re-arms fd with `epoll_ctl(EPOLL_CTL_MOD)`
**Performance Trade-off:**
- **Cost**: One `epoll_ctl(MOD)` syscall per connection return (~100-200ns)
- **Benefit**: Eliminates complex thread synchronization and prevents race conditions
- **Alternative cost**: Manual `EPOLL_CTL_DEL`/`ADD` + locking would be significantly higher
**Without EPOLLONESHOT risks:**
- Multiple threads processing same fd simultaneously
- Use-after-move when network thread accesses transferred connection
- Complex synchronization between epoll events and ownership transfers
@@ -270,14 +295,14 @@ The system implements a RESTful API. See [api.md](api.md) for comprehensive API
### Design Principles
1. **Performance-first** - Every component optimized for high throughput
2. **Scalable concurrency** - Multiple epoll instances eliminate kernel contention
3. **Memory efficiency** - Arena allocation eliminates fragmentation
4. **Efficient copying** - Minimize unnecessary copies while accepting required ones
5. **Streaming-ready** - Support incremental processing
6. **Type safety** - Compile-time validation where possible
7. **Resource management** - RAII and move semantics throughout
1. **Scalable concurrency** - Multiple epoll instances eliminate kernel contention
1. **Memory efficiency** - Arena allocation eliminates fragmentation
1. **Efficient copying** - Minimize unnecessary copies while accepting required ones
1. **Streaming-ready** - Support incremental processing
1. **Type safety** - Compile-time validation where possible
1. **Resource management** - RAII and move semantics throughout
---
______________________________________________________________________
## Development Guidelines
@@ -308,20 +333,22 @@ See [style.md](style.md) for comprehensive C++ coding standards and conventions.
### Extension Points
#### Adding New Protocol Handlers
1. Inherit from `ConnectionHandler` in `src/connection_handler.hpp`
2. Implement `on_data_arrived()` with proper ownership semantics
3. Use connection's arena allocator for temporary allocations: `conn->get_arena()`
4. Handle partial messages and streaming protocols appropriately
5. Use `Server::release_back_to_server()` if taking ownership for async processing
6. Add corresponding test cases and integration tests
7. Consider performance implications of ownership transfers
1. Implement `on_data_arrived()` with proper ownership semantics
1. Use connection's arena allocator for temporary allocations: `conn->get_arena()`
1. Handle partial messages and streaming protocols appropriately
1. Use `Server::release_back_to_server()` if taking ownership for async processing
1. Add corresponding test cases and integration tests
1. Consider performance implications of ownership transfers
#### Adding New Parsers
1. Inherit from `CommitRequestParser` in `src/commit_request_parser.hpp`
2. Implement both streaming and one-shot parsing modes
3. Use arena allocation for all temporary string storage
4. Add corresponding test cases in `tests/`
5. Add benchmark comparisons in `benchmarks/`
1. Implement both streaming and one-shot parsing modes
1. Use arena allocation for all temporary string storage
1. Add corresponding test cases in `tests/`
1. Add benchmark comparisons in `benchmarks/`
### Performance Guidelines
@@ -337,13 +364,14 @@ See [style.md](style.md) for comprehensive C++ coding standards and conventions.
- **Build System**: CMake generates gperf hash tables at build time
- **Testing Guidelines**: See [style.md](style.md) for comprehensive testing standards including synchronization rules
---
______________________________________________________________________
## Common Patterns
### Factory Method Patterns
#### Server Creation
```cpp
// Server must be created via factory method
auto server = Server::create(config, handler);
@@ -354,6 +382,7 @@ auto server = Server::create(config, handler);
```
#### Connection Creation (Server-Only)
```cpp
// Only Server can create connections (using private friend method)
class Server {
@@ -370,6 +399,7 @@ private:
### ConnectionHandler Implementation Patterns
#### Simple Synchronous Handler
```cpp
class HttpHandler : public ConnectionHandler {
public:
@@ -386,6 +416,7 @@ public:
```
#### Async Handler with Ownership Transfer
```cpp
class AsyncHandler : public ConnectionHandler {
public:
@@ -405,6 +436,7 @@ public:
```
#### Batching Handler with User Data
```cpp
class BatchingHandler : public ConnectionHandler {
public:
@@ -444,6 +476,7 @@ private:
```
#### Streaming "yes" Handler
```cpp
class YesHandler : public ConnectionHandler {
public:
@@ -466,6 +499,7 @@ public:
### Memory Management Patterns
#### Arena-Based String Handling
```cpp
// Preferred: String view with arena allocation to minimize copying
std::string_view process_json_key(const char* data, Arena& arena);
@@ -475,6 +509,7 @@ std::string process_json_key(const char* data);
```
#### Safe Connection Ownership Transfer
```cpp
// In handler - take ownership for background processing
Connection* raw_conn = conn_ptr.release();
@@ -492,6 +527,7 @@ background_processor.submit([raw_conn]() {
### Data Construction Patterns
#### Builder Pattern Usage
```cpp
CommitRequest request = CommitRequestBuilder(arena)
.request_id("example-id")
@@ -501,41 +537,47 @@ CommitRequest request = CommitRequestBuilder(arena)
```
#### Error Handling Pattern
```cpp
enum class ParseResult { Success, InvalidJson, MissingField };
ParseResult parse_commit_request(const char* json, CommitRequest& out);
```
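One way a caller might consume the `ParseResult` enum is sketched below. The enum is repeated so the example stands alone. `describe` and `http_status_for` are hypothetical helpers, and the mapping of every parse failure to HTTP 400 is an assumption, not documented WeaselDB behavior.

```cpp
#include <cassert>
#include <string_view>

enum class ParseResult { Success, InvalidJson, MissingField };

// Hypothetical helper: log-friendly text for each result. The switch
// has no default case, so compilers can warn when a new enumerator
// is added but not handled here.
inline std::string_view describe(ParseResult r) {
    switch (r) {
        case ParseResult::Success:      return "ok";
        case ParseResult::InvalidJson:  return "invalid JSON";
        case ParseResult::MissingField: return "missing required field";
    }
    assert(false && "unhandled ParseResult");
    return "unknown";
}

// Hypothetical helper: assumed mapping from parse result to HTTP status.
inline int http_status_for(ParseResult r) {
    return r == ParseResult::Success ? 200 : 400;
}
```

Returning an enum instead of throwing keeps the error path free of exceptions and forces callers to inspect the result explicitly.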
---
______________________________________________________________________
## Reference
### Build Targets
**Test Executables:**
- `test_arena` - Arena allocator functionality tests
- `test_commit_request` - JSON parsing and validation tests
- `test_metric` - Metrics system functionality tests
- Main server executable (compiled from `src/main.cpp`)
**Benchmark Executables:**
- `bench_arena` - Arena allocator performance benchmarks
- `bench_commit_request` - JSON parsing performance benchmarks
- `bench_parser_comparison` - Comparison benchmarks vs nlohmann::json and RapidJSON
- `bench_metric` - Metrics system performance benchmarks
**Debug Tools:**
- `debug_arena` - Debug tool for arena allocator analysis
### Performance Characteristics
**Memory Allocation:**
- **~1ns allocation time** vs standard allocators
- **Bulk deallocation** eliminates individual free() calls
- **Optimized geometric growth** uses current block size for doubling strategy
- **Alignment-aware** allocation prevents performance penalties
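The memory allocation properties listed above (bump allocation, bulk deallocation, geometric growth, alignment) can be illustrated with a small arena. This is a simplified sketch under stated assumptions, not WeaselDB's allocator: `ArenaSketch` is a hypothetical class, it omits overflow checks, and it makes no claim about matching the quoted timings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

class ArenaSketch {
public:
    explicit ArenaSketch(std::size_t initial_block = 1024)
        : next_block_size_(initial_block) {
        assert(initial_block > 0);
    }
    ArenaSketch(const ArenaSketch&) = delete;
    ArenaSketch& operator=(const ArenaSketch&) = delete;

    // Bulk deallocation: every block is freed at once; there is no
    // per-allocation free().
    ~ArenaSketch() {
        for (void* block : blocks_) std::free(block);
    }

    // Bump allocation: round the cursor up to `align` (a power of two)
    // and advance it by `size`. Returns nullptr if malloc fails.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t p = align_up(cursor_, align);
        if (cursor_ == 0 || p + size > end_) {
            // Geometric growth: start from the current block size and
            // double until the request fits, then double again for
            // the next block.
            std::size_t block = next_block_size_;
            while (block < size + align) block *= 2;
            void* mem = std::malloc(block);
            if (mem == nullptr) return nullptr;
            blocks_.push_back(mem);
            next_block_size_ = block * 2;
            cursor_ = reinterpret_cast<std::uintptr_t>(mem);
            end_ = cursor_ + block;
            p = align_up(cursor_, align);
        }
        cursor_ = p + size;
        return reinterpret_cast<void*>(p);
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    static std::uintptr_t align_up(std::uintptr_t v, std::size_t align) {
        return (v + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    }

    std::vector<void*> blocks_;
    std::uintptr_t cursor_ = 0;  // next free byte in the current block
    std::uintptr_t end_ = 0;     // one past the end of the current block
    std::size_t next_block_size_;
};
```

The fast path is a round-up, a bounds check, and an addition, which is why arena allocation can be so much cheaper than a general-purpose `malloc`.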
**JSON Parsing:**
- **Streaming parser** handles large payloads efficiently
- **Incremental processing** suitable for network protocols
- **Arena storage** eliminates string allocation overhead