Update /ok to serve dual health check/benchmarking role

2025-09-05 12:39:10 -04:00
parent 761eaa552b
commit e67e4aee17
15 changed files with 265 additions and 53 deletions


@@ -2,30 +2,30 @@
 ## Summary
-WeaselDB achieved 1.3M requests/second throughput using a two-stage ThreadPipeline with futex wake optimization, delivering 550ns serial CPU time per request while maintaining 0% CPU usage when idle. Higher serial CPU time means more CPU budget available for serial processing.
+WeaselDB's /ok health check endpoint achieves 1M requests/second with 650ns of configurable CPU work per request through the 4-stage commit pipeline, while maintaining 0% CPU usage when idle. The configurable CPU work serves both as a health check (validating the full pipeline) and as a benchmarking tool for measuring per-request processing capacity.
 ## Performance Metrics
 ### Throughput
-- **1.3M requests/second** over unix socket
+- **1.0M requests/second** /ok health check endpoint (4-stage commit pipeline)
 - 8 I/O threads with 8 epoll instances
 - Load tester used 12 network threads
 - Max latency: 4ms out of 90M requests
 - **0% CPU usage when idle** (optimized futex wake implementation)
 ### Threading Architecture
-- Two-stage pipeline: Stage-0 (noop) → Stage-1 (connection return)
+- **Four-stage commit pipeline**: Sequence → Resolve → Persist → Release
 - Lock-free coordination using atomic ring buffer
 - **Optimized futex wake**: Only wake on final pipeline stage
-- Each request "processed" serially on single thread
+- Configurable CPU work performed serially in resolve stage
 ### Performance Characteristics
-**Optimized Pipeline Mode**:
-- **Throughput**: 1.3M requests/second
-- **Serial CPU time per request**: 550ns (validated with nanobench)
-- **Theoretical maximum serial CPU time**: 769ns (1,000,000,000ns ÷ 1,300,000 req/s)
-- **Serial efficiency**: 71.5% (550ns ÷ 769ns)
+**Health Check Pipeline (/ok endpoint)**:
+- **Throughput**: 1.0M requests/second
+- **Configurable CPU work**: 650ns (7000 iterations, validated with nanobench)
+- **Theoretical maximum CPU time**: 1000ns (1,000,000,000ns ÷ 1,000,000 req/s)
+- **CPU work efficiency**: 65% (650ns ÷ 1000ns)
+- **Pipeline stages**: Sequence (noop) → Resolve (CPU work) → Persist (response) → Release (cleanup)
+- **CPU usage when idle**: 0%
 ### Key Optimizations
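
The hunk above attributes the 0% idle CPU figure to an optimized futex wake that fires only on the final pipeline stage. The coordination code itself is not part of this diff, so the following is a minimal C++ sketch of the idea, assuming each stage publishes progress through an atomic sequence counter on the ring buffer; the names (`StageCursor`, `publish`, `wait_past`) are illustrative, not WeaselDB's actual API.

```cpp
#include <atomic>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Illustrative sketch only: a per-stage sequence counter on the ring.
// Intermediate stages publish progress without any syscall; only the
// final stage issues FUTEX_WAKE, so an idle pipeline makes no wakeups
// and sleeping threads burn no CPU.
struct StageCursor {
    std::atomic<uint32_t> seq{0};

    void publish(uint32_t next, bool final_stage) {
        seq.store(next, std::memory_order_release);
        if (final_stage) {
            // Wake every waiter; skipped entirely on inner stages.
            syscall(SYS_futex, reinterpret_cast<uint32_t*>(&seq),
                    FUTEX_WAKE, INT_MAX, nullptr, nullptr, 0);
        }
    }

    void wait_past(uint32_t seen) {
        // Sleep until the counter moves past the value we last observed.
        while (seq.load(std::memory_order_acquire) == seen) {
            // FUTEX_WAIT returns immediately (EAGAIN) if seq != seen,
            // which closes the race between the load and the syscall.
            syscall(SYS_futex, reinterpret_cast<uint32_t*>(&seq),
                    FUTEX_WAIT, seen, nullptr, nullptr, 0);
        }
    }
};
```

Under this scheme a request flowing through all four stages pays for at most one wake syscall, and threads blocked in wait_past() consume no CPU while the pipeline is empty.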
@@ -49,14 +49,20 @@ WeaselDB achieved 1.3M requests/second throughput using a two-stage ThreadPipeli
 - **Overall improvement**: 38.9% increase from baseline (396ns → 550ns)
 ### Request Flow
+**Health Check Pipeline** (/ok endpoint):
 ```
-I/O Threads (8) → HttpHandler::on_batch_complete() → ThreadPipeline
+I/O Threads (8) → HttpHandler::on_batch_complete() → Commit Pipeline
      ↑                                    ↓
-     |                        Stage 0: Noop thread
-     |                        (550ns serial CPU per request)
-     |                        (batch size: 1)
+     |                        Stage 0: Sequence (noop)
      |                                   ↓
-     |                        Stage 1: Connection return
+     |                        Stage 1: Resolve (650ns CPU work)
+     |                        (spend_cpu_cycles(7000))
+     |                                   ↓
+     |                        Stage 2: Persist (generate response)
+     |                        (send "OK" response)
+     |                                   ↓
+     |                        Stage 3: Release (connection return)
+     |                        (optimized futex wake)
      |                                   ↓
      └─────────────────────── Server::release_back_to_server()
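
The Resolve stage above accounts for its 650ns via `spend_cpu_cycles(7000)`. That function's body is not included in this diff; a plausible minimal sketch, assuming it is a calibrated serial integer loop whose result is kept observable so the compiler cannot delete it:

```cpp
#include <cstdint>

// Sketch only: the real spend_cpu_cycles() is not part of this diff.
// Assumes a serial, register-resident integer loop, so cost scales
// linearly with the iteration count; 7000 iterations were measured at
// ~650ns here (validated with nanobench / ./bench_cpu_work 7000).
uint64_t spend_cpu_cycles(uint64_t iterations) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < iterations; ++i) {
        // Data-dependent arithmetic: each step needs the previous result,
        // touches no memory, and can't be collapsed by vectorization.
        acc = acc * 6364136223846793005ULL + i;
    }
    // Publish through a volatile sink so the loop survives optimization
    // even when the caller discards the return value.
    volatile uint64_t sink = acc;
    return sink;
}
```

Under that assumption, per-request cost scales linearly with the iteration count, which is presumably what makes `ok_resolve_iterations` usable as a benchmarking dial for per-request processing capacity as well as a health check.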
@@ -64,7 +70,9 @@ I/O Threads (8) → HttpHandler::on_batch_complete() → ThreadPipeline
 ## Test Configuration
-- Server: test_config.toml with 8 io_threads, 8 epoll_instances
-- Load tester: ./load_tester --network-threads 12
+- Server: test_benchmark_config.toml with 8 io_threads, 8 epoll_instances
+- Configuration: `ok_resolve_iterations = 7000` (650ns CPU work)
+- Load tester: targeting /ok endpoint
+- Benchmark validation: ./bench_cpu_work 7000
 - Build: ninja
-- Command: ./weaseldb --config test_config.toml
+- Command: ./weaseldb --config test_benchmark_config.toml
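
For reference, the keys named above might appear in test_benchmark_config.toml roughly as follows. This is a sketch assuming a flat key layout: the commit names only `io_threads`, `epoll_instances`, and `ok_resolve_iterations`, not the file's actual structure.

```toml
# Sketch of the relevant keys in test_benchmark_config.toml; the real
# file's layout is not shown in this diff.
io_threads = 8
epoll_instances = 8

# Serial CPU work executed in the Resolve stage per /ok request;
# 7000 iterations ≈ 650ns, validated with ./bench_cpu_work 7000.
ok_resolve_iterations = 7000
```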