Update /ok to serve dual health check/benchmarking role

2025-09-05 12:39:10 -04:00
parent 761eaa552b
commit e67e4aee17
15 changed files with 265 additions and 53 deletions


@@ -2,30 +2,30 @@
 ## Summary
-WeaselDB achieved 1.3M requests/second throughput using a two-stage ThreadPipeline with futex wake optimization, delivering 550ns serial CPU time per request while maintaining 0% CPU usage when idle. Higher serial CPU time means more CPU budget available for serial processing.
+WeaselDB's /ok health check endpoint achieves 1M requests/second with 650ns of configurable CPU work per request through the 4-stage commit pipeline, while maintaining 0% CPU usage when idle. The configurable CPU work serves both as a health check (validating the full pipeline) and as a benchmarking tool for measuring per-request processing capacity.
 ## Performance Metrics
 ### Throughput
-- **1.3M requests/second** over unix socket
+- **1.0M requests/second** /ok health check endpoint (4-stage commit pipeline)
 - 8 I/O threads with 8 epoll instances
 - Load tester used 12 network threads
 - Max latency: 4ms out of 90M requests
 - **0% CPU usage when idle** (optimized futex wake implementation)
 ### Threading Architecture
-- Two-stage pipeline: Stage-0 (noop) → Stage-1 (connection return)
+- **Four-stage commit pipeline**: Sequence → Resolve → Persist → Release
 - Lock-free coordination using atomic ring buffer
 - **Optimized futex wake**: Only wake on final pipeline stage
-- Each request "processed" serially on single thread
+- Configurable CPU work performed serially in resolve stage
 ### Performance Characteristics
-**Optimized Pipeline Mode**:
-- **Throughput**: 1.3M requests/second
-- **Serial CPU time per request**: 550ns (validated with nanobench)
-- **Theoretical maximum serial CPU time**: 769ns (1,000,000,000ns ÷ 1,300,000 req/s)
-- **Serial efficiency**: 71.5% (550ns ÷ 769ns)
+**Health Check Pipeline (/ok endpoint)**:
+- **Throughput**: 1.0M requests/second
+- **Configurable CPU work**: 650ns (7000 iterations, validated with nanobench)
+- **Theoretical maximum CPU time**: 1000ns (1,000,000,000ns ÷ 1,000,000 req/s)
+- **CPU work efficiency**: 65% (650ns ÷ 1000ns)
+- **Pipeline stages**: Sequence (noop) → Resolve (CPU work) → Persist (response) → Release (cleanup)
+- **CPU usage when idle**: 0%
 ### Key Optimizations
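
The hunk above attributes the 0% idle CPU figure to an optimized futex wake that fires only on the final pipeline stage. The coordination code itself is not part of this diff, so the following is a minimal C++ sketch of the idea, assuming each stage publishes progress through an atomic sequence counter on the ring buffer; the names (`StageCursor`, `publish`, `wait_past`) are illustrative, not WeaselDB's actual API.

```cpp
#include <atomic>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Illustrative sketch only: a per-stage sequence counter on the ring.
// Intermediate stages publish progress without any syscall; only the
// final stage issues FUTEX_WAKE, so an idle pipeline makes no wakeups
// and sleeping threads burn no CPU.
struct StageCursor {
    std::atomic<uint32_t> seq{0};

    void publish(uint32_t next, bool final_stage) {
        seq.store(next, std::memory_order_release);
        if (final_stage) {
            // Wake every waiter; skipped entirely on inner stages.
            syscall(SYS_futex, reinterpret_cast<uint32_t*>(&seq),
                    FUTEX_WAKE, INT_MAX, nullptr, nullptr, 0);
        }
    }

    void wait_past(uint32_t seen) {
        // Sleep until the counter moves past the value we last observed.
        while (seq.load(std::memory_order_acquire) == seen) {
            // FUTEX_WAIT returns immediately (EAGAIN) if seq != seen,
            // which closes the race between the load and the syscall.
            syscall(SYS_futex, reinterpret_cast<uint32_t*>(&seq),
                    FUTEX_WAIT, seen, nullptr, nullptr, 0);
        }
    }
};
```

Under this scheme a request flowing through all four stages pays for at most one wake syscall, and threads blocked in wait_past() consume no CPU while the pipeline is empty.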
@@ -49,14 +49,20 @@ WeaselDB achieved 1.3M requests/second throughput using a two-stage ThreadPipeli
 - **Overall improvement**: 38.9% increase from baseline (396ns → 550ns)
 ### Request Flow
+**Health Check Pipeline** (/ok endpoint):
 ```
-I/O Threads (8) → HttpHandler::on_batch_complete() → ThreadPipeline
+I/O Threads (8) → HttpHandler::on_batch_complete() → Commit Pipeline
      ↑                                    ↓
-     |                        Stage 0: Noop thread
-     |                        (550ns serial CPU per request)
-     |                        (batch size: 1)
+     |                        Stage 0: Sequence (noop)
      |                                   ↓
-     |                        Stage 1: Connection return
+     |                        Stage 1: Resolve (650ns CPU work)
+     |                        (spend_cpu_cycles(7000))
+     |                                   ↓
+     |                        Stage 2: Persist (generate response)
+     |                        (send "OK" response)
+     |                                   ↓
+     |                        Stage 3: Release (connection return)
+     |                        (optimized futex wake)
      |                                   ↓
      └─────────────────────── Server::release_back_to_server()
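
The Resolve stage above accounts for its 650ns via `spend_cpu_cycles(7000)`. That function's body is not included in this diff; a plausible minimal sketch, assuming it is a calibrated serial integer loop whose result is kept observable so the compiler cannot delete it:

```cpp
#include <cstdint>

// Sketch only: the real spend_cpu_cycles() is not part of this diff.
// Assumes a serial, register-resident integer loop, so cost scales
// linearly with the iteration count; 7000 iterations were measured at
// ~650ns here (validated with nanobench / ./bench_cpu_work 7000).
uint64_t spend_cpu_cycles(uint64_t iterations) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < iterations; ++i) {
        // Data-dependent arithmetic: each step needs the previous result,
        // touches no memory, and can't be collapsed by vectorization.
        acc = acc * 6364136223846793005ULL + i;
    }
    // Publish through a volatile sink so the loop survives optimization
    // even when the caller discards the return value.
    volatile uint64_t sink = acc;
    return sink;
}
```

Under that assumption, per-request cost scales linearly with the iteration count, which is presumably what makes `ok_resolve_iterations` usable as a benchmarking dial for per-request processing capacity as well as a health check.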
@@ -64,7 +70,9 @@ I/O Threads (8) → HttpHandler::on_batch_complete() → ThreadPipeline
 ## Test Configuration
-- Server: test_config.toml with 8 io_threads, 8 epoll_instances
-- Load tester: ./load_tester --network-threads 12
+- Server: test_benchmark_config.toml with 8 io_threads, 8 epoll_instances
+- Configuration: `ok_resolve_iterations = 7000` (650ns CPU work)
+- Load tester: targeting /ok endpoint
+- Benchmark validation: ./bench_cpu_work 7000
 - Build: ninja
-- Command: ./weaseldb --config test_config.toml
+- Command: ./weaseldb --config test_benchmark_config.toml
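
For reference, the keys named above might appear in test_benchmark_config.toml roughly as follows. This is a sketch assuming a flat key layout: the commit names only `io_threads`, `epoll_instances`, and `ok_resolve_iterations`, not the file's actual structure.

```toml
# Sketch of the relevant keys in test_benchmark_config.toml; the real
# file's layout is not shown in this diff.
io_threads = 8
epoll_instances = 8

# Serial CPU work executed in the Resolve stage per /ok request;
# 7000 iterations ≈ 650ns, validated with ./bench_cpu_work 7000.
ok_resolve_iterations = 7000
```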