Add mdformat pre-commit hook

2025-09-12 11:24:16 -04:00
parent 9d48caca76
commit bf90b8856a
9 changed files with 286 additions and 120 deletions


@@ -15,10 +15,10 @@ HTTP I/O Threads → [Sequence] → [Resolve] → [Persist] → [Release] → HT
### Pipeline Flow
1. **HTTP I/O Threads**: Parse and validate incoming commit requests
1. **Sequence Stage**: Assign sequential version numbers to commits
1. **Resolve Stage**: Validate preconditions and check for conflicts
1. **Persist Stage**: Write commits to durable storage and notify subscribers
1. **Release Stage**: Return connections to HTTP I/O threads for response handling
## Stage Details
@@ -29,21 +29,25 @@ HTTP I/O Threads → [Sequence] → [Resolve] → [Persist] → [Release] → HT
**Serialization**: **Required** - Must be single-threaded
**Responsibilities**:
- **For CommitEntry**: Check request_id against banned list, assign sequential version number if not banned, forward to resolve stage
- **For StatusEntry**: Add request_id to banned list, note current highest assigned version as upper bound, transfer connection to status threadpool
- Record version assignments for transaction tracking
**Why Serialization is Required**:
- Version numbers must be strictly sequential without gaps
- Banned list updates must be atomic with version assignment
- Status requests must get accurate upper bound on potential commit versions
**Request ID Banned List**:
- Purpose: Ensure a request_id can no longer commit (take it out of flight) and establish a version upper bound for status queries
- Lifecycle: Grows indefinitely until process restart (leader change)
- Removal: Only on process restart/leader change, which invalidates all old request IDs
**Current Implementation**:
```cpp
bool HttpHandler::process_sequence_batch(BatchType &batch) {
  for (auto &entry : batch) {
    // ...
```
@@ -64,20 +68,24 @@ bool HttpHandler::process_sequence_batch(BatchType &batch) {
**Serialization**: **Required** - Must be single-threaded
**Responsibilities**:
- **For CommitEntry**: Check preconditions against in-memory recent writes set, add writes to recent writes set if accepted
- **For StatusEntry**: N/A (transferred to status threadpool after sequence stage)
- Mark failed commits with failure information (including which preconditions failed)
**Why Serialization is Required**:
- Must maintain consistent view of in-memory recent writes set
- Conflict detection requires atomic evaluation of all preconditions against recent writes
- Recent writes set updates must be synchronized
**Transaction State Transitions**:
- **Assigned Version** (from sequence) → **Semi-committed** (resolve accepts) → **Committed** (persist completes)
- Failed transactions continue through pipeline with failure information for client response
**Current Implementation**:
```cpp
bool HttpHandler::process_resolve_batch(BatchType &batch) {
  // TODO: Implement precondition resolution logic:
  // ...
```
@@ -95,28 +103,33 @@ bool HttpHandler::process_resolve_batch(BatchType &batch) {
**Serialization**: **Required** - Must mark batches durable in order
**Responsibilities**:
- **For CommitEntry**: Apply operations to persistent storage, update committed version high water mark
- **For StatusEntry**: N/A (transferred to status threadpool after sequence stage)
- Generate durability events for `/v1/subscribe` when committed version advances
- Batch multiple commits for efficient persistence operations
**Why Serialization is Required**:
- Batches must be marked durable in sequential version order
- High water mark updates must reflect strict ordering of committed versions
- Ensures consistency guarantees across all endpoints
**Committed Version High Water Mark**:
- Global atomic value tracking highest durably committed version
- Updated after each batch commits: set to highest version in the batch
- Read by `/v1/version` endpoint using atomic seq_cst reads
- Enables `/v1/subscribe` durability events when high water mark advances
**Batching Strategy**:
- Multiple semi-committed transactions can be persisted in a single batch
- High water mark updated once per batch to highest version in that batch
- See `persistence.md` for detailed persistence design
**Current Implementation**:
```cpp
bool HttpHandler::process_persist_batch(BatchType &batch) {
  // TODO: Implement actual persistence logic:
  // ...
```
@@ -134,16 +147,19 @@ bool HttpHandler::process_persist_batch(BatchType &batch) {
**Serialization**: Not required - Independent connection handling
**Responsibilities**:
- Return processed connections to HTTP server for all request types
- Connection carries response data (success/failure) and status information
- Trigger response transmission to clients
**Response Handling**:
- **CommitRequests**: Response generated by persist stage (success with version, or failure with conflicting preconditions)
- **StatusRequests**: Response generated by separate status lookup logic (not part of pipeline)
- Failed transactions carry failure information through the entire pipeline for proper client response
**Implementation**:
```cpp
bool HttpHandler::process_release_batch(BatchType &batch) {
  // Stage 3: Connection release
  // ...
```
@@ -237,6 +253,7 @@ void HttpHandler::on_batch_complete(std::span<std::unique_ptr<Connection>> batch
### Backpressure Handling
The pipeline implements natural backpressure:
- Each stage blocks if downstream stages are full
- `WaitIfUpstreamIdle` strategy balances latency vs throughput
- Ring buffer size (`lg_size = 16`) controls maximum queued batches
@@ -344,6 +361,7 @@ private:
The pipeline processes different types of entries using a variant/union type system instead of `std::unique_ptr<Connection>`:
### Pipeline Entry Variants
- **CommitEntry**: Contains `std::unique_ptr<Connection>` with CommitRequest and connection state
- **StatusEntry**: Contains `std::unique_ptr<Connection>` with StatusRequest (transferred to status threadpool after sequence)
- **ShutdownEntry**: Signals pipeline shutdown to all stages
@@ -367,6 +385,7 @@ The pipeline processes different types of entries using a variant/union type sys
#### Request Processing Flow
1. **HTTP I/O Thread Processing** (`src/http_handler.cpp:210-273`):
```cpp
void HttpHandler::handlePostCommit(Connection &conn, HttpConnectionState &state) {
  // Parse and validate anything that doesn't need serialization:
  // ...
}
```
1. **Pipeline Entry**: Successfully parsed connections enter pipeline as CommitEntry (containing the connection with CommitRequest)
1. **Pipeline Processing**:
- **Sequence**: Check banned list → assign version (or reject)
- **Resolve**: Check preconditions against in-memory recent writes → mark semi-committed (or failed with conflict details)
- **Persist**: Apply operations → mark committed, update high water mark
- **Release**: Return connection with response data
1. **Response Generation**: Based on pipeline results
- **Success**: `{"status": "committed", "version": N, "leader_id": "...", "request_id": "..."}`
- **Failure**: `{"status": "not_committed", "conflicts": [...], "version": N, "leader_id": "..."}`
@@ -398,6 +419,7 @@ The pipeline processes different types of entries using a variant/union type sys
#### Request Processing Flow
1. **HTTP I/O Thread Processing**:
```cpp
void HttpHandler::handleGetStatus(Connection &conn, const HttpConnectionState &state) {
  // TODO: Extract request_id from URL and min_version from query params
  // ...
}
```
1. **Two-Phase Processing**:
- **Phase 1 - Sequence Stage**: StatusEntry enters pipeline to add request_id to banned list and get version upper bound
- **Phase 2 - Status Threadpool**: Connection transferred from sequence stage to dedicated status threadpool for actual status lookup logic
1. **Status Lookup Logic**: Performed in status threadpool - scan transaction log to determine actual commit status of the now-banned request_id
### `/v1/subscribe` - Real-time Transaction Stream
**Pipeline Integration**: Consumes events from resolve and persist stages
#### Event Sources
- **Resolve Stage**: Semi-committed transactions (accepted preconditions) for low-latency streaming
- **Persist Stage**: Durability events when committed version high water mark advances
#### Current Implementation
```cpp
void HttpHandler::handleGetSubscribe(Connection &conn, const HttpConnectionState &state) {
  // TODO: Parse query parameters (after, durable)
  // ...
}
```
The pipeline integrates with the HTTP handler at two points:
1. **Entry**: `on_batch_complete()` feeds connections into sequence stage
1. **Exit**: Release stage calls `Server::release_back_to_server()`
### Persistence Layer Integration
The persist stage interfaces with:
- **S3 Backend**: Batch writes for durability (see `persistence.md`)
- **Subscriber System**: Real-time change stream notifications
- **Metrics System**: Transaction throughput and latency tracking
@@ -467,17 +493,17 @@ The persist stage interfaces with:
### Potential Enhancements
1. **Dynamic Thread Counts**: Make resolve and release thread counts configurable
1. **NUMA Optimization**: Pin pipeline threads to specific CPU cores
1. **Batch Size Tuning**: Dynamic batch size based on load
1. **Stage Bypassing**: Skip resolve stage for transactions without preconditions
1. **Persistence Batching**: Aggregate multiple commits into larger S3 writes
### Monitoring and Observability
1. **Stage Metrics**: Throughput, latency, and queue depth per stage
1. **Error Tracking**: Error rates and types by stage
1. **Resource Utilization**: CPU and memory usage per pipeline thread
1. **Flow Control Events**: Backpressure and stall detection
## Implementation Status