Add mdformat pre-commit hook

2025-09-12 11:24:16 -04:00
parent 9d48caca76
commit bf90b8856a
9 changed files with 286 additions and 120 deletions


@@ -15,10 +15,10 @@ HTTP I/O Threads → [Sequence] → [Resolve] → [Persist] → [Release] → HT
### Pipeline Flow
1. **HTTP I/O Threads**: Parse and validate incoming commit requests
1. **Sequence Stage**: Assign sequential version numbers to commits
1. **Resolve Stage**: Validate preconditions and check for conflicts
1. **Persist Stage**: Write commits to durable storage and notify subscribers
1. **Release Stage**: Return connections to HTTP I/O threads for response handling
## Stage Details
@@ -29,21 +29,25 @@ HTTP I/O Threads → [Sequence] → [Resolve] → [Persist] → [Release] → HT
**Serialization**: **Required** - Must be single-threaded
**Responsibilities**:
- **For CommitEntry**: Check request_id against banned list, assign sequential version number if not banned, forward to resolve stage
- **For StatusEntry**: Add request_id to banned list, note current highest assigned version as upper bound, transfer connection to status threadpool
- Record version assignments for transaction tracking
**Why Serialization is Required**:
- Version numbers must be strictly sequential without gaps
- Banned list updates must be atomic with version assignment
- Status requests must get accurate upper bound on potential commit versions
**Request ID Banned List**:
- Purpose: Ensure a request_id can no longer commit (take it out of flight) and establish a version upper bound for status queries
- Lifecycle: Grows indefinitely until process restart (leader change)
- Removal: Only on process restart/leader change, which invalidates all old request IDs
**Current Implementation**:
```cpp
bool HttpHandler::process_sequence_batch(BatchType &batch) {
  for (auto &entry : batch) {
    // ...
```
@@ -64,20 +68,24 @@ bool HttpHandler::process_sequence_batch(BatchType &batch) {
**Serialization**: **Required** - Must be single-threaded
**Responsibilities**:
- **For CommitEntry**: Check preconditions against in-memory recent writes set, add writes to recent writes set if accepted
- **For StatusEntry**: N/A (transferred to status threadpool after sequence stage)
- Mark failed commits with failure information (including which preconditions failed)
**Why Serialization is Required**:
- Must maintain consistent view of in-memory recent writes set
- Conflict detection requires atomic evaluation of all preconditions against recent writes
- Recent writes set updates must be synchronized
**Transaction State Transitions**:
- **Assigned Version** (from sequence) → **Semi-committed** (resolve accepts) → **Committed** (persist completes)
- Failed transactions continue through pipeline with failure information for client response
**Current Implementation**:
```cpp
bool HttpHandler::process_resolve_batch(BatchType &batch) {
  // TODO: Implement precondition resolution logic:
  // ...
```
@@ -95,28 +103,33 @@ bool HttpHandler::process_resolve_batch(BatchType &batch) {
**Serialization**: **Required** - Must mark batches durable in order
**Responsibilities**:
- **For CommitEntry**: Apply operations to persistent storage, update committed version high water mark
- **For StatusEntry**: N/A (transferred to status threadpool after sequence stage)
- Generate durability events for `/v1/subscribe` when committed version advances
- Batch multiple commits for efficient persistence operations
**Why Serialization is Required**:
- Batches must be marked durable in sequential version order
- High water mark updates must reflect strict ordering of committed versions
- Ensures consistency guarantees across all endpoints
**Committed Version High Water Mark**:
- Global atomic value tracking highest durably committed version
- Updated after each batch commits: set to highest version in the batch
- Read by `/v1/version` endpoint using atomic seq_cst reads
- Enables `/v1/subscribe` durability events when high water mark advances
**Batching Strategy**:
- Multiple semi-committed transactions can be persisted in a single batch
- High water mark updated once per batch to highest version in that batch
- See `persistence.md` for detailed persistence design
**Current Implementation**:
```cpp
bool HttpHandler::process_persist_batch(BatchType &batch) {
  // TODO: Implement actual persistence logic:
  // ...
```
@@ -134,16 +147,19 @@ bool HttpHandler::process_persist_batch(BatchType &batch) {
**Serialization**: Not required - Independent connection handling
**Responsibilities**:
- Return processed connections to HTTP server for all request types
- Connection carries response data (success/failure) and status information
- Trigger response transmission to clients
**Response Handling**:
- **CommitRequests**: Response generated by persist stage (success with version, or failure with conflicting preconditions)
- **StatusRequests**: Response generated by separate status lookup logic (not part of pipeline)
- Failed transactions carry failure information through the entire pipeline for proper client response
**Implementation**:
```cpp
bool HttpHandler::process_release_batch(BatchType &batch) {
  // Stage 3: Connection release
  // ...
```
@@ -237,6 +253,7 @@ void HttpHandler::on_batch_complete(std::span<std::unique_ptr<Connection>> batch
### Backpressure Handling
The pipeline implements natural backpressure:
- Each stage blocks if downstream stages are full
- `WaitIfUpstreamIdle` strategy balances latency vs throughput
- Ring buffer size (`lg_size = 16`) controls maximum queued batches
@@ -344,6 +361,7 @@ private:
The pipeline processes different types of entries using a variant/union type system instead of `std::unique_ptr<Connection>`:
### Pipeline Entry Variants
- **CommitEntry**: Contains `std::unique_ptr<Connection>` with CommitRequest and connection state
- **StatusEntry**: Contains `std::unique_ptr<Connection>` with StatusRequest (transferred to status threadpool after sequence)
- **ShutdownEntry**: Signals pipeline shutdown to all stages
@@ -367,6 +385,7 @@ The pipeline processes different types of entries using a variant/union type sys
#### Request Processing Flow
1. **HTTP I/O Thread Processing** (`src/http_handler.cpp:210-273`):
```cpp
void HttpHandler::handlePostCommit(Connection &conn, HttpConnectionState &state) {
  // Parse and validate anything that doesn't need serialization:
  // ...
}
```
1. **Pipeline Entry**: Successfully parsed connections enter pipeline as CommitEntry (containing the connection with CommitRequest)
1. **Pipeline Processing**:
- **Sequence**: Check banned list → assign version (or reject)
- **Resolve**: Check preconditions against in-memory recent writes → mark semi-committed (or failed with conflict details)
- **Persist**: Apply operations → mark committed, update high water mark
- **Release**: Return connection with response data
1. **Response Generation**: Based on pipeline results
- **Success**: `{"status": "committed", "version": N, "leader_id": "...", "request_id": "..."}`
- **Failure**: `{"status": "not_committed", "conflicts": [...], "version": N, "leader_id": "..."}`
@@ -398,6 +419,7 @@ The pipeline processes different types of entries using a variant/union type sys
#### Request Processing Flow
1. **HTTP I/O Thread Processing**:
```cpp
void HttpHandler::handleGetStatus(Connection &conn, const HttpConnectionState &state) {
  // TODO: Extract request_id from URL and min_version from query params
  // ...
}
```
1. **Two-Phase Processing**:
- **Phase 1 - Sequence Stage**: StatusEntry enters pipeline to add request_id to banned list and get version upper bound
- **Phase 2 - Status Threadpool**: Connection transferred from sequence stage to dedicated status threadpool for actual status lookup logic
1. **Status Lookup Logic**: Performed in status threadpool - scan transaction log to determine actual commit status of the now-banned request_id
### `/v1/subscribe` - Real-time Transaction Stream
**Pipeline Integration**: Consumes events from resolve and persist stages
#### Event Sources
- **Resolve Stage**: Semi-committed transactions (accepted preconditions) for low-latency streaming
- **Persist Stage**: Durability events when committed version high water mark advances
#### Current Implementation
```cpp
void HttpHandler::handleGetSubscribe(Connection &conn, const HttpConnectionState &state) {
  // TODO: Parse query parameters (after, durable)
  // ...
}
```
The pipeline integrates with the HTTP handler at two points:
1. **Entry**: `on_batch_complete()` feeds connections into sequence stage
1. **Exit**: Release stage calls `Server::release_back_to_server()`
### Persistence Layer Integration
The persist stage interfaces with:
- **S3 Backend**: Batch writes for durability (see `persistence.md`)
- **Subscriber System**: Real-time change stream notifications
- **Metrics System**: Transaction throughput and latency tracking
@@ -467,17 +493,17 @@ The persist stage interfaces with:
### Potential Enhancements
1. **Dynamic Thread Counts**: Make resolve and release thread counts configurable
1. **NUMA Optimization**: Pin pipeline threads to specific CPU cores
1. **Batch Size Tuning**: Dynamic batch size based on load
1. **Stage Bypassing**: Skip resolve stage for transactions without preconditions
1. **Persistence Batching**: Aggregate multiple commits into larger S3 writes
### Monitoring and Observability
1. **Stage Metrics**: Throughput, latency, and queue depth per stage
1. **Error Tracking**: Error rates and types by stage
1. **Resource Utilization**: CPU and memory usage per pipeline thread
1. **Flow Control Events**: Backpressure and stall detection
## Implementation Status