Add mdformat pre-commit hook

2025-09-12 11:24:16 -04:00
parent 9d48caca76
commit bf90b8856a
9 changed files with 286 additions and 120 deletions


@@ -16,7 +16,7 @@ The persistence thread receives commit batches from the main processing pipeline
The persistence thread collects commits into batches using two trigger conditions:
1. **Time Trigger**: `batch_timeout_ms` elapsed since batch collection started
2. **Size Trigger**: `batch_size_threshold` commits collected (can be exceeded by final commit)
**Flow Control**: When `max_in_flight_requests` reached, block until responses received. Batches in retry backoff count toward the in-flight limit, creating natural backpressure during failures.
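For illustration, the two triggers and the in-flight cap might be expressed as small predicates like the following sketch (the `cfg` dictionary and helper names are hypothetical; only the configuration keys come from this document):

```
import time

def batch_ready(batch, started_at_ms, cfg):
    """True once either trigger condition fires."""
    elapsed_ms = time.monotonic() * 1000 - started_at_ms
    if elapsed_ms >= cfg["batch_timeout_ms"]:          # time trigger
        return True
    return len(batch) >= cfg["batch_size_threshold"]   # size trigger

def may_send(in_flight_count, cfg):
    """Flow control: batches awaiting responses or sitting in retry backoff
    both count toward the in-flight limit."""
    return in_flight_count < cfg["max_in_flight_requests"]
```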
@@ -25,10 +25,12 @@ The persistence thread collects commits into batches using two trigger condition
### 1. Batch Collection
**No In-Flight Requests** (no I/O to pump):
- Use blocking acquire to get first commit batch (can afford to wait)
- Process immediately (no batching delay)
**With In-Flight Requests** (I/O to pump in event loop):
- Check flow control: if at `max_in_flight_requests`, block for responses
- Collect commits using non-blocking acquire until trigger condition:
- Check for available commits (non-blocking)
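A sketch of one collection round under these rules, assuming a standard-library `queue.Queue` of commits and placeholder `pump_io` / `wait_for_responses` hooks for the event loop (all names here are illustrative, not WeaselDB's actual API):

```
import queue
import time

def collect_batch(commits, in_flight_count, cfg, pump_io, wait_for_responses):
    batch = []
    if in_flight_count == 0:
        # No I/O to pump: block for the first commit and process immediately.
        batch.append(commits.get())
        return batch
    # I/O to pump: enforce flow control before collecting more work.
    while in_flight_count >= cfg["max_in_flight_requests"]:
        in_flight_count = wait_for_responses()
    started_ms = time.monotonic() * 1000
    while (time.monotonic() * 1000 - started_ms < cfg["batch_timeout_ms"]
           and len(batch) < cfg["batch_size_threshold"]):
        try:
            batch.append(commits.get(block=False))   # non-blocking acquire
        except queue.Empty:
            pump_io()                                # keep in-flight S3 I/O moving
    return batch
```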
@@ -97,9 +99,10 @@ The persistence thread collects commits into batches using two trigger condition
## Configuration Validation
**Required Constraints**:
- `batch_size_threshold` > 0 (must process at least one commit per batch)
- `max_in_flight_requests` > 0 (must allow at least one concurrent request)
- `max_in_flight_requests` <= 1000 (required for single-call recovery guarantee)
- `batch_timeout_ms` > 0 (timeout must be positive)
- `max_retry_attempts` >= 0 (zero disables retries)
- `retry_base_delay_ms` > 0 (delay must be positive if retries enabled)
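These constraints could be enforced up front with a check along these lines (a minimal sketch; the `cfg` dictionary keys mirror the names above and the function itself is hypothetical):

```
def validate_config(cfg):
    """Raise ValueError if any documented constraint is violated."""
    if cfg["batch_size_threshold"] <= 0:
        raise ValueError("batch_size_threshold must be > 0")
    if not (0 < cfg["max_in_flight_requests"] <= 1000):
        raise ValueError("max_in_flight_requests must be in 1..1000 "
                         "(single-call recovery guarantee)")
    if cfg["batch_timeout_ms"] <= 0:
        raise ValueError("batch_timeout_ms must be > 0")
    if cfg["max_retry_attempts"] < 0:
        raise ValueError("max_retry_attempts must be >= 0")
    if cfg["max_retry_attempts"] > 0 and cfg["retry_base_delay_ms"] <= 0:
        raise ValueError("retry_base_delay_ms must be > 0 when retries are enabled")
```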
@@ -123,16 +126,19 @@ WeaselDB's batched persistence design enables efficient recovery while maintaini
WeaselDB uses a **sequential batch numbering** scheme with **S3 atomic operations** to provide efficient crash recovery and split-brain prevention without external coordination services.
**Batch Numbering Scheme**:
- Batch numbers start at `2^64 - 1` and count downward: `18446744073709551615, 18446744073709551614, 18446744073709551613, ...`
- Each batch is stored as S3 object `batches/{batch_number:020d}` with zero-padding
- S3 lexicographic ordering on zero-padded numbers returns batches in ascending numerical order (latest batches first)
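As an illustration of the countdown numbering and zero-padded keys (the helper below is hypothetical; `2^64 - 1` fits in 20 decimal digits, so padding to 20 keeps lexicographic order identical to numerical order):

```
FIRST_BATCH_NUMBER = 2**64 - 1   # 18446744073709551615, counts downward

def batch_key(batch_number):
    return f"batches/{batch_number:020d}"

# Newer batches get lower numbers, so an ascending listing of these keys
# yields the most recently written batches first (matching the example below).
keys = sorted(batch_key(n) for n in (100, 99, 98, 97))
assert keys[0] == "batches/00000000000000000097"   # newest batch listed first
```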
**Terminology**: Since batch numbers decrease over time, we use numerical ordering:
- "Older" batches = higher numbers (written first in time)
- "Newer" batches = lower numbers (written more recently)
- "Most recent" batches = lowest numbers (most recently written)
**Example**: If batches 100, 99, 98, 97 are written, S3 LIST returns them as:
```
batches/00000000000000000097 (newest, lowest batch number)
batches/00000000000000000098
batches/00000000000000000099
batches/00000000000000000100 (oldest, highest batch number)
```
**Leadership and Split-Brain Prevention**:
- New persistence thread instances scan S3 to find the highest (oldest) available batch number
- Each batch write uses `If-None-Match="*"` to atomically claim the sequential batch number
- Only one instance can successfully claim each batch number, preventing split-brain scenarios
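A sketch of the atomic claim as it might look with boto3 (assuming a boto3/S3 version with conditional-write support; the function name is illustrative):

```
import boto3
from botocore.exceptions import ClientError

def try_claim_batch(s3, bucket, batch_number, payload):
    """Attempt to claim a batch number; at most one writer can succeed."""
    try:
        s3.put_object(
            Bucket=bucket,
            Key=f"batches/{batch_number:020d}",
            Body=payload,
            IfNoneMatch="*",   # conditional PUT: fail if the object already exists
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "PreconditionFailed":
            return False       # another instance already claimed this number
        raise
```

A losing writer sees S3's 412 response, which signals that a newer instance now owns the batch sequence.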
@@ -150,28 +157,32 @@ batches/00000000000000000100 (oldest, highest batch number)
**Recovery Scenarios**:
**Clean Shutdown**:
- All in-flight batches are drained to completion before termination
- Durability watermark accurately reflects all durable state
- No recovery required on restart
**Crash Recovery**:
1. **S3 Scan with Bounded Cost**: List S3 objects with prefix `batches/` and limit of 1000 objects
2. **Gap Detection**: Check for missing sequential batch numbers. WeaselDB never puts more than 1000 batches in flight concurrently, so a limit of 1000 is sufficient.
3. **Watermark Reconstruction**: Set durability watermark to the latest consecutive batch (scanning from highest numbers downward, until a gap)
4. **Leadership Transition**: Begin writing batches starting from next available batch number. Skip past any batch numbers already claimed in the durability watermark scan.
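The scan and gap detection might look like the following sketch (boto3 `list_objects_v2` for the single bounded LIST; helper names are illustrative):

```
import boto3

PREFIX = "batches/"

def recover_watermark(s3, bucket):
    """One LIST of at most 1000 keys, then walk downward from the highest
    batch number in the window until the first missing number (the gap)."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=PREFIX, MaxKeys=1000)
    numbers = {int(obj["Key"][len(PREFIX):]) for obj in resp.get("Contents", [])}
    if not numbers:
        return None, 2**64 - 1            # fresh store: start at the top
    watermark = max(numbers)              # highest (oldest) number in the window
    while watermark - 1 in numbers:       # consecutive run continues downward
        watermark -= 1
    next_batch = min(numbers) - 1         # skip every number already claimed
    return watermark, next_batch
```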
**Bounded Recovery Guarantee**: Since at most 1000 batches can be in-flight during a crash, any gap in the sequential numbering (indicating the durability watermark) must appear within the first 1000 S3 objects. This is because:
1. At most 1000 batches can be incomplete when crash occurs
2. S3 LIST returns objects in ascending numerical order (most recent batches first due to countdown numbering)
3. The first gap found represents the boundary between durable and potentially incomplete batches
4. S3 LIST operations have a maximum limit of 1000 objects per request
5. Therefore, scanning 1000 objects (the maximum S3 allows in one request) is sufficient to find this boundary
This ensures **O(1) recovery time** regardless of database size, with at most **one S3 LIST operation** required.
**Recovery Protocol Detail**: Even with exactly 1000 batches in-flight, recovery works correctly:
**Example Scenario**: Batches 2000 down to 1001 (1000 batches) are in-flight when crash occurs
- Previous successful run had written through batch 2001
- Worst case: batch 2000 (oldest in-flight) fails, batches 1999 down to 1001 (newer) all succeed
- S3 LIST(limit=1000) returns: 1001, 1002, ..., 1998, 1999, 2001 (ascending numerical order)