Compare commits
2 Commits
1a4e8d5761
...
ce794f8a0f
| Author | SHA1 | Date | |
|---|---|---|---|
| | ce794f8a0f | | |
| | 1e05ee1705 | | |
1
.gitignore
vendored
@@ -47,3 +47,4 @@ Thumbs.db
 
 .cache
 perf.data*
+__pycache__
@@ -102,6 +102,7 @@ The persistence thread collects commits into batches using two trigger condition
 **Required Constraints**:
 - `batch_size_threshold` > 0 (must process at least one commit per batch)
 - `max_in_flight_requests` > 0 (must allow at least one concurrent request)
+- `max_in_flight_requests` < 1000 (required for single-call recovery guarantee)
 - `target_pool_size` >= `max_in_flight_requests` (pool must accommodate all in-flight requests)
 - `batch_timeout_ms` > 0 (timeout must be positive)
 - `max_retry_attempts` >= 0 (zero disables retries)
@@ -109,3 +110,47 @@ The persistence thread collects commits into batches using two trigger condition
 
 **Performance Recommendations**:
 - `target_pool_size` <= 2x `max_in_flight_requests` (optimal for performance)

## Recovery and Consistency

### Recovery Model

WeaselDB's batched persistence design enables efficient recovery while maintaining strict serializable consistency guarantees.

#### **Batch Ordering and Durability**

**Ordered Acknowledgment Property**: Batches may be retried out-of-order for performance, but acknowledgment to the next pipeline stage maintains strict ordering. This ensures that if batch N is acknowledged as durable, all batches 0 through N-1 are also guaranteed durable.

**Durability Watermark**: The system maintains a durable watermark indicating the highest consecutively durable batch ID. This watermark advances only when all preceding batches are confirmed persistent.
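
The interplay between out-of-order completion and in-order acknowledgment can be sketched as follows. This is a minimal illustration, not WeaselDB's implementation: the `DurabilityWatermark` class, its method names, and the use of ordinal sequence indices (0, 1, 2, ...) instead of S3 batch numbers are assumptions made for the example.

```python
class DurabilityWatermark:
    """Tracks the highest consecutively durable batch (hypothetical helper).

    Batches are identified here by ordinal index: 0 is the first batch
    written, 1 the second, and so on. This is an assumption for the sketch.
    """

    def __init__(self):
        self.watermark = -1       # no batch is durable yet
        self._completed = set()   # durable batches that are still above the watermark

    def mark_durable(self, seq):
        """Record that batch `seq` is durable.

        Returns the batches that may now be acknowledged, in order.
        The list is empty while an earlier batch is still outstanding.
        """
        if seq <= self.watermark or seq in self._completed:
            return []  # duplicate completion, e.g. from a retry
        self._completed.add(seq)
        acknowledged = []
        # Advance only across a gap-free run of durable batches.
        while self.watermark + 1 in self._completed:
            self.watermark += 1
            self._completed.remove(self.watermark)
            acknowledged.append(self.watermark)
        return acknowledged
```

In this sketch, completing batches 1 and 2 before batch 0 acknowledges nothing; once batch 0 completes, all three are released together and the watermark moves to 2.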

#### **Recovery Protocol**

WeaselDB uses a **sequential batch numbering** scheme with **S3 atomic operations** to provide efficient crash recovery and split-brain prevention without external coordination services.

**Batch Numbering Scheme**:
- Batch numbers start at `2^64 - 1` and count downward: `18446744073709551615, 18446744073709551614, 18446744073709551613, ...`
- Each batch is stored as S3 object `batches/{batch_number:020d}` with zero-padding
- Because S3 lists keys in ascending lexicographic order, the most recent batches (lower numbers) appear first in LIST operations
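
A short sketch of the key layout shows why counting downward puts the newest batch at the front of a listing. The helper names `batch_key` and `nth_batch_number` are hypothetical, and sorting a Python list stands in for the ascending key order of an S3 LIST.

```python
MAX_BATCH = 2**64 - 1  # first batch number in the scheme described above


def batch_key(batch_number: int) -> str:
    """Object key for a batch: 20 zero-padded digits (2^64 - 1 has 20 digits)."""
    return f"batches/{batch_number:020d}"


def nth_batch_number(n: int) -> int:
    """Batch number of the n-th batch written, where n = 0 is the first."""
    return MAX_BATCH - n


# Keys for the first three batches, in the order they were written.
keys = [batch_key(nth_batch_number(n)) for n in range(3)]

# Ascending string order mimics the order in which keys are listed.
listed = sorted(keys)

# The most recently written batch (n = 2) sorts first.
newest_first = listed[0] == batch_key(nth_batch_number(2))
```

Zero-padding matters here: without a fixed width, string order and numeric order would disagree for numbers with different digit counts.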

**Leadership and Split-Brain Prevention**:
- New persistence thread instances scan S3 to find the next available batch number
- Each batch write uses `If-None-Match="*"` to atomically claim the sequential batch number
- Only one instance can successfully claim each batch number, preventing split-brain scenarios
- Batch object content includes `leader_id` to identify which leader wrote each batch
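
The claim step can be illustrated without a network by modelling the create-if-absent behaviour that `If-None-Match="*"` provides. Everything below is an assumption for illustration: `InMemoryStore`, `put_if_absent`, `claim_batch`, and the dictionary used as batch content are hypothetical and are not WeaselDB or S3 client APIs.

```python
import threading


class InMemoryStore:
    """Stand-in for an object store with create-if-absent writes (hypothetical)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, body):
        """Create the object only if the key is unused.

        Returns True if the object was created, False if it already existed.
        This models a PUT carrying If-None-Match="*".
        """
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = body
            return True

    def get(self, key):
        return self._objects[key]


def claim_batch(store, batch_number, leader_id, payload):
    """Try to claim `batch_number` for `leader_id`; True means the claim won."""
    key = f"batches/{batch_number:020d}"
    body = {"leader_id": leader_id, "payload": payload}
    return store.put_if_absent(key, body)
```

If two leaders race for the same batch number, exactly one call returns `True`. The loser can read the stored `leader_id` to learn that another instance is active.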

**Recovery Scenarios**:

**Clean Shutdown**:
- All in-flight batches are drained to completion before termination
- Durability watermark accurately reflects all durable state
- No recovery required on restart

**Crash Recovery**:
1. **S3 Scan with Bounded Cost**: List S3 objects with prefix `batches/` and limit of 1000 objects
2. **Gap Detection**: Check for missing sequential batch numbers. WeaselDB never puts 1000 batches in flight concurrently, so a limit of 1000 is sufficient.
3. **Watermark Reconstruction**: Set durability watermark to highest consecutive batch number found
4. **Leadership Transition**: Begin writing batches starting from next available batch number. Skip past any batch numbers claimed in the durability watermark scan.
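
The four steps above can be sketched as a pure function over the keys returned by one listing. This is an interpretation, not the project's code: the function name `recover`, the `is_truncated` flag, and the reading of "highest consecutive batch number" as the most recently written batch with no gap before it (numerically the lowest such number, since numbers count downward) are assumptions.

```python
MAX_BATCH = 2**64 - 1   # first batch number ever written
PREFIX = "batches/"


def recover(keys, is_truncated):
    """Reconstruct (watermark, next_batch_number) from one listing page.

    keys:         object keys from a single LIST of up to 1000 objects
    is_truncated: True if more (older) objects exist beyond this page
    The watermark is None when no gap-free run of batches exists.
    """
    # Oldest batch first: batch numbers count downward over time.
    numbers = sorted((int(k[len(PREFIX):]) for k in keys), reverse=True)
    if not numbers:
        return None, MAX_BATCH  # empty store: start from the first number

    if not is_truncated and numbers[0] != MAX_BATCH:
        # The whole history is visible and the very first batch is missing.
        watermark = None
    else:
        # Assumption: with fewer than 1000 batches in flight, the oldest
        # object on a truncated page is at or behind the watermark.
        watermark = numbers[0]
        for n in numbers[1:]:
            if n != watermark - 1:
                break  # gap detected: later batches are not consecutive
            watermark = n

    # Skip every number already claimed, including those beyond a gap.
    next_batch_number = numbers[-1] - 1
    return watermark, next_batch_number
```

For example, if the listing holds batches `MAX_BATCH`, `MAX_BATCH - 1`, and `MAX_BATCH - 3`, the sketch reports a watermark of `MAX_BATCH - 1` and resumes writing at `MAX_BATCH - 4`.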

**Bounded Recovery Guarantee**: Since at most 999 batches can be in-flight during a crash, the durability watermark is guaranteed to be found within the first 1000 objects in S3. This ensures **O(1) recovery time** regardless of database size, with at most **one S3 LIST operation** required.

**Recovery Performance Limits**: To maintain the single-call recovery guarantee, `max_in_flight_requests` must be less than **1000**, the maximum number of objects S3 returns per LIST operation. This ensures that a single S3 API call is sufficient for recovery.
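
A startup check along the following lines could enforce the limit together with the other required constraints. The parameter names come from the constraint list in this document; the `PersistenceConfig` dataclass and the `validate` function are hypothetical, and the sketch does not reflect how WeaselDB actually loads or checks its configuration.

```python
from dataclasses import dataclass

S3_LIST_MAX_KEYS = 1000  # maximum objects returned by one S3 LIST call


@dataclass
class PersistenceConfig:
    """Hypothetical container for the parameters named in this document."""
    batch_size_threshold: int
    max_in_flight_requests: int
    target_pool_size: int
    batch_timeout_ms: int
    max_retry_attempts: int


def validate(cfg):
    """Return a list of violated required constraints (empty if valid)."""
    errors = []
    if cfg.batch_size_threshold <= 0:
        errors.append("batch_size_threshold must be > 0")
    if cfg.max_in_flight_requests <= 0:
        errors.append("max_in_flight_requests must be > 0")
    if cfg.max_in_flight_requests >= S3_LIST_MAX_KEYS:
        # Needed so one LIST page always contains the durability watermark.
        errors.append("max_in_flight_requests must be < 1000")
    if cfg.target_pool_size < cfg.max_in_flight_requests:
        errors.append("target_pool_size must be >= max_in_flight_requests")
    if cfg.batch_timeout_ms <= 0:
        errors.append("batch_timeout_ms must be > 0")
    if cfg.max_retry_attempts < 0:
        errors.append("max_retry_attempts must be >= 0")
    return errors
```

Returning every violation, instead of raising on the first, lets an operator fix a configuration file in one pass.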