# Persistence Thread Design

## Overview

The persistence thread receives commit batches from the main processing pipeline and uploads them to S3. It runs on a single thread and relies on connection pooling and batching to sustain throughput.

## Architecture

**Input**: Commits arrive via the `ThreadPipeline` interface from upstream processing

**Output**: Batched commits uploaded to the S3 persistence backend

**Transport**: Single-threaded TCP client with connection pooling

**Protocol**: Higher layers handle HTTP, authentication, and S3-specific details

## Batching Strategy

The persistence thread collects commits into batches using two trigger conditions:

1. **Time Trigger**: `batch_timeout_ms` has elapsed since batch collection started
2. **Size Trigger**: The collected commits reach `batch_size_threshold` (the final commit may push the batch past the threshold)

**Flow Control**: When `max_in_flight_requests` is reached, block until responses are received.

## Main Processing Loop

### 1. Batch Collection

**No In-Flight Requests**:

- Use blocking acquire to get the first commit batch
- Process immediately (no batching delay)

**With In-Flight Requests**:

- Check flow control: if at `max_in_flight_requests`, block for responses
- Collect commits using non-blocking acquire until a trigger condition is met:
  - Check for available commits (non-blocking)
  - If `batch_size_threshold` reached → process batch immediately
  - If below threshold → use `epoll_wait(batch_timeout_ms)` for I/O and timeout
  - On timeout → process collected commits
  - If no commits available and no in-flight requests → switch to blocking acquire

### 2. Connection Management

- Acquire a healthy connection from the pool
- Create new connections if the pool is below `target_pool_size`
- If no healthy connections are available, block until one becomes available
- Maintain automatic pool replenishment

### 3. Data Transmission

- Write batch data to the S3 connection using the appropriate protocol
- Publish accepted transactions to the subscriber system
- Track the request as in-flight for flow control

### 4. I/O Event Processing

- Handle epoll events for all in-flight connections
- Monitor connection health via heartbeats
- Process incoming responses and detect connection failures

### 5. Response Handling

- **Ordered Acknowledgment**: Acknowledge a batch only after all prior batches are durable
- Release the batch via the `StageGuard` destructor (publishes to the next pipeline stage)
- Publish durability events to the subscriber system
- Return the healthy connection to the pool

### 6. Failure Handling

- Remove the failed connection from the pool
- Retry the batch with exponential backoff (up to `max_retry_attempts`)
- Backoff delays only affect the specific failing batch
- If retries are exhausted, abort the process or escalate the error
- Initiate pool replenishment if below target

## Connection Pool

**Target Size**: `target_pool_size` connections (recommended: 2x `max_in_flight_requests`)

**Replenishment**: Automatic creation when below target

**Health Monitoring**: Heartbeat-based connection validation

**Sizing Rationale**: The 2x multiplier keeps spare connections available during peak load and while failed connections are being replaced

## Key Design Properties

**Batch Ordering**: Batches may be retried out of order for performance, but acknowledgment to the next pipeline stage maintains strict ordering.

**Backpressure**: Retry delays for failing batches create natural backpressure that eventually blocks the persistence thread when in-flight limits are reached.

**Graceful Shutdown**: On shutdown signal, drain all in-flight batches to completion before terminating.
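
The batch ordering property can be illustrated with a small bookkeeping sketch. This is a minimal illustration in Python, not the actual implementation: the `OrderedAcker` class and its `submit` and `mark_durable` methods are hypothetical names introduced here. It assumes each batch receives a monotonically increasing sequence number at submission time, and it shows how batches that become durable out of order can still be released downstream strictly in order.

```python
class OrderedAcker:
    """Hypothetical helper: releases batches in submission order only."""

    def __init__(self):
        self._next_seq = 0       # sequence number for the next submitted batch
        self._next_to_ack = 0    # lowest sequence number not yet acknowledged
        self._durable = set()    # durable batches waiting on an earlier batch

    def submit(self):
        """Register a batch as in-flight and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        return seq

    def mark_durable(self, seq):
        """Record that batch `seq` is durable.

        Returns the sequence numbers that may now be acknowledged, in order.
        The list is empty while an earlier batch is still outstanding.
        """
        in_flight = self._next_to_ack <= seq < self._next_seq
        if not in_flight or seq in self._durable:
            raise ValueError(f"batch {seq} is not awaiting durability")
        self._durable.add(seq)

        released = []
        while self._next_to_ack in self._durable:
            self._durable.remove(self._next_to_ack)
            released.append(self._next_to_ack)
            self._next_to_ack += 1
        return released


if __name__ == "__main__":
    acker = OrderedAcker()
    first, second, third = acker.submit(), acker.submit(), acker.submit()
    print(acker.mark_durable(third))   # [] since batches 0 and 1 are outstanding
    print(acker.mark_durable(first))   # [0]
    print(acker.mark_durable(second))  # [1, 2]
```

In this sketch a retried batch simply becomes durable later than its successors; the successors stay parked in the pending set until the gap closes, which is also why a long-failing batch eventually stalls acknowledgment for everything behind it.
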
## Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `batch_timeout_ms` | 5 ms | Maximum time to wait collecting commits for batching |
| `batch_size_threshold` | 1 MB | Threshold for triggering batch processing |
| `max_in_flight_requests` | 5 | Maximum concurrent requests to persistence backend |
| `target_pool_size` | 2x `max_in_flight_requests` | Target number of connections to maintain |
| `max_retry_attempts` | 3 | Maximum retries for failed batches before aborting |
| `retry_base_delay_ms` | 100 ms | Base delay for exponential backoff retries |

## Configuration Validation

**Required Constraints**:

- `batch_size_threshold` > 0 (a batch must be able to hold at least one commit)
- `max_in_flight_requests` > 0 (must allow at least one concurrent request)
- `target_pool_size` >= `max_in_flight_requests` (pool must accommodate all in-flight requests)
- `batch_timeout_ms` > 0 (timeout must be positive)
- `max_retry_attempts` >= 0 (zero disables retries)
- `retry_base_delay_ms` > 0 (delay must be positive if retries are enabled)

**Performance Recommendations**:

- `target_pool_size` <= 2x `max_in_flight_requests` (2x is the recommended pool size; see Connection Pool)
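
The constraints above can be checked mechanically at startup. The following Python sketch is illustrative only: `PersistenceConfig`, `validate`, and `recommendations` are hypothetical names, the 1 MB default is assumed to mean 1,048,576 bytes, and the `retry_base_delay_ms` rule is read as applying only when retries are enabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistenceConfig:
    """Hypothetical container for the parameters in the table above."""
    batch_timeout_ms: int = 5
    batch_size_threshold: int = 1024 * 1024  # assumption: 1 MB expressed in bytes
    max_in_flight_requests: int = 5
    target_pool_size: int = 10               # 2x the default in-flight limit
    max_retry_attempts: int = 3
    retry_base_delay_ms: int = 100


def validate(cfg):
    """Return a list of violated required constraints (empty if valid)."""
    errors = []
    if cfg.batch_size_threshold <= 0:
        errors.append("batch_size_threshold must be > 0")
    if cfg.max_in_flight_requests <= 0:
        errors.append("max_in_flight_requests must be > 0")
    if cfg.target_pool_size < cfg.max_in_flight_requests:
        errors.append("target_pool_size must be >= max_in_flight_requests")
    if cfg.batch_timeout_ms <= 0:
        errors.append("batch_timeout_ms must be > 0")
    if cfg.max_retry_attempts < 0:
        errors.append("max_retry_attempts must be >= 0")
    # Assumption: the delay is only checked when retries are enabled.
    if cfg.max_retry_attempts > 0 and cfg.retry_base_delay_ms <= 0:
        errors.append("retry_base_delay_ms must be > 0 when retries are enabled")
    return errors


def recommendations(cfg):
    """Return advisory messages for settings outside the recommended range."""
    notes = []
    if cfg.target_pool_size > 2 * cfg.max_in_flight_requests:
        notes.append("target_pool_size exceeds 2x max_in_flight_requests")
    return notes


if __name__ == "__main__":
    cfg = PersistenceConfig(max_in_flight_requests=8, target_pool_size=4)
    for problem in validate(cfg):
        print("error:", problem)
    for note in recommendations(cfg):
        print("note:", note)
```

Keeping required constraints and recommendations in separate functions lets a caller refuse to start on a hard violation while only logging advisory notes.
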