Add mdformat pre-commit hook

.pre-commit-config.yaml

@@ -25,6 +25,11 @@ repos:
       - id: black
         language_version: python3

+  - repo: https://github.com/executablebooks/mdformat
+    rev: ff29be1a1ba8029d9375882aa2c812b62112a593  # frozen: 0.7.22
+    hooks:
+      - id: mdformat
+
   - repo: local
     hooks:
       - id: snake-case-enforcement
api.md (44 changed lines)
@@ -2,7 +2,7 @@

 > **Note:** This is a design for the API of the write-side of a database system where writing and reading are decoupled. The read-side of the system is expected to use the `/v1/subscribe` endpoint to maintain a queryable representation of the key-value data. In other words, reading from this "database" is left as an exercise for the reader. Authentication and authorization are out of scope for this design.

------
+______________________________________________________________________

 ## `GET /v1/version`

@@ -20,16 +20,16 @@ Retrieves the latest known committed version and the current leader.
 }
 ```

------
+______________________________________________________________________

 ## `POST /v1/commit`

 Submits a transaction to be committed. The transaction consists of read preconditions, writes, and deletes.

-* Clients may receive a **`413 Content Too Large`** response if the request exceeds a configurable limit.
-* A malformed request will result in a **`400 Bad Request`** response.
-* Keys are sorted by a lexicographical comparison of their raw byte values.
-* All binary data for keys and values must be encoded using the standard base64 scheme defined in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648#section-4), with padding included.
+- Clients may receive a **`413 Content Too Large`** response if the request exceeds a configurable limit.
+- A malformed request will result in a **`400 Bad Request`** response.
+- Keys are sorted by a lexicographical comparison of their raw byte values.
+- All binary data for keys and values must be encoded using the standard base64 scheme defined in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648#section-4), with padding included.

 ### Request

@@ -99,16 +99,16 @@ Submits a transaction to be committed. The transaction consists of read precondi

 1. **`request_id`**: Optional field that can be used with `/v1/status` to determine the outcome if no reply is received. If omitted, a UUID will be automatically generated by the server, and clients will not be able to determine commit status if there's no response. When provided, the request_id must meet the minimum length requirement (configurable, default 20 characters) to ensure sufficient entropy for collision avoidance. This ID must not be reused in a commit request. For idempotency, if a response is not received, the client must use `/v1/status` to determine the request's outcome. The original `request_id` should not be reused for a new commit attempt; instead, a retry should be sent with a new `request_id`. The alternative design would require the leader to store every request ID in memory.

-2. **`preconditions` (Guarantees and Usage)**: The condition is satisfied if the server verifies that the range has not changed since the specified version. Clients can achieve serializable isolation by including all reads that influenced their writes. By default, clients should assume that any read they perform influences their writes. Omitting reads is an expert-level optimization and should generally be avoided.
+1. **`preconditions` (Guarantees and Usage)**: The condition is satisfied if the server verifies that the range has not changed since the specified version. Clients can achieve serializable isolation by including all reads that influenced their writes. By default, clients should assume that any read they perform influences their writes. Omitting reads is an expert-level optimization and should generally be avoided.

-3. **`preconditions` (False Positives & Leader Changes)**: Precondition checks are conservative and best-effort; it's possible to reject a transaction where the range hasn't actually changed. In all such cases, clients should retry with a more recent read version. Two examples of false positives are:
+1. **`preconditions` (False Positives & Leader Changes)**: Precondition checks are conservative and best-effort; it's possible to reject a transaction where the range hasn't actually changed. In all such cases, clients should retry with a more recent read version. Two examples of false positives are:

-  * **Implementation Detail:** The leader may use partitioned conflict history for performance. A conflict in one partition (even from a transaction that later aborts) can cause a rejection.
-  * **Leader Changes:** A version is only valid within the term of the leader that issued it. Since conflict history is stored in memory, a leadership change invalidates all previously issued read versions. Any transaction using such a version will be rejected.
+   - **Implementation Detail:** The leader may use partitioned conflict history for performance. A conflict in one partition (even from a transaction that later aborts) can cause a rejection.
+   - **Leader Changes:** A version is only valid within the term of the leader that issued it. Since conflict history is stored in memory, a leadership change invalidates all previously issued read versions. Any transaction using such a version will be rejected.

 The versions in the precondition checks need not be the same.

------
+______________________________________________________________________

 ## `GET /v1/status`

@@ -125,7 +125,7 @@ Gets the status of a previous commit request by its `request_id`.
 | `request_id` | string | Yes | The `request_id` from the original `/v1/commit` request. |
 | `min_version` | integer | Yes | An optimization that constrains the log scan. This value should be the latest version the client knew to be committed *before* sending the original request. |

-> **Warning\!** If the provided `min_version` is later than the transaction's actual commit version, the server might not find the record in the scanned portion of the log. This can result in an `id_not_found` status, even if the transaction actually committed.
+> **Warning!** If the provided `min_version` is later than the transaction's actual commit version, the server might not find the record in the scanned portion of the log. This can result in an `id_not_found` status, even if the transaction actually committed.

 ### Response

@@ -144,7 +144,7 @@ A response from this endpoint guarantees the original request is no longer in fl

 > **Note on `log_truncated` status:** This indicates the `request_id` log has been truncated after `min_version`, making it impossible to determine the original request's outcome. There is no way to avoid this without storing an arbitrarily large number of request IDs. Clients must treat this as an indeterminate outcome. Retrying the transaction is unsafe unless the client has an external method to verify the original transaction's status. This error should be propagated to the caller. `request_id`s are retained for a configurable minimum time and number of versions so this should be extremely rare.

------
+______________________________________________________________________

 ## `GET /v1/subscribe`

@@ -192,9 +192,9 @@ data: {"committed_version":123456,"leader_id":"abcdefg"}

 1. **Data Guarantees**: When `durable=false`, this endpoint streams *accepted*, but not necessarily *durable/committed*, transactions. *Accepted* transactions will eventually commit unless the current leader changes.

-2. **Leader Changes & Reconnection**: When `durable=false`, if the leader changes, clients **must** discard all of that leader's `transaction` events received after their last-seen `checkpoint` event. They must then manually reconnect (as the server connection will likely be terminated) and restart the subscription by setting the `after` query parameter to the version specified in that last-known checkpoint. Clients should implement a randomized exponential backoff strategy (backoff with jitter) when reconnecting.
+1. **Leader Changes & Reconnection**: When `durable=false`, if the leader changes, clients **must** discard all of that leader's `transaction` events received after their last-seen `checkpoint` event. They must then manually reconnect (as the server connection will likely be terminated) and restart the subscription by setting the `after` query parameter to the version specified in that last-known checkpoint. Clients should implement a randomized exponential backoff strategy (backoff with jitter) when reconnecting.

-3. **Connection Handling & Errors**: The server may periodically send `keepalive` comments to prevent idle timeouts on network proxies. The server will buffer unconsumed data up to a configurable limit; if the client falls too far behind, the connection will be closed. If the `after` version has been truncated from the log, this endpoint will return a standard `410 Gone` HTTP error instead of an event stream.
+1. **Connection Handling & Errors**: The server may periodically send `keepalive` comments to prevent idle timeouts on network proxies. The server will buffer unconsumed data up to a configurable limit; if the client falls too far behind, the connection will be closed. If the `after` version has been truncated from the log, this endpoint will return a standard `410 Gone` HTTP error instead of an event stream.
@@ -211,10 +211,10 @@ Creates or updates a retention policy.

 ### Response

-* `201 Created` if the policy was created.
-* `200 OK` if the policy was updated.
+- `201 Created` if the policy was created.
+- `200 OK` if the policy was updated.

------
+______________________________________________________________________

 ## `GET /v1/retention/<policy_id>`

@@ -228,7 +228,7 @@ Retrieves a retention policy by ID.
 }
 ```

------
+______________________________________________________________________

 ## `GET /v1/retention/`

@@ -245,7 +245,7 @@ Retrieves all retention policies.
 ]
 ```

------
+______________________________________________________________________

 ## `DELETE /v1/retention/<policy_id>`

@@ -255,7 +255,7 @@ Removes a retention policy, which may allow the log to be truncated.

 `204 No Content`

------
+______________________________________________________________________

 ## `GET /ok`

@@ -265,7 +265,7 @@ Simple health check endpoint.

 Returns `200 OK` with minimal content for basic health monitoring.

------
+______________________________________________________________________

 ## `GET /metrics`

@@ -15,10 +15,10 @@ HTTP I/O Threads → [Sequence] → [Resolve] → [Persist] → [Release] → HT
 ### Pipeline Flow

 1. **HTTP I/O Threads**: Parse and validate incoming commit requests
-2. **Sequence Stage**: Assign sequential version numbers to commits
-3. **Resolve Stage**: Validate preconditions and check for conflicts
-4. **Persist Stage**: Write commits to durable storage and notify subscribers
-5. **Release Stage**: Return connections to HTTP I/O threads for response handling
+1. **Sequence Stage**: Assign sequential version numbers to commits
+1. **Resolve Stage**: Validate preconditions and check for conflicts
+1. **Persist Stage**: Write commits to durable storage and notify subscribers
+1. **Release Stage**: Return connections to HTTP I/O threads for response handling

 ## Stage Details

@@ -29,21 +29,25 @@ HTTP I/O Threads → [Sequence] → [Resolve] → [Persist] → [Release] → HT
 **Serialization**: **Required** - Must be single-threaded

 **Responsibilities**:
+
 - **For CommitEntry**: Check request_id against banned list, assign sequential version number if not banned, forward to resolve stage
 - **For StatusEntry**: Add request_id to banned list, note current highest assigned version as upper bound, transfer connection to status threadpool
 - Record version assignments for transaction tracking

 **Why Serialization is Required**:
+
 - Version numbers must be strictly sequential without gaps
 - Banned list updates must be atomic with version assignment
 - Status requests must get accurate upper bound on potential commit versions

 **Request ID Banned List**:
+
 - Purpose: Make transactions no longer in-flight and establish version upper bounds for status queries
 - Lifecycle: Grows indefinitely until process restart (leader change)
 - Removal: Only on process restart/leader change, which invalidates all old request IDs

 **Current Implementation**:
+
 ```cpp
 bool HttpHandler::process_sequence_batch(BatchType &batch) {
   for (auto &entry : batch) {
@@ -64,20 +68,24 @@ bool HttpHandler::process_sequence_batch(BatchType &batch) {
 **Serialization**: **Required** - Must be single-threaded

 **Responsibilities**:
+
 - **For CommitEntry**: Check preconditions against in-memory recent writes set, add writes to recent writes set if accepted
 - **For StatusEntry**: N/A (transferred to status threadpool after sequence stage)
 - Mark failed commits with failure information (including which preconditions failed)

 **Why Serialization is Required**:
+
 - Must maintain consistent view of in-memory recent writes set
 - Conflict detection requires atomic evaluation of all preconditions against recent writes
 - Recent writes set updates must be synchronized

 **Transaction State Transitions**:
+
 - **Assigned Version** (from sequence) → **Semi-committed** (resolve accepts) → **Committed** (persist completes)
 - Failed transactions continue through pipeline with failure information for client response

 **Current Implementation**:
+
 ```cpp
 bool HttpHandler::process_resolve_batch(BatchType &batch) {
   // TODO: Implement precondition resolution logic:
@@ -95,28 +103,33 @@ bool HttpHandler::process_resolve_batch(BatchType &batch) {
 **Serialization**: **Required** - Must mark batches durable in order

 **Responsibilities**:
+
 - **For CommitEntry**: Apply operations to persistent storage, update committed version high water mark
 - **For StatusEntry**: N/A (transferred to status threadpool after sequence stage)
 - Generate durability events for `/v1/subscribe` when committed version advances
 - Batch multiple commits for efficient persistence operations

 **Why Serialization is Required**:
+
 - Batches must be marked durable in sequential version order
 - High water mark updates must reflect strict ordering of committed versions
 - Ensures consistency guarantees across all endpoints

 **Committed Version High Water Mark**:
+
 - Global atomic value tracking highest durably committed version
 - Updated after each batch commits: set to highest version in the batch
 - Read by `/v1/version` endpoint using atomic seq_cst reads
 - Enables `/v1/subscribe` durability events when high water mark advances

 **Batching Strategy**:
+
 - Multiple semi-committed transactions can be persisted in a single batch
 - High water mark updated once per batch to highest version in that batch
 - See `persistence.md` for detailed persistence design

 **Current Implementation**:
+
 ```cpp
 bool HttpHandler::process_persist_batch(BatchType &batch) {
   // TODO: Implement actual persistence logic:
@@ -134,16 +147,19 @@ bool HttpHandler::process_persist_batch(BatchType &batch) {
 **Serialization**: Not required - Independent connection handling

 **Responsibilities**:
+
 - Return processed connections to HTTP server for all request types
 - Connection carries response data (success/failure) and status information
 - Trigger response transmission to clients

 **Response Handling**:
+
 - **CommitRequests**: Response generated by persist stage (success with version, or failure with conflicting preconditions)
 - **StatusRequests**: Response generated by separate status lookup logic (not part of pipeline)
 - Failed transactions carry failure information through entire pipeline for proper client response

 **Implementation**:
+
 ```cpp
 bool HttpHandler::process_release_batch(BatchType &batch) {
   // Stage 3: Connection release
@@ -237,6 +253,7 @@ void HttpHandler::on_batch_complete(std::span<std::unique_ptr<Connection>> batch
 ### Backpressure Handling

 The pipeline implements natural backpressure:
+
 - Each stage blocks if downstream stages are full
 - `WaitIfUpstreamIdle` strategy balances latency vs throughput
 - Ring buffer size (`lg_size = 16`) controls maximum queued batches
@@ -344,6 +361,7 @@ private:
 The pipeline processes different types of entries using a variant/union type system instead of `std::unique_ptr<Connection>`:

 ### Pipeline Entry Variants
+
 - **CommitEntry**: Contains `std::unique_ptr<Connection>` with CommitRequest and connection state
 - **StatusEntry**: Contains `std::unique_ptr<Connection>` with StatusRequest (transferred to status threadpool after sequence)
 - **ShutdownEntry**: Signals pipeline shutdown to all stages
@@ -367,6 +385,7 @@ The pipeline processes different types of entries using a variant/union type sys
 #### Request Processing Flow

 1. **HTTP I/O Thread Processing** (`src/http_handler.cpp:210-273`):
+
 ```cpp
 void HttpHandler::handlePostCommit(Connection &conn, HttpConnectionState &state) {
   // Parse and validate anything that doesn't need serialization:
@@ -379,15 +398,17 @@ The pipeline processes different types of entries using a variant/union type sys
 }
 ```

-2. **Pipeline Entry**: Successfully parsed connections enter pipeline as CommitEntry (containing the connection with CommitRequest)
+1. **Pipeline Entry**: Successfully parsed connections enter pipeline as CommitEntry (containing the connection with CommitRequest)

-3. **Pipeline Processing**:
+1. **Pipeline Processing**:
+
    - **Sequence**: Check banned list → assign version (or reject)
    - **Resolve**: Check preconditions against in-memory recent writes → mark semi-committed (or failed with conflict details)
    - **Persist**: Apply operations → mark committed, update high water mark
    - **Release**: Return connection with response data

-4. **Response Generation**: Based on pipeline results
+1. **Response Generation**: Based on pipeline results
+
    - **Success**: `{"status": "committed", "version": N, "leader_id": "...", "request_id": "..."}`
    - **Failure**: `{"status": "not_committed", "conflicts": [...], "version": N, "leader_id": "..."}`
@@ -398,6 +419,7 @@ The pipeline processes different types of entries using a variant/union type sys
 #### Request Processing Flow

 1. **HTTP I/O Thread Processing**:
+
 ```cpp
 void HttpHandler::handleGetStatus(Connection &conn, const HttpConnectionState &state) {
   // TODO: Extract request_id from URL and min_version from query params
@@ -405,21 +427,24 @@ The pipeline processes different types of entries using a variant/union type sys
 }
 ```

-2. **Two-Phase Processing**:
+1. **Two-Phase Processing**:
+
    - **Phase 1 - Sequence Stage**: StatusEntry enters pipeline to add request_id to banned list and get version upper bound
    - **Phase 2 - Status Threadpool**: Connection transferred from sequence stage to dedicated status threadpool for actual status lookup logic

-3. **Status Lookup Logic**: Performed in status threadpool - scan transaction log to determine actual commit status of the now-banned request_id
+1. **Status Lookup Logic**: Performed in status threadpool - scan transaction log to determine actual commit status of the now-banned request_id

 ### `/v1/subscribe` - Real-time Transaction Stream

 **Pipeline Integration**: Consumes events from resolve and persist stages

 #### Event Sources
+
 - **Resolve Stage**: Semi-committed transactions (accepted preconditions) for low-latency streaming
 - **Persist Stage**: Durability events when committed version high water mark advances

 #### Current Implementation
+
 ```cpp
 void HttpHandler::handleGetSubscribe(Connection &conn, const HttpConnectionState &state) {
   // TODO: Parse query parameters (after, durable)
@@ -447,11 +472,12 @@ void HttpHandler::handleGetSubscribe(Connection &conn, const HttpConnectionState
 The pipeline integrates with the HTTP handler at two points:

 1. **Entry**: `on_batch_complete()` feeds connections into sequence stage
-2. **Exit**: Release stage calls `Server::release_back_to_server()`
+1. **Exit**: Release stage calls `Server::release_back_to_server()`

 ### Persistence Layer Integration

 The persist stage interfaces with:
+
 - **S3 Backend**: Batch writes for durability (see `persistence.md`)
 - **Subscriber System**: Real-time change stream notifications
 - **Metrics System**: Transaction throughput and latency tracking
@@ -467,17 +493,17 @@ The persist stage interfaces with:
 ### Potential Enhancements

 1. **Dynamic Thread Counts**: Make resolve and release thread counts configurable
-2. **NUMA Optimization**: Pin pipeline threads to specific CPU cores
-3. **Batch Size Tuning**: Dynamic batch size based on load
-4. **Stage Bypassing**: Skip resolve stage for transactions without preconditions
-5. **Persistence Batching**: Aggregate multiple commits into larger S3 writes
+1. **NUMA Optimization**: Pin pipeline threads to specific CPU cores
+1. **Batch Size Tuning**: Dynamic batch size based on load
+1. **Stage Bypassing**: Skip resolve stage for transactions without preconditions
+1. **Persistence Batching**: Aggregate multiple commits into larger S3 writes

 ### Monitoring and Observability

 1. **Stage Metrics**: Throughput, latency, and queue depth per stage
-2. **Error Tracking**: Error rates and types by stage
-3. **Resource Utilization**: CPU and memory usage per pipeline thread
-4. **Flow Control Events**: Backpressure and stall detection
+1. **Error Tracking**: Error rates and types by stage
+1. **Resource Utilization**: CPU and memory usage per pipeline thread
+1. **Flow Control Events**: Backpressure and stall detection

 ## Implementation Status

config.md (10 changed lines)
@@ -101,18 +101,22 @@ WeaselDB uses the `toml11` library for configuration parsing with robust error h
 These configuration parameters directly affect server and API behavior:

 **Server Performance:**
+
 - **`io_threads`**: Controls parallelism for both accepting new connections and I/O processing. Should typically match CPU core count for optimal performance
 - **`event_batch_size`**: Larger batches reduce syscall overhead but may increase latency under light load
 - **`max_connections`**: Prevents resource exhaustion by limiting concurrent connections

 **Request Handling:**
+
 - **`max_request_size_bytes`**: Determines when `/v1/commit` returns `413 Content Too Large`
 - **`min_request_id_length`**: Validates `request_id` fields in `/v1/commit` requests for sufficient entropy

 **Request ID Management:**
+
 - **`request_id_retention_*`**: Affects availability of data for `/v1/status` queries and likelihood of `log_truncated` responses

 **Subscription Streaming:**
+
 - **`max_buffer_size_bytes`**: Controls when `/v1/subscribe` connections are terminated due to slow consumption
 - **`keepalive_interval_seconds`**: Frequency of keepalive comments in `/v1/subscribe` streams

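For orientation, the parameters above might look like this in TOML form (parsed with toml11). The values and the section names (`[server]`, `[commit]`, `[subscription]`) are illustrative assumptions, not the project's actual defaults or schema:

```toml
[server]
io_threads = 8                     # typically the CPU core count
max_request_size_bytes = 1048576   # /v1/commit returns 413 above this
max_connections = 10000

[commit]
min_request_id_length = 20         # entropy floor for client request_ids

[subscription]
max_buffer_size_bytes = 8388608    # slow /v1/subscribe consumers are dropped
keepalive_interval_seconds = 15
```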
@@ -121,6 +125,7 @@ These configuration parameters directly affect server and API behavior:
 The configuration system includes comprehensive validation with specific bounds checking:

 ### Server Configuration Limits
+
 - **`port`**: Must be between 1 and 65535
 - **`max_request_size_bytes`**: Must be > 0 and ≤ 100MB
 - **`io_threads`**: Must be between 1 and 1000
@@ -128,26 +133,31 @@ The configuration system includes comprehensive validation with specific bounds
 - **`max_connections`**: Must be between 0 and 100000 (0 = unlimited)

 ### Commit Configuration Limits
+
 - **`min_request_id_length`**: Must be between 8 and 256 characters
 - **`request_id_retention_hours`**: Must be between 1 and 8760 hours (1 year)
 - **`request_id_retention_versions`**: Must be > 0

 ### Subscription Configuration Limits
+
 - **`max_buffer_size_bytes`**: Must be > 0 and ≤ 1GB
 - **`keepalive_interval_seconds`**: Must be between 1 and 3600 seconds (1 hour)

 ### Cross-Validation
+
 - Warns if `max_request_size_bytes` > `max_buffer_size_bytes` (potential buffering issues)

 ## Configuration Management

 ### Code Integration
+
 - **Configuration Structure**: Defined in `src/config.hpp` with structured types
 - **Parser Implementation**: Located in `src/config.cpp` using template-based parsing
 - **Default Values**: Embedded as struct defaults for compile-time initialization
 - **Runtime Usage**: Configuration passed to server components during initialization

 ### Development Guidelines
+
 - **New Parameters**: Add to appropriate struct in `src/config.hpp`
 - **Validation**: Include bounds checking in `ConfigParser::validate_config()`
 - **Documentation**: Update this file when adding new configuration options
design.md (122 changed lines)
@@ -3,15 +3,15 @@
 ## Table of Contents

 1. [Project Overview](#project-overview)
-2. [Quick Start](#quick-start)
-3. [Architecture](#architecture)
-4. [Development Guidelines](#development-guidelines)
-5. [Common Patterns](#common-patterns)
-6. [Reference](#reference)
+1. [Quick Start](#quick-start)
+1. [Architecture](#architecture)
+1. [Development Guidelines](#development-guidelines)
+1. [Common Patterns](#common-patterns)
+1. [Reference](#reference)

 **IMPORTANT:** Read [style.md](style.md) first - contains mandatory C++ coding standards, threading rules, and testing guidelines that must be followed for all code changes.

----
+______________________________________________________________________

 ## Project Overview

@@ -26,7 +26,7 @@ WeaselDB is a high-performance write-side database component designed for system
 - **Optimized memory management** with arena allocation and efficient copying
 - **Factory pattern safety** ensuring correct object lifecycle management

----
+______________________________________________________________________

 ## Quick Start

@@ -43,11 +43,13 @@ ninja
 ### Testing & Development

 **Run all tests:**
+
 ```bash
 ninja test  # or ctest
 ```

 **Individual targets:**
+
 - `./test_arena` - Arena allocator unit tests
 - `./test_commit_request` - JSON parsing and validation tests
 - `./test_http_handler` - HTTP protocol handling tests
@@ -56,6 +58,7 @@ ninja test  # or ctest
 - `./test_server_connection_return` - Connection lifecycle tests

 **Benchmarking:**
+
 - `./bench_arena` - Memory allocation performance
 - `./bench_commit_request` - JSON parsing performance
 - `./bench_parser_comparison` - Compare vs nlohmann::json and RapidJSON
@@ -64,18 +67,22 @@ ninja test  # or ctest
 - `./bench_format_comparison` - String formatting performance

 **Debug tools:**
+
 - `./debug_arena` - Analyze arena allocator behavior

 **Load Testing:**
+
 - `./load_tester` - A tool to generate load against the server for performance and stability analysis.

 ### Dependencies

 **System requirements:**
+
 - **weaseljson** - Must be installed system-wide (high-performance JSON parser)
 - **gperf** - System requirement for perfect hash generation

 **Auto-fetched:**
+
 - **simdutf** - SIMD base64 encoding/decoding
 - **toml11** - TOML configuration parsing
 - **doctest** - Testing framework
@@ -84,7 +91,7 @@ ninja test  # or ctest
 - **RapidJSON** - High-performance JSON library (used in benchmarks)
 - **llhttp** - Fast HTTP parser

----
+______________________________________________________________________

 ## Architecture

@@ -106,6 +113,7 @@ Ultra-fast memory allocator optimized for request/response patterns:
 #### **Networking Layer**

 **Server** (`src/server.{hpp,cpp}`):
+
 - **High-performance multi-threaded networking** using multiple epoll instances with unified I/O thread pool
 - **Configurable epoll instances** to eliminate kernel-level epoll_ctl contention (default: 2, max: io_threads)
 - **Round-robin thread-to-epoll assignment** distributes I/O threads across epoll instances
@@ -117,6 +125,7 @@ Ultra-fast memory allocator optimized for request/response patterns:
 - **EPOLL_EXCLUSIVE** on listen socket across all epoll instances prevents thundering herd

 **Connection** (`src/connection.{hpp,cpp}`):
+
 - **Efficient per-connection state management** with arena-based memory allocation
 - **Safe ownership transfer** between server threads and protocol handlers
 - **Automatic cleanup** on connection closure or server shutdown
@@ -124,6 +133,7 @@ Ultra-fast memory allocator optimized for request/response patterns:
 - **Protocol-specific data:** `user_data` `void*` for custom handler data

 **ConnectionHandler Interface** (`src/connection_handler.hpp`):
+
 - **Abstract protocol interface** decoupling networking from application logic
 - **Ownership transfer support** allowing handlers to take connections for async processing
 - **Streaming data processing** with partial message handling
@@ -141,6 +151,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
#### **Parsing Layer**

**JSON Commit Request Parser** (`src/json_commit_request_parser.{hpp,cpp}`):

- **High-performance JSON parser** using `weaseljson` library
- **Streaming parser support** for incremental parsing of network data
- **gperf-optimized token recognition** for fast JSON key parsing
@@ -150,6 +161,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
- **Zero hash collisions** for known JSON tokens eliminate branching

**Parser Interface** (`src/commit_request_parser.hpp`):

- **Abstract base class** for commit request parsers
- **Format-agnostic parsing interface** supporting multiple serialization formats
- **Streaming and one-shot parsing modes**
@@ -158,6 +170,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
#### **Data Model**

**Commit Request Data Model** (`src/commit_request.hpp`):

- **Format-agnostic data structure** for representing transactional commits
- **Arena-backed string storage** with efficient memory management
- **Move-only semantics** for optimal performance
@@ -167,18 +180,21 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
#### **Metrics System** (`src/metric.{hpp,cpp}`)

**High-Performance Metrics Implementation:**

- **Thread-local counters/histograms** with single writer for performance
- **Global gauges** with lock-free atomic CAS operations for multi-writer scenarios
- **SIMD-optimized histogram bucket updates** using AVX instructions for high throughput
- **Arena allocator integration** for efficient memory management during rendering

**Threading Model:**

- **Counters**: Per-thread storage, single writer, atomic write in `Counter::inc()`, atomic read in render thread
- **Histograms**: Per-thread storage, single writer, per-histogram mutex serializes all access (observe and render)
- **Gauges**: Lock-free atomic operations using `std::bit_cast` for double precision
- **Thread cleanup**: Automatic accumulation of thread-local state into global state on destruction

**Prometheus Compatibility:**

- **Standard metric types** with proper label handling and validation
- **Bucket generation helpers** for linear/exponential histogram distributions
- **Callback-based metrics** for dynamic values
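
An exponential bucket helper could look like the following sketch (the function name is an assumption, not the project's API): upper bounds grow by a constant factor, as in Prometheus client libraries.

```cpp
#include <vector>

// Generates `count` histogram upper bounds starting at `start`,
// each `factor` times the previous one.
std::vector<double> exponential_buckets(double start, double factor, int count) {
  std::vector<double> bounds;
  double bound = start;
  for (int i = 0; i < count; ++i) {
    bounds.push_back(bound);
    bound *= factor;
  }
  return bounds;
}
```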
@@ -187,6 +203,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
#### **Configuration & Optimization**

**Configuration System** (`src/config.{hpp,cpp}`):

- **TOML-based configuration** using `toml11` library
- **Structured configuration** with server, commit, and subscription sections
- **Default fallback values** for all configuration options
@@ -194,6 +211,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
- See `config.md` for complete configuration documentation

**JSON Token Optimization** (`src/json_tokens.gperf`, `src/json_token_enum.hpp`):

- **Perfect hash table** generated by gperf for O(1) JSON key lookup
- **Compile-time token enumeration** for type-safe key identification
- **Minimal perfect hash** reduces memory overhead and improves cache locality
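
A hand-written stand-in conveys the idea (the real table is generated from `src/json_tokens.gperf` at build time, and this token set is hypothetical): a collision-free hash over a fixed key set selects a slot directly, leaving a single string comparison rather than a chain of branches. For this tiny set, key length alone is a perfect hash.

```cpp
#include <string_view>

enum class JsonToken { RequestId, Writes, Deletes, Unknown };  // hypothetical set

JsonToken lookup(std::string_view key) {
  switch (key.size()) {  // lengths 10, 6, 7 are distinct: a perfect hash
    case 10: return key == "request_id" ? JsonToken::RequestId : JsonToken::Unknown;
    case 6:  return key == "writes"     ? JsonToken::Writes    : JsonToken::Unknown;
    case 7:  return key == "deletes"    ? JsonToken::Deletes   : JsonToken::Unknown;
    default: return JsonToken::Unknown;
  }
}
```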
@@ -202,6 +220,7 @@ A high-performance, multi-stage, lock-free pipeline for inter-thread communicati
### Transaction Data Model

#### CommitRequest Structure

```
CommitRequest {
- request_id: Optional unique identifier
@@ -220,18 +239,20 @@ CommitRequest {
### Memory Management Model

#### Connection Ownership Lifecycle

1. **Creation**: Accept threads create connections, transfer to epoll as raw pointers
2. **Processing**: Network threads claim ownership by wrapping in unique_ptr
3. **Handler Transfer**: Handlers can take ownership for async processing via unique_ptr.release()
4. **Return Path**: Handlers use Server::release_back_to_server() to return connections
5. **Safety**: All transfers use weak_ptr to server for safe cleanup
6. **Cleanup**: RAII ensures proper resource cleanup in all scenarios
1. **Processing**: Network threads claim ownership by wrapping in unique_ptr
1. **Handler Transfer**: Handlers can take ownership for async processing via unique_ptr.release()
1. **Return Path**: Handlers use Server::release_back_to_server() to return connections
1. **Safety**: All transfers use weak_ptr to server for safe cleanup
1. **Cleanup**: RAII ensures proper resource cleanup in all scenarios

#### Arena Memory Lifecycle

1. **Request Processing**: Handler uses `conn->get_arena()` to allocate memory for parsing request data
2. **Response Generation**: Handler uses arena for temporary response construction (headers, JSON, etc.)
3. **Response Queuing**: Handler calls `conn->append_message()` which copies data to arena-backed message queue
4. **Response Writing**: Server writes all queued messages to socket via `writeBytes()`
1. **Response Generation**: Handler uses arena for temporary response construction (headers, JSON, etc.)
1. **Response Queuing**: Handler calls `conn->append_message()` which copies data to arena-backed message queue
1. **Response Writing**: Server writes all queued messages to socket via `writeBytes()`

> **Note**: Call `conn->reset()` periodically to reclaim arena memory. Best practice is after all outgoing bytes have been written.

@@ -241,22 +262,26 @@ CommitRequest {
WeaselDB uses `EPOLLONESHOT` for all connection file descriptors to enable safe multi-threaded ownership transfer without complex synchronization:

**Key Benefits:**

1. **Automatic fd disarming** - When epoll triggers an event, the fd is automatically removed from epoll monitoring
2. **Race-free ownership transfer** - Handlers can safely take connection ownership and move to other threads
3. **Zero-coordination async processing** - No manual synchronization needed between network threads and handler threads
1. **Race-free ownership transfer** - Handlers can safely take connection ownership and move to other threads
1. **Zero-coordination async processing** - No manual synchronization needed between network threads and handler threads

**Threading Flow:**

1. **Event Trigger**: Network thread gets epoll event → connection auto-disarmed via ONESHOT
2. **Safe Transfer**: Handler can take ownership (`std::move(conn_ptr)`) with no epoll interference
3. **Async Processing**: Connection processed on handler thread while epoll cannot trigger spurious events
4. **Return & Re-arm**: `Server::receiveConnectionBack()` re-arms fd with `epoll_ctl(EPOLL_CTL_MOD)`
1. **Safe Transfer**: Handler can take ownership (`std::move(conn_ptr)`) with no epoll interference
1. **Async Processing**: Connection processed on handler thread while epoll cannot trigger spurious events
1. **Return & Re-arm**: `Server::receiveConnectionBack()` re-arms fd with `epoll_ctl(EPOLL_CTL_MOD)`
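
The disarm/re-arm behaviour above can be observed with a plain pipe instead of a server socket; `Server::receiveConnectionBack()` appears only in a comment, everything else is raw epoll (Linux-only sketch):

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// Returns true iff EPOLLONESHOT delivers exactly one event and
// EPOLL_CTL_MOD re-arms the fd, matching the threading flow above.
bool oneshot_demo() {
  int fds[2];
  if (pipe(fds) != 0) return false;
  int ep = epoll_create1(0);
  epoll_event ev{};
  ev.events = EPOLLIN | EPOLLONESHOT;
  ev.data.fd = fds[0];
  epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev);
  (void)write(fds[1], "x", 1);  // make the read end readable

  epoll_event out;
  bool ok = epoll_wait(ep, &out, 1, 100) == 1;  // event delivered once
  ok = ok && epoll_wait(ep, &out, 1, 0) == 0;   // fd auto-disarmed: no spurious event
  // What Server::receiveConnectionBack() does conceptually: re-arm via MOD.
  epoll_ctl(ep, EPOLL_CTL_MOD, fds[0], &ev);
  ok = ok && epoll_wait(ep, &out, 1, 100) == 1;  // data still pending: fires again
  close(fds[0]); close(fds[1]); close(ep);
  return ok;
}
```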

**Performance Trade-off:**

- **Cost**: One `epoll_ctl(MOD)` syscall per connection return (~100-200ns)
- **Benefit**: Eliminates complex thread synchronization and prevents race conditions
- **Alternative cost**: Manual `EPOLL_CTL_DEL`/`ADD` + locking would be significantly higher

**Risks without EPOLLONESHOT:**

- Multiple threads processing same fd simultaneously
- Use-after-move when network thread accesses transferred connection
- Complex synchronization between epoll events and ownership transfers
@@ -270,14 +295,14 @@ The system implements a RESTful API. See [api.md](api.md) for comprehensive API
### Design Principles

1. **Performance-first** - Every component optimized for high throughput
2. **Scalable concurrency** - Multiple epoll instances eliminate kernel contention
3. **Memory efficiency** - Arena allocation eliminates fragmentation
4. **Efficient copying** - Minimize unnecessary copies while accepting required ones
5. **Streaming-ready** - Support incremental processing
6. **Type safety** - Compile-time validation where possible
7. **Resource management** - RAII and move semantics throughout
1. **Scalable concurrency** - Multiple epoll instances eliminate kernel contention
1. **Memory efficiency** - Arena allocation eliminates fragmentation
1. **Efficient copying** - Minimize unnecessary copies while accepting required ones
1. **Streaming-ready** - Support incremental processing
1. **Type safety** - Compile-time validation where possible
1. **Resource management** - RAII and move semantics throughout

---
______________________________________________________________________

## Development Guidelines

@@ -308,20 +333,22 @@ See [style.md](style.md) for comprehensive C++ coding standards and conventions.
### Extension Points

#### Adding New Protocol Handlers

1. Inherit from `ConnectionHandler` in `src/connection_handler.hpp`
2. Implement `on_data_arrived()` with proper ownership semantics
3. Use connection's arena allocator for temporary allocations: `conn->get_arena()`
4. Handle partial messages and streaming protocols appropriately
5. Use `Server::release_back_to_server()` if taking ownership for async processing
6. Add corresponding test cases and integration tests
7. Consider performance implications of ownership transfers
1. Implement `on_data_arrived()` with proper ownership semantics
1. Use connection's arena allocator for temporary allocations: `conn->get_arena()`
1. Handle partial messages and streaming protocols appropriately
1. Use `Server::release_back_to_server()` if taking ownership for async processing
1. Add corresponding test cases and integration tests
1. Consider performance implications of ownership transfers

#### Adding New Parsers

1. Inherit from `CommitRequestParser` in `src/commit_request_parser.hpp`
2. Implement both streaming and one-shot parsing modes
3. Use arena allocation for all temporary string storage
4. Add corresponding test cases in `tests/`
5. Add benchmark comparisons in `benchmarks/`
1. Implement both streaming and one-shot parsing modes
1. Use arena allocation for all temporary string storage
1. Add corresponding test cases in `tests/`
1. Add benchmark comparisons in `benchmarks/`

### Performance Guidelines

@@ -337,13 +364,14 @@ See [style.md](style.md) for comprehensive C++ coding standards and conventions.
- **Build System**: CMake generates gperf hash tables at build time
- **Testing Guidelines**: See [style.md](style.md) for comprehensive testing standards including synchronization rules

---
______________________________________________________________________

## Common Patterns

### Factory Method Patterns

#### Server Creation

```cpp
// Server must be created via factory method
auto server = Server::create(config, handler);
@@ -354,6 +382,7 @@ auto server = Server::create(config, handler);
```

#### Connection Creation (Server-Only)

```cpp
// Only Server can create connections (using private friend method)
class Server {
@@ -370,6 +399,7 @@ private:
### ConnectionHandler Implementation Patterns

#### Simple Synchronous Handler

```cpp
class HttpHandler : public ConnectionHandler {
public:
@@ -386,6 +416,7 @@ public:
```

#### Async Handler with Ownership Transfer

```cpp
class AsyncHandler : public ConnectionHandler {
public:
@@ -405,6 +436,7 @@ public:
```

#### Batching Handler with User Data

```cpp
class BatchingHandler : public ConnectionHandler {
public:
@@ -444,6 +476,7 @@ private:
```

#### Streaming "yes" Handler

```cpp
class YesHandler : public ConnectionHandler {
public:
@@ -466,6 +499,7 @@ public:
### Memory Management Patterns

#### Arena-Based String Handling

```cpp
// Preferred: String view with arena allocation to minimize copying
std::string_view process_json_key(const char* data, Arena& arena);
@@ -475,6 +509,7 @@ std::string process_json_key(const char* data);
```

#### Safe Connection Ownership Transfer

```cpp
// In handler - take ownership for background processing
Connection* raw_conn = conn_ptr.release();
@@ -492,6 +527,7 @@ background_processor.submit([raw_conn]() {
### Data Construction Patterns

#### Builder Pattern Usage

```cpp
CommitRequest request = CommitRequestBuilder(arena)
    .request_id("example-id")
@@ -501,41 +537,47 @@ CommitRequest request = CommitRequestBuilder(arena)
```

#### Error Handling Pattern

```cpp
enum class ParseResult { Success, InvalidJson, MissingField };
ParseResult parse_commit_request(const char* json, CommitRequest& out);
```

---
______________________________________________________________________

## Reference

### Build Targets

**Test Executables:**

- `test_arena` - Arena allocator functionality tests
- `test_commit_request` - JSON parsing and validation tests
- `test_metric` - Metrics system functionality tests
- Main server executable (compiled from `src/main.cpp`)

**Benchmark Executables:**

- `bench_arena` - Arena allocator performance benchmarks
- `bench_commit_request` - JSON parsing performance benchmarks
- `bench_parser_comparison` - Comparison benchmarks vs nlohmann::json and RapidJSON
- `bench_metric` - Metrics system performance benchmarks

**Debug Tools:**

- `debug_arena` - Debug tool for arena allocator analysis

### Performance Characteristics

**Memory Allocation:**

- **~1ns allocation time** vs standard allocators
- **Bulk deallocation** eliminates individual free() calls
- **Optimized geometric growth** uses current block size for doubling strategy
- **Alignment-aware** allocation prevents performance penalties
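
The allocation characteristics above reduce to a bump pointer. A minimal sketch (the real `src/arena` adds geometric block growth and STL adapters; this fixed-size version is illustrative only):

```cpp
#include <cstddef>
#include <cstdint>

// Bump allocator: allocation is an aligned pointer increment,
// and reset() reclaims everything at once.
struct Bump {
  alignas(16) unsigned char buf[1024];
  std::size_t used = 0;

  void* alloc(std::size_t n, std::size_t align) {
    std::size_t p = (used + align - 1) & ~(align - 1);  // round up to alignment
    if (p + n > sizeof buf) return nullptr;  // real arena would chain a new block
    used = p + n;
    return buf + p;
  }

  void reset() { used = 0; }  // bulk deallocation: no per-object free()
};
```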

**JSON Parsing:**

- **Streaming parser** handles large payloads efficiently
- **Incremental processing** suitable for network protocols
- **Arena storage** eliminates string allocation overhead

@@ -16,7 +16,7 @@ The persistence thread receives commit batches from the main processing pipeline
The persistence thread collects commits into batches using two trigger conditions:

1. **Time Trigger**: `batch_timeout_ms` elapsed since batch collection started
2. **Size Trigger**: `batch_size_threshold` commits collected (can be exceeded by final commit)
1. **Size Trigger**: `batch_size_threshold` commits collected (can be exceeded by final commit)

**Flow Control**: When `max_in_flight_requests` is reached, block until responses are received. Batches in retry backoff count toward the in-flight limit, creating natural backpressure during failures.
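
The two triggers can be modeled as a single predicate (struct and function names here are assumptions, not the persistence-thread sources):

```cpp
#include <chrono>
#include <cstddef>

struct BatchPolicy {
  std::size_t batch_size_threshold;
  std::chrono::milliseconds batch_timeout_ms;
};

// A batch closes as soon as either trigger condition holds.
bool batch_ready(const BatchPolicy& p, std::size_t collected,
                 std::chrono::steady_clock::time_point started,
                 std::chrono::steady_clock::time_point now) {
  return collected >= p.batch_size_threshold ||  // size trigger
         now - started >= p.batch_timeout_ms;    // time trigger
}
```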
@@ -25,10 +25,12 @@ The persistence thread collects commits into batches using two trigger condition
### 1. Batch Collection

**No In-Flight Requests** (no I/O to pump):

- Use blocking acquire to get first commit batch (can afford to wait)
- Process immediately (no batching delay)

**With In-Flight Requests** (I/O to pump in event loop):

- Check flow control: if at `max_in_flight_requests`, block for responses
- Collect commits using non-blocking acquire until trigger condition:
  - Check for available commits (non-blocking)
@@ -97,9 +99,10 @@ The persistence thread collects commits into batches using two trigger condition
## Configuration Validation

**Required Constraints**:

- `batch_size_threshold` > 0 (must process at least one commit per batch)
- `max_in_flight_requests` > 0 (must allow at least one concurrent request)
- `max_in_flight_requests` <= 1000 (required for single-call recovery guarantee)
- `max_in_flight_requests` \<= 1000 (required for single-call recovery guarantee)
- `batch_timeout_ms` > 0 (timeout must be positive)
- `max_retry_attempts` >= 0 (zero disables retries)
- `retry_base_delay_ms` > 0 (delay must be positive if retries enabled)
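
The constraints above, expressed as a check (field names mirror the config keys; the struct itself is a sketch, not `src/config.hpp`):

```cpp
struct PersistenceConfig {
  long batch_size_threshold;
  long max_in_flight_requests;
  long batch_timeout_ms;
  long max_retry_attempts;
  long retry_base_delay_ms;
};

bool validate(const PersistenceConfig& c) {
  return c.batch_size_threshold > 0 &&
         c.max_in_flight_requests > 0 &&
         c.max_in_flight_requests <= 1000 &&  // single-call recovery guarantee
         c.batch_timeout_ms > 0 &&
         c.max_retry_attempts >= 0 &&
         (c.max_retry_attempts == 0 || c.retry_base_delay_ms > 0);
}
```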
@@ -123,16 +126,19 @@ WeaselDB's batched persistence design enables efficient recovery while maintaini
WeaselDB uses a **sequential batch numbering** scheme with **S3 atomic operations** to provide efficient crash recovery and split-brain prevention without external coordination services.

**Batch Numbering Scheme**:

- Batch numbers start at `2^64 - 1` and count downward: `18446744073709551615, 18446744073709551614, 18446744073709551613, ...`
- Each batch is stored as S3 object `batches/{batch_number:020d}` with zero-padding
- S3 lexicographic ordering on zero-padded numbers returns batches in ascending numerical order (latest batches first)

**Terminology**: Since batch numbers decrease over time, we use numerical ordering:

- "Older" batches = higher numbers (written first in time)
- "Newer" batches = lower numbers (written more recently)
- "Most recent" batches = lowest numbers (most recently written)

**Example**: If batches 100, 99, 98, 97 are written, S3 LIST returns them as:

```
batches/00000000000000000097 (newest, lowest batch number)
batches/00000000000000000098
@@ -142,6 +148,7 @@ batches/00000000000000000100 (oldest, highest batch number)
```
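
Key construction for the countdown scheme can be sketched directly: `2^64 - 1` has 20 decimal digits, so a fixed 20-digit zero-pad makes S3's lexicographic key order equal ascending numerical order.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Builds the S3 object key batches/{batch_number:020d}.
std::string batch_key(std::uint64_t batch_number) {
  char buf[32];  // "batches/" (8) + 20 digits + NUL = 29
  std::snprintf(buf, sizeof buf, "batches/%020llu",
                static_cast<unsigned long long>(batch_number));
  return buf;
}
```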
**Leadership and Split-Brain Prevention**:

- New persistence thread instances scan S3 to find the highest (oldest) available batch number
- Each batch write uses `If-None-Match="*"` to atomically claim the sequential batch number
- Only one instance can successfully claim each batch number, preventing split-brain scenarios
@@ -150,28 +157,32 @@ batches/00000000000000000100 (oldest, highest batch number)
**Recovery Scenarios**:

**Clean Shutdown**:

- All in-flight batches are drained to completion before termination
- Durability watermark accurately reflects all durable state
- No recovery required on restart

**Crash Recovery**:

1. **S3 Scan with Bounded Cost**: List S3 objects with prefix `batches/` and limit of 1000 objects
2. **Gap Detection**: Check for missing sequential batch numbers. WeaselDB never puts more than 1000 batches in flight concurrently, so a limit of 1000 is sufficient.
3. **Watermark Reconstruction**: Set durability watermark to the latest consecutive batch (scanning from highest numbers downward, until a gap)
4. **Leadership Transition**: Begin writing batches starting from next available batch number. Skip past any batch numbers already claimed in the durability watermark scan.
1. **Gap Detection**: Check for missing sequential batch numbers. WeaselDB never puts more than 1000 batches in flight concurrently, so a limit of 1000 is sufficient.
1. **Watermark Reconstruction**: Set durability watermark to the latest consecutive batch (scanning from highest numbers downward, until a gap)
1. **Leadership Transition**: Begin writing batches starting from next available batch number. Skip past any batch numbers already claimed in the durability watermark scan.

**Bounded Recovery Guarantee**: Since at most 1000 batches can be in-flight during a crash, any gap in the sequential numbering (indicating the durability watermark) must appear within the first 1000 S3 objects. This is because:

1. At most 1000 batches can be incomplete when crash occurs
2. S3 LIST returns objects in ascending numerical order (most recent batches first due to countdown numbering)
3. The first gap found represents the boundary between durable and potentially incomplete batches
4. S3 LIST operations have a maximum limit of 1000 objects per request
5. Therefore, scanning 1000 objects (the maximum S3 allows in one request) is sufficient to find this boundary
1. S3 LIST returns objects in ascending numerical order (most recent batches first due to countdown numbering)
1. The first gap found represents the boundary between durable and potentially incomplete batches
1. S3 LIST operations have a maximum limit of 1000 objects per request
1. Therefore, scanning 1000 objects (the maximum S3 allows in one request) is sufficient to find this boundary

This ensures **O(1) recovery time** regardless of database size, with at most **one S3 LIST operation** required.
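
One way the watermark reconstruction could be written (function name hypothetical; assumes a non-empty listing from a single LIST call, in ascending key order): walk from the highest, i.e. oldest, number downward and stop at the first gap.

```cpp
#include <cstdint>
#include <vector>

// `listed` holds batch numbers from one S3 LIST, ascending.
// Returns the lowest number reachable from the highest one by
// consecutive descending steps: the durability watermark.
std::uint64_t durability_watermark(const std::vector<std::uint64_t>& listed) {
  std::uint64_t w = listed.back();  // highest number = oldest listed batch
  for (auto it = listed.rbegin() + 1; it != listed.rend(); ++it) {
    if (*it != w - 1) break;  // first gap: newer batches beyond it may be incomplete
    w = *it;
  }
  return w;
}
```

For the example scenario below (2000 missing, 2001 present), the scan stops immediately at the gap and reports 2001.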
**Recovery Protocol Detail**: Even with exactly 1000 batches in-flight, recovery works correctly:

**Example Scenario**: Batches 2000 down to 1001 (1000 batches) are in-flight when crash occurs

- Previous successful run had written through batch 2001
- Worst case: batch 2000 (oldest in-flight) fails, batches 1999 down to 1001 (newer) all succeed
- S3 LIST(limit=1000) returns: 1001, 1002, ..., 1998, 1999, 2001 (ascending numerical order)

112
style.md
@@ -5,28 +5,31 @@ This document describes the C++ coding style used in the WeaselDB project. These
## Table of Contents

1. [General Principles](#general-principles)
2. [Naming Conventions](#naming-conventions)
3. [File Organization](#file-organization)
4. [Code Structure](#code-structure)
5. [Memory Management](#memory-management)
6. [Error Handling](#error-handling)
7. [Documentation](#documentation)
8. [Testing](#testing)
1. [Naming Conventions](#naming-conventions)
1. [File Organization](#file-organization)
1. [Code Structure](#code-structure)
1. [Memory Management](#memory-management)
1. [Error Handling](#error-handling)
1. [Documentation](#documentation)
1. [Testing](#testing)

---
______________________________________________________________________

## General Principles

### Language Standard

- **C++20** is the target standard
- Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
- Prefer standard library containers and algorithms over custom implementations

### C Library Functions and Headers

- **Always use std:: prefixed versions** of C library functions for consistency and clarity
- **Use C++ style headers** (`<cstring>`, `<cstdlib>`, etc.) instead of C style headers (`<string.h>`, `<stdlib.h>`, etc.)
- This applies to all standard libc functions: `std::abort()`, `std::fprintf()`, `std::free()`, `std::memcpy()`, `std::strlen()`, `std::strncpy()`, `std::memset()`, `std::signal()`, etc.
- **Exception:** Functions with no std:: equivalent (e.g., `perror()`, `gai_strerror()`) and system-specific headers (e.g., `<unistd.h>`, `<fcntl.h>`)

```cpp
// Preferred - C++ style
#include <cstring>
@@ -56,6 +59,7 @@ signal(SIGTERM, handler);
```

### Data Types

- **Almost always signed** - prefer `int`, `int64_t`, `ssize_t` over unsigned types except for:
  - Bit manipulation operations
  - Interfacing with APIs that require unsigned types
@@ -73,6 +77,7 @@ signal(SIGTERM, handler);
  - Never use for counts, sizes, or business logic

### Type Casting

- **Never use C-style casts** - they're unsafe and can hide bugs by performing dangerous conversions
- **Use C++ cast operators** for explicit type conversions with clear intent and safety checks
- **Avoid `reinterpret_cast`** - almost always indicates poor design; redesign APIs instead
@@ -94,6 +99,7 @@ auto addr = reinterpret_cast<uintptr_t>(ptr); // Pointer to integer conv
```

### Performance Focus

- **Performance-first design** - optimize for the hot path
- **Simple is fast** - find exactly what's necessary, strip away everything else
- **Complexity must be justified with benchmarks** - measure performance impact before adding complexity
@@ -103,9 +109,11 @@ auto addr = reinterpret_cast<uintptr_t>(ptr); // Pointer to integer conv
- **Arena allocation** for efficient memory management (~1ns vs ~20-270ns for malloc)

### String Formatting

- **Always use `format.hpp` functions** - formats directly into arena-allocated memory
- **Use `static_format()` for performance-sensitive code** - faster but less flexible than `format()`
- **Use `format()` function with arena allocator** for printf-style formatting

```cpp
// Most performance-sensitive - compile-time optimized concatenation
std::string_view response = static_format(arena,
@@ -124,6 +132,7 @@ std::string_view response = format(arena,
```

### Complexity Control

- **Encapsulation is the main tool for controlling complexity**
- **Header files define the interface** - they are the contract with users of your code
- **Headers should be complete** - include everything needed to use the interface effectively:
@@ -134,13 +143,15 @@ std::string_view response = format(arena,
  - Ownership and lifetime semantics
- **Do not rely on undocumented interface properties** - if it's not in the header, don't depend on it

---
______________________________________________________________________

## Naming Conventions

### Variables and Functions

- **snake_case** for all variables, functions, and member functions
- **Legacy camelCase exists** - the codebase currently contains mixed naming due to historical development. New code should use snake_case. Existing camelCase should be converted to snake_case during natural refactoring (not mass renaming).

```cpp
int64_t used_bytes() const;
void add_block(int64_t size);
@@ -148,11 +159,13 @@ int32_t initial_block_size_;
```

### Classes and Structs

- **PascalCase** for class/struct names
- **Always use struct keyword** - eliminates debates about complexity and maintains consistency
- **Public members first, private after** - puts the interface users care about at the top, implementation details below
- **Full encapsulation still applies** - use `private:` sections to hide implementation details and maintain deep, capable structs
- The struct keyword doesn't mean shallow design - it means interface-first organization for human readers

```cpp
struct Arena {
  // Public interface first
@@ -167,8 +180,10 @@ private:
```

### Enums

- **PascalCase** for enum class names
- **PascalCase** for enum values (not SCREAMING_SNAKE_CASE)

```cpp
enum class Type {
  PointRead,
@@ -183,14 +198,18 @@ enum class ParseState {
```

### Constants and Macros

- **snake_case** for constants
- Avoid macros when possible; prefer `constexpr` variables

```cpp
static const WeaselJsonCallbacks json_callbacks;
```

### Member Variables

- **Trailing underscore** for private member variables

```cpp
private:
  int32_t initial_block_size_;
@@ -198,24 +217,28 @@ private:
```

### Template Parameters

- **PascalCase** for template type parameters

```cpp
template <typename T, typename... Args>
template <typename T> struct rebind { using type = T*; };
```

---
______________________________________________________________________

## File Organization

### Include Organization

- Use **`#pragma once`** instead of include guards
- **Never `using namespace std`** - always use fully qualified names for clarity and safety
- **Include order** (applies to both headers and source files):
  1. Corresponding header file (for .cpp files only)
  2. Standard library headers (alphabetical)
  3. Third-party library headers
  4. Project headers
  1. Standard library headers (alphabetical)
  1. Third-party library headers
  1. Project headers

```cpp
#pragma once

@@ -239,14 +262,16 @@ std::vector<int> data;
std::unique_ptr<Parser> parser;
```

---
______________________________________________________________________

## Code Structure

### Class Design

- **Move-only semantics** for resource-owning types
- **Explicit constructors** to prevent implicit conversions
- **Delete copy operations** when inappropriate

```cpp
struct Arena {
  explicit Arena(int64_t initial_size = 1024);
@@ -266,12 +291,14 @@ private:
```

### Function Design

- **Const correctness** - mark methods const when appropriate
- **Parameter passing:**
  - Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
  - Pass by const reference for types > 16 bytes (containers, large objects)
- **Return by value** for small types (≤ 16 bytes), **string_view** to avoid copying strings
- **noexcept specification** for move operations and non-throwing functions

```cpp
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view request_data); // ≤ 16 bytes, pass by value
@@ -280,10 +307,12 @@ Arena(Arena &&source) noexcept;
```

### Template Usage

- **Template constraints** using static_assert for better error messages
- **SFINAE** or concepts for template specialization

### Factory Patterns & Ownership

- **Static factory methods** for complex construction requiring shared ownership
- **Friend-based factories** for access control when constructor should be private
- **Ownership guidelines:**
@@ -310,8 +339,10 @@ private:
```

### Control Flow

- **Early returns** to reduce nesting
- **Range-based for loops** when possible

```cpp
if (size == 0) {
  return nullptr;
@@ -323,9 +354,11 @@ for (auto &precondition : preconditions_) {
```

### Atomic Operations
|
||||
|
||||
- **Never use assignment operators** with `std::atomic` - always use explicit `store()` and `load()`
|
||||
- **Always specify memory ordering** explicitly for atomic operations
|
||||
- **Use the least restrictive correct memory ordering** - choose the weakest ordering that maintains correctness
|
||||
|
||||
```cpp
|
||||
// Preferred - explicit store/load with precise memory ordering
|
||||
std::atomic<uint64_t> counter;
|
||||
@@ -343,14 +376,16 @@ counter = 42; // Implicit - memory ordering not explicit
|
||||
auto value = counter; // Implicit - memory ordering not explicit
|
||||
```

---
______________________________________________________________________

## Memory Management

### Ownership & Allocation

- **Arena allocators** for request-scoped memory with **STL allocator adapters** (see Performance Focus section for characteristics)
- **String views** pointing to arena-allocated memory to avoid unnecessary copying
- **STL containers with arena allocators require default construction after arena reset** - `clear()` is not sufficient

```cpp
// STL containers with arena allocators - correct reset pattern
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena);
@@ -360,9 +395,11 @@ arena.reset(); // Reset arena memory
```

### Resource Management

- **RAII** everywhere - constructors acquire, destructors release
- **Move semantics** for efficient resource transfer
- **Explicit cleanup** methods where appropriate

```cpp
~Arena() {
  while (current_block_) {
@@ -373,16 +410,18 @@ arena.reset(); // Reset arena memory
}
```

---
______________________________________________________________________

## Error Handling

### Error Classification & Response

- **Expected errors** (invalid input, timeouts): Return error codes for programmatic handling
- **System failures** (malloc fail, socket fail): Abort immediately with error message
- **Programming errors** (precondition violations, assertions): Abort immediately

### Error Contract Design

- **Error codes are the API contract** - use enums for programmatic decisions
- **Error messages are human-readable only** - never parse message strings
- **Consistent error boundaries** - each component defines what it can/cannot recover from
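
A minimal sketch of this contract, with a hypothetical `CommitError` enum (the names are illustrative, not the project's actual error types): callers branch on the enum, and the string exists only for logging.

```cpp
// Hypothetical sketch: the enum is the programmatic contract;
// the message is human-readable only and must never be parsed.
enum class CommitError { ok, invalid_key, timeout };

const char *commit_error_message(CommitError e) {
  switch (e) {
    case CommitError::ok:          return "ok";
    case CommitError::invalid_key: return "key is not valid base64";
    case CommitError::timeout:     return "commit timed out";
  }
  return "unknown";
}
```

Code that needs to retry on `CommitError::timeout` compares enum values; it never inspects the message text, so messages can be reworded freely.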
@@ -405,6 +444,7 @@ assert(ptr != nullptr && "Precondition violated: pointer must be non-null");
```

### Assertions

- **Programming error detection** using standard `assert()` macro
- **Assertion behavior follows C++ standards:**
  - **Debug builds**: Assertions active (undefined `NDEBUG`)
@@ -413,6 +453,7 @@ assert(ptr != nullptr && "Precondition violated: pointer must be non-null");
- **Static assertions** for compile-time validation (always active)

**Usage guidelines:**

- Use for programming errors: null checks, precondition validation, invariants
- Don't use for expected runtime errors: use return codes instead

@@ -468,26 +509,28 @@ if (result == -1 && errno != EINTR) {

Most system calls are not interruptible in practice. For these, it is not necessary to add a retry loop. This includes:

* `fcntl` (with `F_GETFL`, `F_SETFL`, `F_GETFD`, `F_SETFD` - note: `F_SETLKW` and `F_OFD_SETLKW` CAN return EINTR)
* `epoll_ctl`
* `socketpair`
* `pipe`
* `setsockopt`
* `epoll_create1`
* `close` (special case: guaranteed closed even on EINTR on Linux)
- `fcntl` (with `F_GETFL`, `F_SETFL`, `F_GETFD`, `F_SETFD` - note: `F_SETLKW` and `F_OFD_SETLKW` CAN return EINTR)
- `epoll_ctl`
- `socketpair`
- `pipe`
- `setsockopt`
- `epoll_create1`
- `close` (special case: guaranteed closed even on EINTR on Linux)

When in doubt, consult the `man` page for the specific system call to see if it can return `EINTR`.
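
For the calls that *can* return `EINTR` (such as `read`), a retry loop can be sketched as below; `read_retry` is a hypothetical helper name, not an existing project function:

```cpp
#include <cerrno>
#include <unistd.h>

// Sketch of a retry loop for an interruptible system call: repeat the call
// only when it failed specifically because a signal interrupted it.
// The calls listed above as non-interruptible do not need this wrapper.
ssize_t read_retry(int fd, void *buf, size_t count) {
  ssize_t result;
  do {
    result = read(fd, buf, count);
  } while (result == -1 && errno == EINTR);
  return result;
}
```

Any other error (`result == -1` with `errno != EINTR`) falls through to the caller, matching the error-classification rules above.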

---
______________________________________________________________________

## Documentation

### Doxygen Style

- `/** ... */` comment blocks for struct and public method documentation
- **@brief** for short descriptions
- **@param** and **@return** for parameters and return values
- **@note** for important implementation notes
- **@warning** for critical usage warnings

```cpp
/**
 * @brief Type-safe version of realloc_raw for arrays of type T.
@@ -502,9 +545,11 @@ T *realloc(T *existing_ptr, int32_t current_size, int32_t requested_size);
```

### Code Comments

- **Explain why, not what** - code should be self-documenting
- **Performance notes** for optimization decisions
- **Thread safety** and ownership semantics

```cpp
// Uses O(1) accumulated counters for fast retrieval
int64_t total_allocated() const;
@@ -514,20 +559,23 @@ Connection(struct sockaddr_storage addr, int fd, int64_t id,
           ConnectionHandler *handler, std::weak_ptr<Server> server);
```

---
______________________________________________________________________

## Testing

### Test Framework

- **doctest** for unit testing
- **TEST_CASE** and **SUBCASE** for test organization
- **CHECK** for assertions (non-terminating)
- **REQUIRE** for critical assertions (terminating)

### Test Structure

- **Descriptive test names** explaining the scenario
- **SUBCASE** for related test variations
- **Fresh instances** for each test to avoid state contamination

```cpp
TEST_CASE("Arena basic allocation") {
  Arena arena;
@@ -546,10 +594,12 @@ TEST_CASE("Arena basic allocation") {
```

### Test Design Principles

- **Test the contract, not the implementation** - validate what the API promises to deliver, not implementation details
- **Both integration and unit tests** - test components in isolation and working together
- **Prefer fakes to mocks** - use real implementations for internal components, fake external dependencies
- **Always enable assertions in tests** - use `-UNDEBUG` pattern to ensure assertions are checked (see Build Integration section)

```cpp
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
@@ -569,11 +619,13 @@ TEST_CASE("Server accepts connections") {
### What NOT to Test

**Avoid testing language features and plumbing:**

- Don't test that virtual functions dispatch correctly
- Don't test that standard library types work (unique_ptr, containers, etc.)
- Don't test basic constructor/destructor calls

**Test business logic instead:**

- When does your code call hooks/callbacks and why?
- What state transitions trigger behavior changes?
- How does your code handle error conditions?
@@ -582,6 +634,7 @@ TEST_CASE("Server accepts connections") {
**Ask: "Am I testing the C++ compiler or my application logic?"**

### Test Synchronization (Authoritative Rules)

- **ABSOLUTELY NEVER use timeouts** (`sleep_for`, `wait_for`, etc.)
- **Deterministic synchronization only:**
  - Blocking I/O (naturally waits for completion)
@@ -592,6 +645,7 @@ TEST_CASE("Server accepts connections") {
#### Threading Checklist for Tests/Benchmarks

**Common threading principles (all concurrent code):**

- **Count total threads** - Include main/benchmark thread in count
- **Always assume concurrent execution needed** - Tests/benchmarks require real concurrency
- **Add synchronization primitive** - `std::latch start_latch{N}` (most common), `std::barrier`, or similar where N = total concurrent threads
@@ -599,18 +653,21 @@ TEST_CASE("Server accepts connections") {
- **Main thread synchronizes before measurement/execution** - ensures all threads start simultaneously

**Test-specific:**

- **Perform many operations per thread creation** - amortize thread creation cost and increase chances of hitting race conditions
- **Pattern: Create test that spawns threads and runs many operations, then run that test many times** - amortizes thread creation cost while providing fresh test instances
- **Run 100-10000 operations per test, and 100-10000 test iterations** - maximizes chances of hitting race conditions
- **Always run with ThreadSanitizer** - compile with `-fsanitize=thread`

**Benchmark-specific:**

- **NEVER create threads inside the benchmark measurement** - creates thread creation/destruction overhead, not contention
- **Create background threads OUTSIDE the benchmark** that run continuously during measurement
- **Use `std::atomic<bool> keep_running` to cleanly shut down background threads after benchmark**
- **Measure only the foreground operation under real contention from background threads**
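
The benchmark shape described above can be sketched as follows; `run_with_background_contention` and the counter are hypothetical names, and the real benchmark body would replace the single foreground operation shown:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

std::atomic<uint64_t> shared_counter{0};
std::atomic<bool> keep_running{true};

// Sketch: background contention threads are created BEFORE measurement,
// run continuously while the foreground operation is measured, and are
// shut down cleanly via keep_running afterwards.
void run_with_background_contention(int num_background_threads) {
  std::vector<std::thread> background;
  for (int i = 0; i < num_background_threads; ++i) {
    background.emplace_back([] {
      while (keep_running.load(std::memory_order_relaxed)) {
        shared_counter.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }

  // ... measure only this foreground operation under real contention ...
  shared_counter.fetch_add(1, std::memory_order_relaxed);

  keep_running.store(false, std::memory_order_relaxed);
  for (auto &t : background) {
    t.join();
  }
}
```

Because the threads are created and joined outside the measured region, the benchmark captures contention cost rather than thread creation/destruction overhead.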

**Red flags to catch immediately:**

- ❌ Creating threads in a loop without `std::latch`
- ❌ Background threads starting work immediately
- ❌ Benchmark measuring before all threads synchronized
@@ -636,11 +693,12 @@ for (int i = 0; i < 4; ++i) {
}
```

---
______________________________________________________________________

## Build Integration

### Build Configuration

```bash
# Debug: assertions on, optimizations off
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
@@ -650,6 +708,7 @@ cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
```

**Test Target Pattern:**

- Production targets follow build type (assertions off in Release)
- Test targets use `-UNDEBUG` to force assertions on in all builds
- Ensures consistent test validation regardless of build type
@@ -666,4 +725,5 @@ add_executable(example src/example.cpp src/main.cpp)
```

### Code Generation

- Generated files go in build directory, not source
@@ -7,12 +7,14 @@ WeaselDB's /ok health check endpoint achieves 1M requests/second with 740ns of c
## Performance Metrics

### Throughput

- **1.0M requests/second** /ok health check endpoint (4-stage commit pipeline)
  - 8 I/O threads with 8 epoll instances
  - Load tester used 12 network threads
- **0% CPU usage when idle** (optimized futex wake implementation)

### Threading Architecture

- **Four-stage commit pipeline**: Sequence → Resolve → Persist → Release
  - Lock-free coordination using atomic ring buffer
- **Optimized futex wake**: Only wake on final pipeline stage
@@ -21,6 +23,7 @@ WeaselDB's /ok health check endpoint achieves 1M requests/second with 740ns of c
### Performance Characteristics

**Health Check Pipeline (/ok endpoint)**:

- **Throughput**: 1.0M requests/second
- **Configurable CPU work**: 740ns (4000 iterations, validated with nanobench)
- **Theoretical maximum CPU time**: 1000ns (1,000,000,000ns ÷ 1,000,000 req/s)
@@ -31,23 +34,27 @@ WeaselDB's /ok health check endpoint achieves 1M requests/second with 740ns of c
### Key Optimizations

**Futex Wake Reduction**:

- **Previous approach**: Futex wake at every pipeline stage (10% CPU overhead)
- **Optimized approach**: Futex wake only at final stage to wake producers. Stages now do their futex wait on the beginning of the pipeline instead of the previous stage.
- **Result**: 23% increase in serial CPU budget (396ns → 488ns)
- **Benefits**: Higher throughput per CPU cycle + idle efficiency

**CPU-Friendly Spin Loop**:

- **Added**: `_mm_pause()` intrinsics in polling loop to reduce power consumption and improve hyperthreading efficiency
- **Maintained**: 100,000 spin iterations necessary to prevent thread descheduling
- **Result**: Same throughput with more efficient spinning

**Resolve Batch Size Optimization**:

- **Changed**: Resolve max batch size from unlimited to 1
- **Mechanism**: Single-item processing checks for work more frequently, keeping the thread in fast coordination paths instead of expensive spin/wait cycles

### Request Flow

**Health Check Pipeline** (/ok endpoint):

```
I/O Threads (8) → HttpHandler::on_batch_complete() → Commit Pipeline
        ↑                              ↓
todo.md
@@ -3,6 +3,7 @@
## 📋 Planned Tasks

### Core Database Features

- [ ] Design commit pipeline architecture with three-stage processing
  - [ ] Stage 1: Version assignment and precondition validation thread
  - [ ] Stage 2: Transaction persistence and subscriber streaming thread
@@ -13,6 +14,7 @@
- [ ] Design and architect the subscription component for change streams

### API Endpoints Implementation

- [ ] Implement `GET /v1/version` endpoint to return latest committed version and leader
- [ ] Implement `POST /v1/commit` endpoint for transaction submission with precondition validation
- [ ] Implement `GET /v1/status` endpoint for commit request status lookup by request_id
@@ -23,6 +25,7 @@
- [ ] Implement `DELETE /v1/retention/<policy_id>` endpoint for retention policy removal

### Infrastructure & Tooling

- [x] Implement thread-safe Prometheus metrics library and serve `GET /metrics` endpoint
- [ ] Implement gperf-based HTTP routing for efficient request dispatching
- [ ] Replace nlohmann/json with simdjson DOM API in parser comparison benchmarks
@@ -54,6 +57,7 @@
- [ ] Implement `DeleteObjects` for batch object deletion

### Client Libraries

- [ ] Implement high-level Python client library for WeaselDB REST API
  - [ ] Wrap `/v1/version`, `/v1/commit`, `/v1/status` endpoints
  - [ ] Handle `/v1/subscribe` SSE streaming with reconnection logic
@@ -64,6 +68,7 @@
- [ ] Provide CLI tooling for database administration

### Testing & Validation

- [ ] Build out-of-process API test suite using client library over real TCP
  - [ ] Test all `/v1/version`, `/v1/commit`, `/v1/status` endpoints
  - [ ] Test `/v1/subscribe` Server-Sent Events streaming