From 0e49025c0e8739abe7bb1a1aa462845dc6c99e5f Mon Sep 17 00:00:00 2001
From: Andrew Noyes
Date: Mon, 18 Aug 2025 16:13:08 -0400
Subject: [PATCH] Consolidate config docs in config.md, update file references

---
 config.md | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 design.md | 18 +++++++-------
 2 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/config.md b/config.md
index 428ffeb..b67f293 100644
--- a/config.md
+++ b/config.md
@@ -14,13 +14,17 @@ By default, WeaselDB looks for `config.toml` in the current directory. You can s
 
 ### Server Configuration (`[server]`)
 
-Controls basic server binding and request limits.
+Controls server networking, threading, and request handling behavior.
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
 | `bind_address` | string | `"127.0.0.1"` | IP address to bind the server to |
 | `port` | integer | `8080` | Port number to listen on |
 | `max_request_size_bytes` | integer | `1048576` (1MB) | Maximum size for incoming requests. Requests exceeding this limit receive a `413 Content Too Large` response |
+| `accept_threads` | integer | `1` | Number of dedicated threads for accepting incoming connections |
+| `network_threads` | integer | `1` | Number of threads for epoll-based network I/O processing |
+| `event_batch_size` | integer | `32` | Number of events to process in each epoll batch |
+| `max_connections` | integer | `1000` | Maximum number of concurrent connections (0 = unlimited) |
 
 ### Commit Configuration (`[commit]`)
 
@@ -50,6 +54,10 @@ Controls behavior of the `/v1/subscribe` endpoint and SSE streaming.
 bind_address = "0.0.0.0"
 port = 8080
 max_request_size_bytes = 2097152  # 2MB
+accept_threads = 2
+network_threads = 8
+event_batch_size = 64
+max_connections = 10000
 
 [commit]
 min_request_id_length = 32
@@ -63,15 +71,69 @@ keepalive_interval_seconds = 15
 
 ## Configuration Loading
 
-- If the specified config file doesn't exist or contains errors, WeaselDB will use default values and log a warning
-- All configuration parameters are optional - any missing values will use the defaults shown above
+WeaselDB uses the `toml11` library for configuration parsing with robust error handling:
+
+- **File Loading**: Uses `ConfigParser::load_from_file(path)` to parse TOML files
+- **Fallback Behavior**: If the config file doesn't exist or contains errors, WeaselDB uses default values and logs a warning
+- **Optional Parameters**: All configuration parameters are optional - missing values use the defaults shown above
+- **Type Safety**: Configuration values are parsed with type validation and meaningful error messages
+- **Validation**: All parameters are validated against reasonable bounds before use
 
 ## API Relationship
 
-These configuration parameters directly affect API behavior:
+These configuration parameters directly affect server and API behavior:
 
+**Server Performance:**
+- **`accept_threads`**: Controls parallelism for accepting new connections. More threads can handle higher connection rates
+- **`network_threads`**: Controls I/O processing parallelism. Should typically match CPU core count for optimal performance
+- **`event_batch_size`**: Larger batches reduce syscall overhead but may increase latency under light load
+- **`max_connections`**: Prevents resource exhaustion by limiting concurrent connections
+
+**Request Handling:**
 - **`max_request_size_bytes`**: Determines when `/v1/commit` returns `413 Content Too Large`
-- **`min_request_id_length`**: Validates `request_id` fields in `/v1/commit` requests
+- **`min_request_id_length`**: Validates `request_id` fields in `/v1/commit` requests for sufficient entropy
+
+**Request ID Management:**
 - **`request_id_retention_*`**: Affects availability of data for `/v1/status` queries and likelihood of `log_truncated` responses
+
+**Subscription Streaming:**
 - **`max_buffer_size_bytes`**: Controls when `/v1/subscribe` connections are terminated due to slow consumption
 - **`keepalive_interval_seconds`**: Frequency of keepalive comments in `/v1/subscribe` streams
+
+## Configuration Validation
+
+The configuration system includes comprehensive validation with specific bounds checking:
+
+### Server Configuration Limits
+- **`port`**: Must be between 1 and 65535
+- **`max_request_size_bytes`**: Must be > 0 and ≤ 100MB
+- **`accept_threads`**: Must be between 1 and 100
+- **`network_threads`**: Must be between 1 and 1000
+- **`event_batch_size`**: Must be between 1 and 10000
+- **`max_connections`**: Must be between 0 and 100000 (0 = unlimited)
+
+### Commit Configuration Limits
+- **`min_request_id_length`**: Must be between 8 and 256 characters
+- **`request_id_retention_hours`**: Must be between 1 and 8760 hours (1 year)
+- **`request_id_retention_versions`**: Must be > 0
+
+### Subscription Configuration Limits
+- **`max_buffer_size_bytes`**: Must be > 0 and ≤ 1GB
+- **`keepalive_interval_seconds`**: Must be between 1 and 3600 seconds (1 hour)
+
+### Cross-Validation
+- Warns if `max_request_size_bytes` > `max_buffer_size_bytes` (potential buffering issues)
+
+## Configuration Management
+
+### Code Integration
+- **Configuration Structure**: Defined in `src/config.hpp` with structured types
+- **Parser Implementation**: Located in `src/config.cpp` using template-based parsing
+- **Default Values**: Embedded as struct defaults for compile-time initialization
+- **Runtime Usage**: Configuration passed to server components during initialization
+
+### Development Guidelines
+- **New Parameters**: Add to appropriate struct in `src/config.hpp`
+- **Validation**: Include bounds checking in `ConfigParser::validate_config()`
+- **Documentation**: Update this file when adding new configuration options
+- **Testing**: Verify both valid and invalid configuration scenarios
diff --git a/design.md b/design.md
index bc00a81..99fc8d6 100644
--- a/design.md
+++ b/design.md
@@ -78,7 +78,7 @@ Parser capabilities:
 - Memory-efficient string views backed by arena storage
 - Perfect hash table lookup for JSON keys using gperf
 
-#### 4. **Parser Interface** (`src/parser_interface.hpp`)
+#### 4. **Parser Interface** (`src/commit_request_parser.hpp`)
 - **Abstract base class** for commit request parsers
 - **Format-agnostic parsing interface** supporting multiple serialization formats
 - **Streaming and one-shot parsing modes**
@@ -88,12 +88,10 @@ Parser capabilities:
 - **TOML-based configuration** using `toml11` library
 - **Structured configuration** with server, commit, and subscription sections
 - **Default fallback values** for all configuration options
-- **Type-safe parsing** with validation
+- **Type-safe parsing** with validation and bounds checking
+- **Comprehensive validation** with meaningful error messages
 
-Configuration domains:
-- **Server**: bind address, port, request size limits
-- **Commit**: request ID validation, retention policies
-- **Subscription**: buffer management, keepalive intervals
+See `config.md` for complete configuration documentation.
 
 #### 6. **JSON Token Optimization** (`src/json_tokens.gperf`, `src/json_token_enum.hpp`)
 - **Perfect hash table** generated by gperf for O(1) JSON key lookup
@@ -170,7 +168,7 @@ The project includes comprehensive testing infrastructure:
 Build targets:
 - `test_arena_allocator`: Arena allocator functionality tests
 - `test_commit_request`: JSON parsing and validation tests
-- `weaseldb`: Main application demonstrating configuration and parsing
+- Main server executable (compiled from `src/main.cpp`)
 - `bench_arena_allocator`: Arena allocator performance benchmarks
 - `bench_commit_request`: JSON parsing performance benchmarks
 - `bench_parser_comparison`: Comparison benchmarks vs nlohmann::json and RapidJSON
@@ -214,7 +212,7 @@ The modular design allows each component to be optimized independently while mai
 - **Build-Generated**: `build/` contains CMake-generated files including `json_tokens.cpp`
 
 ### Adding New Parsers
-- Inherit from `ParserInterface` in `src/parser_interface.hpp`
+- Inherit from `CommitRequestParser` in `src/commit_request_parser.hpp`
 - Implement both streaming and one-shot parsing modes
 - Use arena allocation for all temporary string storage
 - Add corresponding test cases in `tests/`
@@ -228,8 +226,8 @@ The modular design allows each component to be optimized independently while mai
 
 ### Configuration Management
 - All configuration is TOML-based using `config.toml`
-- Default values are provided in `src/config.cpp`
-- Configuration sections: server, commit, subscription
+- Comprehensive documentation available in `config.md`
+- Type-safe parsing with validation and bounds checking
 - Always validate configuration values and provide meaningful errors
 
 ### Testing Strategy