diff --git a/design.md b/design.md
index 30de3ba..bc00a81 100644
--- a/design.md
+++ b/design.md
@@ -1,9 +1,45 @@
-# WeaselDB Design Overview
+# WeaselDB Development Guide
 
 ## Project Summary
 
 WeaselDB is a high-performance write-side database component designed for systems where reading and writing are decoupled. The system focuses exclusively on handling transactional commits with optimistic concurrency control, while readers are expected to maintain their own queryable representations by subscribing to change streams.
 
+## Quick Start
+
+### Build System
+- Use CMake with C++20 standard
+- Primary build commands:
+  - `mkdir -p build && cd build`
+  - `cmake .. -DCMAKE_BUILD_TYPE=Release`
+  - `ninja` or `make -j$(nproc)`
+
+### Testing and Development Workflow
+- **Run all tests**: `ninja test` or `ctest`
+- **Individual test targets**:
+  - `./test_arena_allocator` - Arena allocator unit tests
+  - `./test_commit_request` - JSON parsing and validation tests
+- **Benchmarking**:
+  - `./bench_arena_allocator` - Memory allocation performance
+  - `./bench_commit_request` - JSON parsing performance
+  - `./bench_parser_comparison` - Compare against nlohmann::json and RapidJSON
+- **Debug tools**: `./debug_arena` - Analyze arena allocator behavior
+
+### Code Style and Conventions
+- **C++ Style**: Modern C++20 with RAII and move semantics
+- **Memory Management**: Prefer arena allocation over standard allocators
+- **String Handling**: Use `std::string_view` for zero-copy operations
+- **Error Handling**: Return error codes or use exceptions appropriately
+- **Naming**: snake_case for variables/functions, PascalCase for classes
+- **Performance**: Always consider allocation patterns and cache locality
+
+### Dependencies and External Libraries
+- **weaseljson**: Must be installed system-wide (high-performance JSON parser)
+- **simdutf**: Fetched automatically (SIMD base64 encoding/decoding)
+- **toml11**: Fetched automatically (TOML configuration parsing)
+- **doctest**: Fetched automatically (testing framework)
+- **nanobench**: Fetched automatically (benchmarking library)
+- **gperf**: System requirement for perfect hash generation
+
 ## Architecture Overview
 
 ### Core Components
@@ -160,3 +196,76 @@ This write-side component is designed to integrate with:
 - **Monitoring** systems for operational visibility
 
 The modular design allows each component to be optimized independently while maintaining clear interfaces for system integration.
+
+## Development Guidelines
+
+### Important Implementation Details
+- **Arena Allocator Pattern**: Always use `ArenaAllocator` for temporary allocations within request processing
+- **String View Usage**: Prefer `std::string_view` over `std::string` when pointing to arena-allocated memory
+- **JSON Token Lookup**: Use the gperf-generated perfect hash table in `json_tokens.hpp` for O(1) key recognition
+- **Base64 Handling**: Always use simdutf for base64 encoding/decoding for performance
+- **Error Propagation**: Use structured error types that can be efficiently returned up the call stack
+
+### File Organization
+- **Core Headers**: `src/` contains all primary implementation files
+- **Tests**: `tests/` contains doctest-based unit tests
+- **Benchmarks**: `benchmarks/` contains nanobench performance tests
+- **Tools**: `tools/` contains debugging and analysis utilities
+- **Build-Generated**: `build/` contains CMake-generated files including `json_tokens.cpp`
+
+### Adding New Parsers
+- Inherit from `ParserInterface` in `src/parser_interface.hpp`
+- Implement both streaming and one-shot parsing modes
+- Use arena allocation for all temporary string storage
+- Add corresponding test cases in `tests/`
+- Add benchmark comparisons in `benchmarks/`
+
+### Performance Considerations
+- **Memory**: Arena allocation eliminates fragmentation - use it for all request-scoped data
+- **CPU**: Perfect hashing and SIMD operations are critical paths - avoid alternatives
+- **I/O**: Streaming parser design supports incremental network data processing
+- **Cache**: String views avoid copying, keeping data cache-friendly
+
+### Configuration Management
+- All configuration is TOML-based using `config.toml`
+- Default values are provided in `src/config.cpp`
+- Configuration sections: server, commit, subscription
+- Always validate configuration values and provide meaningful errors
+
+### Testing Strategy
+- **Unit tests** validate individual component correctness
+- **Benchmarks** ensure performance characteristics are maintained
+- **Debug tools** help analyze memory usage patterns
+- Always run both tests and benchmarks before submitting changes
+
+### Build System Details
+- CMake generates gperf hash tables at build time
+- Ninja is preferred over make for faster incremental builds
+- Release builds include debug symbols for profiling
+- All external dependencies except weaseljson are auto-fetched
+
+## Common Patterns
+
+### Arena-Based String Handling
+```cpp
+// Preferred: Zero-copy string view
+std::string_view process_json_key(const char* data, ArenaAllocator& arena);
+
+// Avoid: Unnecessary string copies
+std::string process_json_key(const char* data);
+```
+
+### Error Handling Pattern
+```cpp
+enum class ParseResult { Success, InvalidJson, MissingField };
+ParseResult parse_commit_request(const char* json, CommitRequest& out);
+```
+
+### Builder Pattern Usage
+```cpp
+CommitRequest request = CommitRequestBuilder(arena)
+    .request_id("example-id")
+    .leader_id("leader-123")
+    .read_version(42)
+    .build();
+```
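
The arena-backed `std::string_view` guidance and the `ParseResult` error-code pattern added in this diff can be combined in a small self-contained sketch. The diff does not show the interface of WeaselDB's `ArenaAllocator`, so `SketchArena`, `intern`, and `store_request_id` below are hypothetical stand-ins written only for illustration; only the `ParseResult` enumerators are taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the project's ArenaAllocator (assumption: the
// real interface differs). A bump allocator that releases every block at
// once when the arena is destroyed.
class SketchArena {
public:
    explicit SketchArena(std::size_t block_size = 4096) : block_size_(block_size) {}

    // Returns `size` bytes of uninitialized storage (byte alignment only).
    char* allocate(std::size_t size) {
        if (blocks_.empty() || used_ + size > capacity_) {
            // Start a new block, large enough for an oversized request.
            capacity_ = size > block_size_ ? size : block_size_;
            blocks_.push_back(std::make_unique<char[]>(capacity_));
            used_ = 0;
        }
        char* p = blocks_.back().get() + used_;
        used_ += size;
        return p;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::size_t capacity_ = 0;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<char[]>> blocks_;
};

// Enumerators as listed in the diff's "Error Handling Pattern" snippet.
enum class ParseResult { Success, InvalidJson, MissingField };

// Hypothetical helper: copies `text` into the arena once and returns a view
// of the copy. The view stays valid for as long as the arena is alive.
std::string_view intern(SketchArena& arena, std::string_view text) {
    if (text.empty()) return {};
    char* dst = arena.allocate(text.size());
    std::memcpy(dst, text.data(), text.size());
    return {dst, text.size()};
}

// Hypothetical validation step: reports failure through an error code and
// writes an arena-backed view to `out` only on success.
ParseResult store_request_id(std::string_view id, SketchArena& arena,
                             std::string_view& out) {
    if (id.empty()) return ParseResult::MissingField;
    out = intern(arena, id);
    return ParseResult::Success;
}
```

A caller would create one arena per request, pass it to each step, and let the arena's destructor release every string together. Any view returned by `intern` must not outlive the arena that backs it.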