# WeaselDB C++ Style Guide

This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.

## Table of Contents

1. [General Principles](#general-principles)
2. [Naming Conventions](#naming-conventions)
3. [File Organization](#file-organization)
4. [Code Structure](#code-structure)
5. [Memory Management](#memory-management)
6. [Error Handling](#error-handling)
7. [Documentation](#documentation)
8. [Testing](#testing)
9. [Build Integration](#build-integration)
10. [Style Enforcement](#style-enforcement)

---

## General Principles

### Language Standard
- **C++20** is the target standard
- Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
- Prefer standard library containers and algorithms over custom implementations

### Data Types
- **Almost always signed** - prefer signed types such as `int` and `int64_t` over unsigned types except for:
  - Bit manipulation operations
  - Interfacing with APIs that require unsigned types
  - Memory sizes where overflow is impossible (`size_t`, `uint32_t` for arena block sizes)
  - Where defined unsigned overflow behavior (wraparound) is intentional and desired
- **Almost always auto** - let the compiler deduce types except when:
  - The type is not obvious from context (prefer explicit for clarity)
  - Specific type requirements matter (numeric conversions, template parameters)
  - Interface contracts need explicit types (public APIs, function signatures)
- **Prefer uninitialized memory to default initialization** when using a value before initializing it would be an error
  - Valgrind will catch uninitialized memory usage bugs
  - Avoid hiding logic errors with unnecessary zero-initialization
  - Default initialization can mask bugs and hurt performance
- **Floating point is for metrics only** - avoid `float`/`double` in core data structures and algorithms
  - Use for performance measurements, statistics, and monitoring data
  - Never use for counts, sizes, or business logic

### Performance Focus
- **Performance-first design** -
optimize for the hot path
- **Simple is fast** - find exactly what's necessary, strip away everything else
- **Complexity must be justified with benchmarks** - measure performance impact before adding complexity
- **Strive for 0% CPU usage when idle** - avoid polling, busy waiting, or unnecessary background activity
- Use **inline functions** for performance-critical code (e.g., `allocate_raw`)
- **Zero-copy operations** with `std::string_view` over string copying
- **Arena allocation** for efficient memory management (see Memory Management section for details)

### Complexity Control
- **Encapsulation is the main tool for controlling complexity**
- **Header files define the interface** - they are the contract with users of your code
- **Headers should be complete** - include everything needed to use the interface effectively:
  - Usage examples in comments
  - Preconditions and postconditions
  - Thread safety guarantees
  - Performance characteristics
  - Ownership and lifetime semantics
- **Do not rely on undocumented interface properties** - if it's not in the header, don't depend on it

---

## Naming Conventions

### Variables and Functions
- **snake_case** for all variables, functions, and member functions
```cpp
size_t used_bytes() const;
void add_block(size_t size);
uint32_t initial_block_size_;
```

### Structs
- **PascalCase** for struct names
- **Always use struct** - eliminates debates about complexity and maintains consistency
- **Public members first, private after** - puts the interface users care about at the top, implementation details below
- **Full encapsulation still applies** - use `private:` sections to hide implementation details and maintain deep, capable classes
- The struct keyword doesn't mean shallow design - it means interface-first organization for human readers
```cpp
struct ArenaAllocator {
  // Public interface first
  explicit ArenaAllocator(size_t initial_size = 1024);
  void* allocate_raw(size_t size);

private:
  // Private members after
  uint32_t initial_block_size_;
  Block* current_block_;
};
```

### Enums
- **PascalCase** for enum class names
- **PascalCase** for enum values (not SCREAMING_SNAKE_CASE)
```cpp
enum class Type { PointRead, RangeRead };
enum class ParseState { Root, PreconditionsArray, OperationObject };
```

### Constants and Macros
- **snake_case** for constants
- Avoid macros when possible; prefer `constexpr` variables
```cpp
static const WeaselJsonCallbacks json_callbacks;
```

### Member Variables
- **Trailing underscore** for private member variables
```cpp
private:
  uint32_t initial_block_size_;
  Block *current_block_;
```

### Template Parameters
- **PascalCase** for template type parameters
```cpp
template <typename T> struct rebind { using type = T*; };
```

---

## File Organization

### Header Files
- Use **`#pragma once`** instead of include guards
- **Never `using namespace std`** - always use fully qualified names for clarity and safety
- **Include order:**
  1. Corresponding header file (for .cpp files)
  2. Standard library headers (alphabetical)
  3. Third-party library headers
  4.
     Project headers
```cpp
#pragma once

// Standard library headers (alphabetical)
#include <memory>
#include <vector>

// Project headers
#include "arena_allocator.hpp"
#include "commit_request.hpp"

// Never this:
// using namespace std;

// Always this (element types are illustrative):
std::vector<int> data;
std::unique_ptr<Parser> parser;
```

### Source Files
- Include corresponding header first
- Follow same include order as headers (see Header Files section above)

---

## Code Structure

### Struct Design
- **Move-only semantics** for resource-owning structs
- **Explicit constructors** to prevent implicit conversions
- **Delete copy operations** when inappropriate
```cpp
struct ArenaAllocator {
  explicit ArenaAllocator(size_t initial_size = 1024);

  // Copy construction is not allowed
  ArenaAllocator(const ArenaAllocator &) = delete;
  ArenaAllocator &operator=(const ArenaAllocator &) = delete;

  // Move semantics
  ArenaAllocator(ArenaAllocator &&other) noexcept;
  ArenaAllocator &operator=(ArenaAllocator &&other) noexcept;

private:
  uint32_t initial_block_size_;
  Block *current_block_;
};
```

### Function Design
- **Const correctness** - mark methods const when appropriate
- **Parameter passing:**
  - Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
  - Pass by const reference for types > 16 bytes (containers, large objects)
- **Return by value** for small types (≤ 16 bytes), **string_view** for zero-copy over strings
- **noexcept specification** for move operations and non-throwing functions
```cpp
// Operation is an illustrative element type
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view data);       // ≤ 16 bytes, pass by value
void process_request(const CommitRequest& req); // > 16 bytes, pass by reference
ArenaAllocator(ArenaAllocator &&other) noexcept;
```

### Template Usage
- **Template constraints** using static_assert for better error messages
- **SFINAE** or concepts for template specialization
```cpp
template <typename T, typename... Args>
T *construct(Args &&...args) {
  static_assert(
      std::is_trivially_destructible_v<T>,
      "ArenaAllocator::construct requires trivially destructible types.");
  // ...
}
```

### Factory Patterns
- **Static factory methods** for complex construction requiring specific initialization
- **Friend-based factories** for access control when constructor should be private
- **Factory patterns ensure proper ownership semantics** (shared_ptr vs unique_ptr)
```cpp
// Static factory method
auto server = Server::create(config, handler); // Returns shared_ptr

// Friend-based factory for access control
struct Connection {
  void appendMessage(std::string_view data);

private:
  Connection(/* args */); // Private constructor
  friend struct Server;   // Only Server can construct
};
```

### Control Flow
- **Early returns** to reduce nesting
- **Range-based for loops** when possible
```cpp
if (size == 0) {
  return nullptr;
}

for (auto &precondition : preconditions_) {
  // ...
}
```

---

## Memory Management

### Ownership & Allocation
- **Arena allocators** for request-scoped memory with **STL allocator adapters** (provides ~1ns allocation vs ~20-270ns for malloc)
- **String views** pointing to arena-allocated memory for zero-copy operations
- **Prefer unique_ptr** for exclusive ownership
- **shared_ptr only if shared ownership is necessary** - most objects have single owners
- **Factory patterns** for complex construction and ownership control (see Factory Patterns in the Code Structure section)
- **STL containers with arena allocators require default construction after arena reset** - `clear()` is not sufficient
```cpp
// STL containers with arena allocators - correct reset pattern
// (Operation and ArenaStlAllocator are illustrative type names)
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena_alloc);
// ... use container ...
operations = {}; // Default construct - clear() won't work correctly
arena.reset();   // Reset arena memory
```

### Resource Management
- **RAII** everywhere - constructors acquire, destructors release
- **Move semantics** for efficient resource transfer
- **Explicit cleanup** methods where appropriate
```cpp
~ArenaAllocator() {
  while (current_block_) {
    Block *prev = current_block_->prev;
    std::free(current_block_);
    current_block_ = prev;
  }
}
```

---

## Error Handling

### Error Philosophy
- **Return codes** for expected errors that can be handled
- **Abort for system failures** - If we can't uphold the component's contract, print a diagnostic with perror/fprintf, then abort. If recovery is possible, change the component's contract to allow returning an error code.
- **Error messages are for humans only** - never parse error message strings programmatically
- **Error codes are the contract** - use enums/codes for programmatic error handling

### Error Boundaries
- **Expected errors**: Invalid user input, network timeouts, file not found - return error codes
- **System failures**: Memory allocation failure, socket creation failure - abort immediately
- **Programming errors**: Assertion failures, null pointer dereference - abort immediately
```cpp
enum class ParseResult { Success, InvalidJson, MissingField };

// Good: Test error codes (part of contract)
auto result = parser.parse(data);
if (result == ParseResult::InvalidJson) {
  // Handle programmatically
}

// Bad: Don't test or parse error message strings
// CHECK(parser.get_error() == "Expected '}' at line 5"); // BRITTLE!

// System resource failures: abort immediately
void *ArenaAllocator::allocate(size_t size) {
  void *memory = std::malloc(size);
  if (!memory) {
    std::fprintf(stderr, "ArenaAllocator: Failed to allocate memory\n");
    std::abort(); // Process is likely in bad state
  }
  return memory;
}
```

### Assertions
- Use **assert()** for debug-time checks that validate program correctness
- **Static assertions** for compile-time validation
- **Standard assert behavior**: Assertions are **enabled by default** and **disabled when `NDEBUG` is defined**
- **Use assertions for programming errors**: Null pointer checks, precondition validation, invariant checking
- **Don't use assertions for expected runtime errors**: Use return codes for recoverable conditions
```cpp
// Good: Programming error checks (enabled by default, disabled with NDEBUG)
assert(current_block_ && "realloc called with non-null ptr but no current block");
assert(size > 0 && "Cannot allocate zero bytes");
assert(ptr != nullptr && "Invalid pointer passed to realloc");

// Good: Compile-time validation (always enabled)
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");

// Bad: Don't use assert for expected runtime errors
// assert(file_exists(path)); // File might legitimately not exist - use return code instead
```

**Build Configuration:**
- **Debug builds**: `cmake -DCMAKE_BUILD_TYPE=Debug` → assertions **enabled** (default behavior)
- **Release builds**: `cmake -DCMAKE_BUILD_TYPE=Release` → assertions **disabled** (defines `NDEBUG`)
- **Test targets**: Always have assertions **enabled** using `-UNDEBUG` pattern (see Build Integration section)
- **Testing**: Test in both debug and release builds to catch assertion failures in all configurations

---

## Documentation

### Doxygen Style
- **Doxygen comment blocks** (`/** ... */`) for struct and public method documentation
- **@brief** for short descriptions
- **@param** and **@return** for function parameters and return values
- **@note** for important implementation notes
- **@warning** for critical usage warnings
```cpp
/**
 * @brief Type-safe version of realloc_raw for arrays of type T.
 * @param ptr Pointer to the existing allocation
 * @param old_size Size in number of T objects
 * @param new_size Desired new size in number of T objects
 * @return Pointer to reallocated memory
 * @note Prints error to stderr and calls std::abort() if allocation fails
 */
template <typename T>
T *realloc(T *ptr, uint32_t old_size, uint32_t new_size);
```

### Code Comments
- **Explain why, not what** - code should be self-documenting
- **Performance notes** for optimization decisions
- **Thread safety** and ownership semantics
```cpp
// Uses O(1) accumulated counters for fast retrieval
size_t total_allocated() const;

// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
           ConnectionHandler *handler, std::weak_ptr<Server> server);
```

---

## Testing

### Test Framework
- **doctest** for unit testing
- **TEST_CASE** and **SUBCASE** for test organization
- **CHECK** for assertions (non-terminating)
- **REQUIRE** for critical assertions (terminating)

### Test Structure
- **Descriptive test names** explaining the scenario
- **SUBCASE** for related test variations
- **Fresh instances** for each test to avoid state contamination
```cpp
TEST_CASE("ArenaAllocator basic allocation") {
  ArenaAllocator arena;

  SUBCASE("allocate zero bytes returns nullptr") {
    void *ptr = arena.allocate_raw(0);
    CHECK(ptr == nullptr);
  }

  SUBCASE("allocate single byte") {
    void *ptr = arena.allocate_raw(1);
    CHECK(ptr != nullptr);
    CHECK(arena.used_bytes() >= 1);
  }
}
```

### Test Design Principles
- **Prefer testing through public interfaces** - focus on observable behavior rather than implementation details
- **Test the contract, not the implementation** - validate what the API promises to deliver
- **Avoid testing private methods directly** - if private functionality needs testing, consider if it should be public or extracted
- **Both integration and unit tests** - test components in
isolation and working together
- **Prefer fakes to mocks** - use real implementations for internal components, fake external dependencies
- **Always enable assertions in tests** - use `-UNDEBUG` pattern to ensure assertions are checked (see Build Integration section)
```cpp
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
  auto config = Config::defaultConfig();
  // TestHandler is an illustrative handler type
  auto handler = std::make_unique<TestHandler>();
  auto server = Server::create(config, std::move(handler));

  // Test observable behavior - server can accept connections
  auto result = connectToServer(server->getPort());
  CHECK(result.connected);
}

// Avoid: Testing internal implementation details
// TEST_CASE("Server creates epoll instance") { /* implementation detail */ }
```

### Test Synchronization
- **NEVER use timeouts** or sleep-based synchronization
- **Deterministic synchronization only:**
  - Blocking I/O operations
  - `condition_variable.wait()` (no timeout variant)
  - `std::latch`, `std::barrier`, futures/promises
  - RAII guards and resource management

### Multithreading Test Correctness
- **Force concurrent execution** - Thread creation takes time, so work often completes sequentially before threads start
- **Use std::latch to synchronize thread startup** - Ensures all threads begin racing simultaneously
```cpp
// BAD: Race likely over before threads start
std::atomic<int> counter{0};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() { counter++; }); // Probably sequential
}

// GOOD: Force threads to race simultaneously
std::atomic<int> counter{0};
std::latch start_latch{4};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() {
    start_latch.arrive_and_wait(); // All threads start together
    counter++;                     // Now they actually race
  });
}
```

---

## Build Integration

### CMake Integration
- **Generated code** (gperf hash tables) in build directory
- **Ninja** as the preferred generator
- **Export compile commands** for tooling support

**Build Types:**
```bash
# Debug build (assertions enabled by default, optimizations off)
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

# Release build (assertions disabled, optimizations on)
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
```

**Testing and Development:**
- **Test targets always have assertions enabled** - even in release builds, test targets use `-UNDEBUG` to ensure assertions are checked
- **Production builds have assertions disabled** - the main `weaseldb` executable follows standard build type behavior
- **Use Release builds for performance testing** and production deployment
- **This ensures tests catch assertion failures** regardless of build configuration

### Test Assertion Pattern (-UNDEBUG)

**Problem**: Release builds define `NDEBUG` which disables assertions, but tests should always validate assertions to catch programming errors.

**Solution**: Use `-UNDEBUG` compiler flag for test targets to undefine `NDEBUG` and re-enable assertions.

**CMake Implementation:**
```cmake
# Test target with assertions always enabled
add_executable(test_example tests/test_example.cpp src/example.cpp)
target_link_libraries(test_example doctest::doctest)
target_compile_options(test_example PRIVATE -UNDEBUG) # Always enable assertions

# Production target follows build type
add_executable(example src/example.cpp src/main.cpp)
# No -UNDEBUG → assertions disabled in Release, enabled in Debug
```

**Benefits:**
- **Consistent test behavior**: Tests validate assertions in both Debug and Release builds
- **Production performance**: Production binaries maintain optimized release performance
- **Early error detection**: Catch assertion failures during CI/CD regardless of build configuration
- **Build type flexibility**: Can use Release builds for performance profiling while still testing assertions

### Code Generation
- **gperf** for perfect hash table generation
- **Build-time generation** of token lookup tables
- **Include generated headers** from build directory
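
To make the `-UNDEBUG` pattern from this section concrete, here is a minimal sketch of what the flag does at the preprocessor level. It is not WeaselDB code: `assertions_enabled` and `checked_divide` are hypothetical names used only for illustration, and the `#undef` line stands in for passing `-UNDEBUG` on the compiler command line.

```cpp
// Stand-in for -UNDEBUG on the command line: removes NDEBUG if the build
// type defined it, and is harmless if it was never defined.
#undef NDEBUG

#include <cassert> // assert() is configured by the NDEBUG state at this point

// Hypothetical helper: reports whether assert() is active in this
// translation unit.
constexpr bool assertions_enabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

// Hypothetical function whose precondition is checked with assert().
// With NDEBUG undefined, a zero denominator aborts instead of silently
// invoking undefined behavior.
int checked_divide(int numerator, int denominator) {
  assert(denominator != 0 && "denominator must be non-zero");
  return numerator / denominator;
}
```

In a real project the flag belongs in the build system, as in the CMake snippet above, rather than in source files; the sketch only shows why test targets built this way still check their assertions in Release mode.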

---

## Style Enforcement

### Consistency
- Follow existing patterns in the codebase
- Use the same style for similar constructs
- Maintain consistency within each translation unit

### Tools
- **clang-format** configuration (when available)
- **Static analysis** tools for code quality
- **Address sanitizer** for memory safety testing

This style guide reflects the existing codebase patterns and should be followed for all new code contributions to maintain consistency and readability.
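
As a closing illustration, the following minimal sketch combines several conventions from this guide in one place: a `struct` with its public interface first, snake_case names, trailing underscores on private members, a PascalCase `enum class` for error codes, deleted copies, `noexcept` moves, early returns, and an assertion for an invariant. `FixedBuffer` and `AppendResult` are hypothetical types invented for this example and are not part of WeaselDB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string_view>
#include <utility>

// Error codes are the contract; callers test these, never message strings.
enum class AppendResult { Success, EmptyInput, OutOfSpace };

// Hypothetical move-only buffer, used only to illustrate the conventions.
struct FixedBuffer {
  // Public interface first
  explicit FixedBuffer(size_t capacity)
      : data_(static_cast<char *>(std::malloc(capacity))),
        capacity_(capacity) {
    if (data_ == nullptr && capacity > 0) { // System failure: abort
      std::fprintf(stderr, "FixedBuffer: Failed to allocate memory\n");
      std::abort();
    }
  }
  ~FixedBuffer() { std::free(data_); }

  // Copying is not allowed for a resource-owning struct
  FixedBuffer(const FixedBuffer &) = delete;
  FixedBuffer &operator=(const FixedBuffer &) = delete;

  // Move semantics: the source is left empty
  FixedBuffer(FixedBuffer &&other) noexcept
      : data_(std::exchange(other.data_, nullptr)),
        capacity_(std::exchange(other.capacity_, 0)),
        used_(std::exchange(other.used_, 0)) {}
  FixedBuffer &operator=(FixedBuffer &&other) noexcept {
    if (this != &other) {
      std::free(data_);
      data_ = std::exchange(other.data_, nullptr);
      capacity_ = std::exchange(other.capacity_, 0);
      used_ = std::exchange(other.used_, 0);
    }
    return *this;
  }

  // string_view is small (<= 16 bytes), so it is passed by value
  AppendResult append(std::string_view bytes) {
    assert(used_ <= capacity_ && "used_ must never exceed capacity_");
    if (bytes.empty()) {
      return AppendResult::EmptyInput; // Expected error: return a code
    }
    if (bytes.size() > capacity_ - used_) {
      return AppendResult::OutOfSpace;
    }
    std::memcpy(data_ + used_, bytes.data(), bytes.size());
    used_ += bytes.size();
    return AppendResult::Success;
  }

  size_t used_bytes() const { return used_; }
  std::string_view view() const { return {data_, used_}; } // Zero-copy

private:
  // Private members after
  char *data_;
  size_t capacity_;
  size_t used_ = 0;
};
```

A caller would branch on the returned `AppendResult` value, for example treating `OutOfSpace` as a recoverable condition, while allocation failure in the constructor follows the abort-on-system-failure rule.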