# WeaselDB C++ Style Guide

This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.

## Table of Contents

1. [General Principles](#general-principles)
1. [Naming Conventions](#naming-conventions)
1. [File Organization](#file-organization)
1. [Code Structure](#code-structure)
1. [Memory Management](#memory-management)
1. [Error Handling](#error-handling)
1. [Documentation](#documentation)
1. [Testing](#testing)
1. [Build Integration](#build-integration)

______________________________________________________________________

## General Principles

### Language Standard

- **C++20** is the target standard
- Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate

### C Library Functions and Headers

- **Always use std:: prefixed versions** of C library functions for consistency and clarity
- **Use C++ style headers** (`<cstring>`, `<cstdlib>`, etc.) instead of C style headers (`<string.h>`, `<stdlib.h>`, etc.)
- This applies to all standard libc functions: `std::abort()`, `std::fprintf()`, `std::free()`, `std::memcpy()`, `std::strlen()`, `std::strncpy()`, `std::memset()`, `std::signal()`, etc.
- **Exception:** Functions with no std:: equivalent (e.g., `gai_strerror()` and other POSIX-only functions) and system-specific headers (e.g., `<unistd.h>`, `<sys/socket.h>`)

```cpp
// Preferred - C++ style
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <cstring>

std::abort();
std::fprintf(stderr, "Error message\n");
std::free(ptr);
std::memcpy(dest, src, size);
std::strlen(str);
std::strncpy(dest, src, n);
std::memset(ptr, value, size);
std::signal(SIGTERM, handler);

// Avoid - C style
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

abort();
fprintf(stderr, "Error message\n");
free(ptr);
memcpy(dest, src, size);
strlen(str);
strncpy(dest, src, n);
memset(ptr, value, size);
signal(SIGTERM, handler);
```

### Data Types

- **Almost always signed** - prefer `int`, `int64_t`, `ssize_t` over unsigned types except for:
  - Bit manipulation operations
  - Interfacing with APIs that require unsigned types
  - Where defined unsigned overflow behavior (wraparound) is intentional and desired
- **Almost always auto** - let the compiler deduce types except when:
  - The type is not obvious from context and the exact type is important (prefer explicit for clarity)
  - Specific type requirements matter (numeric conversions, template parameters)
  - Interface contracts need explicit types (public APIs, function signatures)
- **Prefer uninitialized memory to default initialization** when reading a value before it is initialized would be a bug
  - Valgrind will catch uninitialized memory usage bugs
  - Avoid hiding logic errors that Valgrind would have caught with unnecessary zero-initialization
  - Default initialization can mask bugs and hurt performance
- **Floating point is for metrics only** - avoid `float`/`double` in core data structures and algorithms
  - Use for performance measurements, statistics, and monitoring data
  - Avoid branching on the values of floats

### Type Casting

- **Never use C-style casts** - they're unsafe and can hide bugs by performing dangerous conversions
- **Use C++ cast operators** for explicit type conversions with clear intent and safety checks
- **Avoid
`reinterpret_cast`** - almost always indicates poor design; redesign APIs instead
- **Prefer no casts** - design APIs and use types that avoid casting entirely when possible

```cpp
// Dangerous - C-style casts (NEVER DO THIS)
// int* ptr = (int*)malloc(sizeof(int));    // Unsafe
// int64_t id = (int64_t)some_pointer;      // Dangerous pointer conversion
// BaseClass* base = (BaseClass*)derived;   // Loses type safety

// Acceptable C++ cast operators (use sparingly)
auto ptr = static_cast<int*>(std::malloc(sizeof(int)));  // Explicit conversion
auto base = static_cast<BaseClass*>(derived_ptr);         // Safe upcast
auto derived = dynamic_cast<DerivedClass*>(base_ptr);     // Runtime type checking
auto mutable_ptr = const_cast<char*>(const_ptr);          // Remove const (rare)

// reinterpret_cast can be appropriate for low-level operations (very rare)
auto addr = reinterpret_cast<std::uintptr_t>(ptr);        // Pointer to integer conversion
```

### Performance Focus

- **Performance-first design** - optimize for the hot path
- **Simple is fast** - find exactly what's necessary, strip away everything else
- **Complexity must be justified with benchmarks** - measure performance impact before adding complexity
- **Strive for 0% CPU usage when idle** - avoid polling, busy waiting, or unnecessary background activity
- Use **inline functions** for performance-critical code (e.g., `allocate_raw`)
- **String views** with `std::string_view` to minimize unnecessary copying
- **Arena allocation** for efficient memory management, and to group related lifetimes together for simplicity

### String Formatting

- **Always use `format.hpp` functions** - they format directly into arena-allocated memory
- **Use `static_format()` for performance-sensitive code** - faster but less flexible than `format()`
- **Use `format()` function with arena allocator** for printf-style formatting

```cpp
// Most performance-sensitive - compile-time optimized concatenation
std::string_view response = static_format(arena,
    "HTTP/1.1 ", status_code, " OK\r\n",
    "Content-Length: ", body.size(), "\r\n",
    "\r\n", body);

// Printf-style formatting - runtime flexible
Arena& arena = conn.get_arena();
std::string_view response = format(arena,
    "HTTP/1.1 %d OK\r\n"
    "Content-Length: %zu\r\n"
    "\r\n%.*s",
    status_code, body.size(),
    static_cast<int>(body.size()), body.data());
```

- Where possible, offer APIs that avoid concatenating strings altogether. For example, if the bytes are going to be written to a file descriptor, skip concatenation and use scatter/gather `writev`-type calls.

### Complexity Control

- **Encapsulation is the main tool for controlling complexity**
- **Header files define the interface** - they are the contract with users of your code
- **Headers should be complete** - include everything needed to use the interface effectively:
  - Usage examples in comments
  - Preconditions and postconditions
  - Thread safety guarantees
  - Performance characteristics
  - Ownership and lifetime semantics
- **Do not rely on undocumented properties of an interface** - if it's not in the header, don't depend on it

______________________________________________________________________

## Naming Conventions

### Variables and Functions

- **snake_case** for all variables, functions, and member functions
- **Legacy camelCase exists** - the codebase currently contains mixed naming due to historical development. New code should use snake_case. Existing camelCase should be converted to snake_case during natural refactoring (not mass renaming).
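
One possible way to convert a legacy camelCase name during routine refactoring, rather than in a mass rename, is to introduce the snake_case name and keep the old spelling as a thin forwarding wrapper until its remaining callers are touched. The sketch below only illustrates that idea: `ByteCounter`, `add_bytes`, and `addBytes` are hypothetical names, not WeaselDB APIs, and the guide itself does not prescribe this transition technique.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type used only to illustrate the renaming pattern.
struct ByteCounter {
  // New code uses the snake_case name.
  void add_bytes(int64_t size) {
    assert(size >= 0 && "Precondition violated: size must be non-negative");
    used_bytes_ += size;
  }

  // Legacy camelCase spelling, kept as a forwarding wrapper until the
  // remaining callers are converted during normal refactoring.
  [[deprecated("use add_bytes()")]]
  void addBytes(int64_t size) { add_bytes(size); }

  int64_t used_bytes() const { return used_bytes_; }

private:
  int64_t used_bytes_ = 0;  // Trailing underscore for private members
};
```

Callers that still use `addBytes` get a deprecation warning at compile time, which makes the remaining legacy call sites easy to find without forcing them all to change in one commit.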

```cpp
int64_t used_bytes() const;
void add_block(int64_t size);
int32_t initial_block_size_;
```

### Classes and Structs

- **PascalCase** for class/struct names
- **Always use struct keyword** - eliminates debates about complexity and maintains consistency
- **Public members first, private after** - puts the interface users care about at the top, implementation details below
- **Full encapsulation still applies** - use `private:` sections to hide implementation details and maintain deep, capable structs
- The struct keyword doesn't mean shallow design - it means interface-first organization for human readers
- Omit the `public` keyword when inheriting from a struct. It's public by default. E.g. `struct A : B {};` instead of `struct A : public B {};`

```cpp
struct MyClass {
  // Public interface first
  void do_thing();

private:
  // Private members after
  int thing_count_;
};
```

### Enums

- **PascalCase** for enum class names
- **PascalCase** for enum values (not SCREAMING_SNAKE_CASE)
- C-style enums are acceptable where implicit int conversion is desirable, like for bitflags

```cpp
enum class Type { PointRead, RangeRead };
enum class ParseState { Root, PreconditionsArray, OperationObject };
```

### Constants and Macros

- **snake_case** for constants
- Avoid macros when possible; prefer `constexpr` variables

```cpp
static const WeaselJsonCallbacks json_callbacks;
```

### Member Variables

- **Trailing underscore** for private member variables

```cpp
private:
  int32_t initial_block_size_;
  Block *current_block_;
```

### Template Parameters

- **PascalCase** for template type parameters

```cpp
template <typename T>
struct rebind { using type = T*; };
```

______________________________________________________________________

## File Organization

### Include Organization

- Use **`#pragma once`** instead of include guards
- **Never `using namespace std`** - always use fully qualified names for clarity and safety
- **Include order** (applies to both headers and source files):
  1. Corresponding header file (for .cpp files only)
  1. Standard library headers (alphabetical)
  1. Third-party library headers
  1. Project headers

```cpp
#pragma once

// Standard library headers (alphabetical)
#include <memory>
#include <vector>

// Project headers
#include "arena.hpp"
#include "commit_request.hpp"

// Never this:
// using namespace std;

// Always this:
std::vector<int> data;
std::unique_ptr<Parser> parser;
```

______________________________________________________________________

## Code Structure

### Class Design

- **Move-only semantics** for resource-owning types
- **Explicit constructors** to prevent implicit conversions
- **Delete copy operations** when copying is inappropriate or should be discouraged

```cpp
struct Arena {
  explicit Arena(int64_t initial_size = 1024);

  // Copy construction is not allowed
  Arena(const Arena &source) = delete;
  Arena &operator=(const Arena &source) = delete;

  // Move semantics
  Arena(Arena &&source) noexcept;
  Arena &operator=(Arena &&source) noexcept;

private:
  int32_t initial_block_size_;
  Block *current_block_;
};
```

### Function Design

- **Const correctness** - mark methods const when appropriate
- **Parameter passing:**
  - Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
  - Pass by const reference for types > 16 bytes (containers, large objects)
- **Return by value** for small types (≤ 16 bytes), **string_view** to avoid copying strings
- **noexcept specification** for move operations and non-throwing functions

```cpp
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view request_data);           // ≤ 16 bytes, pass by value
void process_request(const CommitRequest& commit_request);  // > 16 bytes, pass by reference
Arena(Arena &&source) noexcept;
```

### Template Usage

- **Template constraints** using static_assert for better error messages
- **SFINAE** or concepts for template specialization

### Factory Patterns & Ownership

- **Static factory methods** for complex construction requirements like enforcing shared
ownership
- **Friend-based factories** for access control when constructor should be private
- **Ownership guidelines:**
  - **unique_ptr** for exclusive ownership (most common case)
  - **Ref** only when object logically has multiple owners (`Ref` is our custom std::shared_ptr variant)
  - **Factory methods return appropriate smart pointer type** based on ownership needs

```cpp
// Shared ownership - multiple components need concurrent access
auto server = Server::create(config, handler);  // Returns Ref<Server>

// Exclusive ownership - single owner, transfer via move
auto connection = Connection::createForServer(addr, fd, connection_id, handler, server_ref);

// Friend-based factory for access control
struct Connection {
  WeakRef<Connection> get_weak_ref() const;

private:
  Connection(struct sockaddr_storage client_addr, int file_descriptor,
             int64_t connection_id, ConnectionHandler* request_handler,
             std::weak_ptr<Server> server_ref);
  friend struct Server;  // Only Server can construct
};
```

### Control Flow

- **Early returns** to reduce nesting
- **Range-based for loops** when possible

```cpp
if (size == 0) {
  return nullptr;
}

for (auto &precondition : preconditions_) {
  // ...
}
```

### Atomic Operations

- **Never use assignment operators** with `std::atomic` - always use explicit `store()` and `load()`
- **Always specify memory ordering** explicitly for atomic operations
- **Use the least restrictive correct memory ordering** - choose the weakest ordering that maintains correctness

```cpp
// Preferred - explicit store/load with precise memory ordering
std::atomic<int64_t> counter;
counter.store(42, std::memory_order_relaxed);          // Single-writer metric updates
auto value = counter.load(std::memory_order_relaxed);  // Reading metrics for display

counter.store(1, std::memory_order_release);           // Publishing initialization
auto ready = counter.load(std::memory_order_acquire);  // Synchronizing with publisher

counter.store(42, std::memory_order_seq_cst);          // When sequential consistency needed

// Avoid - assignment operators (implicit memory ordering)
std::atomic<int64_t> counter;
counter = 42;             // Implicit seq_cst store - ordering not visible at the call site
int64_t value = counter;  // Implicit seq_cst load - ordering not visible at the call site
```

______________________________________________________________________

## Memory Management

### Ownership & Allocation

- **Arena** for request-scoped memory with **STL allocator adapters**
- **String views** pointing to arena-allocated memory to avoid unnecessary copying
- **STL containers with arena allocators must be reassigned from a default-constructed value when the arena is reset** - `clear()` is not sufficient

```cpp
// STL containers with arena allocators - correct reset pattern
// (the adapter name ArenaStlAllocator is assumed here)
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena);
// ... use container ...
operations = {};  // Default construct - clear() won't work correctly
arena.reset();    // Reset arena memory
```

### Resource Management

- **RAII** everywhere - constructors acquire, destructors release
- **Move semantics** for efficient resource transfer
- **Explicit cleanup** methods where appropriate

```cpp
~Arena() {
  while (current_block_) {
    Block *prev = current_block_->prev;
    std::free(current_block_);
    current_block_ = prev;
  }
}
```

______________________________________________________________________

## Error Handling

### Error Classification & Response

- **Expected errors** (invalid input, timeouts): Return error codes for programmatic handling
- **System failures** (malloc fail, socket fail): Abort immediately with error message
- **Programming errors** (precondition violations, assertions): Abort immediately

### Error Contract Design

- **Error codes are the API contract** - use enums for programmatic decisions
- **Error messages are human-readable only** - never parse message strings
- **Consistent error boundaries** - each component defines what it can/cannot recover from
- **Interface precondition violations are undefined behavior** - it's acceptable to skip checks for performance in hot paths
- **Error code types must be nodiscard** - mark error code enums with `[[nodiscard]]` to prevent silent failures

```cpp
enum class [[nodiscard]] ParseResult { Success, InvalidJson, MissingField };

// System failure - abort immediately
void* memory = std::malloc(size);
if (!memory) {
  std::fprintf(stderr, "Arena: Memory allocation failed\n");
  std::abort();
}
// ... use memory, eventually std::free(memory)

// Programming error - precondition violation (gets compiled out in release builds)
assert(ptr != nullptr && "Precondition violated: pointer must be non-null");
```

### Assertions

- **Programming error detection** using standard `assert()` macro
- **Assertion behavior follows C++ standards:**
  - **Debug builds**: Assertions active (`NDEBUG` not defined)
  - **Release builds**: Assertions removed (`NDEBUG` defined)
  - **Test targets override**: Use `-UNDEBUG` to force assertions active in all builds
- **Static assertions** for compile-time validation (always active)

**Usage guidelines:**

- Use for programming errors: null checks, precondition validation, invariants
- Don't use for expected runtime errors: use return codes instead

```cpp
// Good: Programming error checks
assert(current_block_ && "realloc called with non-null ptr but no current block");
assert(size > 0 && "Cannot allocate zero bytes");

// Good: Compile-time validation (always enabled)
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");

// Bad: Don't use assert for expected runtime errors
// assert(file_exists(path)); // File might legitimately not exist - use return code instead
```

### System Call Error Handling

When a system call is interrupted by a signal (`EINTR`), it is usually necessary to retry the call. This is especially true for "slow" system calls that can block for a long time, such as `read`, `write`, `accept`, `connect`, `sem_wait`, and `epoll_wait`.

**Rule:** Always wrap potentially interruptible system calls in a `do-while` loop that checks for `EINTR`.

**Example:**

```cpp
int fd;
do {
  fd = accept(listen_fd, nullptr, nullptr);
} while (fd == -1 && errno == EINTR);

if (fd == -1) {
  // Handle other errors
  std::perror("accept");
  std::abort();
}
```

**Special case - close():** The `close()` system call is a special case on Linux.
According to `man 2 close`, when `close()` fails with `EINTR` on Linux, the file descriptor is still guaranteed to be closed. Therefore, `close()` should **never** be retried.

```cpp
// Correct: Do not retry close() on EINTR
int result = close(fd);
if (result == -1 && errno != EINTR) {
  // Handle non-EINTR errors only
  std::perror("close");
  std::abort();
}
// Note: fd is guaranteed closed even on EINTR
```

**Non-interruptible calls:** Most system calls are not interruptible in practice. For these, it is not necessary to add a retry loop. This includes:

- `fcntl` (with `F_GETFL`, `F_SETFL`, `F_GETFD`, `F_SETFD` - note: `F_SETLKW` and `F_OFD_SETLKW` CAN return EINTR)
- `epoll_ctl`
- `socketpair`
- `pipe`
- `setsockopt`
- `epoll_create1`
- `close` (special case: guaranteed closed even on EINTR on Linux)

When in doubt, consult the `man` page for the specific system call to see if it can return `EINTR`.

______________________________________________________________________

## Documentation

### Doxygen Style

- `/** ... */` comment blocks for struct and public method documentation
- **@brief** for short descriptions
- **@param** and **@return** for function parameters and return values
- **@note** for important implementation notes
- **@warning** for critical usage warnings

```cpp
/**
 * @brief Type-safe version of realloc_raw for arrays of type T.
 * @param existing_ptr Pointer to the existing allocation
 * @param current_size Size in number of T objects
 * @param requested_size Desired new size in number of T objects
 * @return Pointer to reallocated memory
 * @note Prints error to stderr and calls std::abort() if allocation fails
 */
template <typename T>
T *realloc(T *existing_ptr, int32_t current_size, int32_t requested_size);
```

### Code Comments

- **Explain why, not what** - *what* the code does should be clear without any comments
- **Performance notes** for optimization decisions
- **Thread safety** and ownership semantics

```cpp
// Uses O(1) accumulated counters for fast retrieval
int64_t total_allocated() const;

// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
           ConnectionHandler *handler, std::weak_ptr<Server> server);
```

______________________________________________________________________

## Testing

### Test Framework

- **doctest** for unit testing
- **TEST_CASE** and **SUBCASE** for test organization
- **CHECK** for assertions (non-terminating)
- **REQUIRE** for critical assertions (terminating)

### Test Structure

- **Descriptive test names** explaining the scenario
- **SUBCASE** for related test variations that share setup/teardown code
- **Fresh instances** for each test to avoid state contamination

```cpp
TEST_CASE("Arena basic allocation") {
  Arena arena;

  SUBCASE("allocate zero bytes returns nullptr") {
    void *ptr = arena.allocate_raw(0);
    CHECK(ptr == nullptr);
  }

  SUBCASE("allocate single byte") {
    void *ptr = arena.allocate_raw(1);
    CHECK(ptr != nullptr);
    CHECK(arena.used_bytes() >= 1);
  }
}
```

### Test Design Principles

- **Test the contract, not the implementation** - validate what the API promises to deliver, not implementation details
- **Both integration and unit tests** - test components in isolation and working together
- **Prefer fakes to mocks** - use real implementations for internal components, fake external dependencies
- **Always
enable assertions in tests** - use `-UNDEBUG` pattern to ensure assertions are checked (see Build Integration section)

TODO make a new example here using APIs that exist

```cpp
```

### What NOT to Test

**Avoid testing language features:**

- Don't test that virtual functions dispatch correctly
- Don't test that standard library types work (unique_ptr, containers, etc.)
- Don't test basic constructor/destructor calls

**Test business logic instead:**

- When does your code call hooks/callbacks and why?
- What state transitions trigger behavior changes?
- How does your code handle error conditions?
- What promises does your API make to users?

**Ask: "Am I testing the C++ compiler or my application logic?"**

### Test Synchronization (Authoritative Rules)

- **ABSOLUTELY NEVER use timeouts** (`sleep_for`, `wait_for`, etc.)
- **Deterministic synchronization only:**
  - Blocking I/O (naturally waits for completion)
  - `condition_variable.wait()` without timeout
  - `std::latch`, `std::barrier`, futures/promises
- **Force concurrent execution** using `std::latch` to synchronize thread startup

#### Threading Checklist for Tests/Benchmarks

**Common threading principles (all concurrent code):**

- **Count total threads** - Include main/benchmark thread in count
- **Always assume concurrent execution needed** - Tests/benchmarks require real concurrency
- **Add synchronization primitive** - `std::latch start_latch{N}` (most common), `std::barrier`, or similar where N = total concurrent threads
- **Each thread synchronizes before doing work** - e.g., `start_latch.arrive_and_wait()` or `barrier.arrive_and_wait()`
- **Main thread synchronizes before measurement/execution** - ensures all threads start simultaneously

**Test-specific:**

- **Perform many operations per thread creation** - amortize thread creation cost and increase chances of hitting race conditions
- **Pattern: Create test that spawns threads and runs many operations, then run that test many times** - amortizes thread creation
cost while providing fresh test instances
- **Run 100-10000 operations per test, and 100-10000 test iterations** - maximizes chances of hitting race conditions
- **Always run with ThreadSanitizer** - compile with `-fsanitize=thread`

**Benchmark-specific:**

- **NEVER create threads inside the benchmark measurement** - this measures thread creation/destruction overhead, not contention
- **Create background threads OUTSIDE the benchmark** that run continuously during measurement
- **Use `std::atomic<bool> keep_running` to cleanly shut down background threads after benchmark**
- **Measure only the foreground operation under real contention from background threads**

**Red flags to catch immediately:**

- ❌ Creating threads in a loop without `std::latch`
- ❌ Background threads starting work immediately
- ❌ Benchmark measuring before all threads synchronized
- ❌ Any use of `sleep_for`, `wait_for`, or timeouts

**Simple rule:** Multiple threads = `std::latch` synchronization. No exceptions, even for "simple" background threads.

```cpp
// BAD: Race likely over before threads start
int counter = 0;
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() { counter++; });  // Probably sequential
}

// GOOD: Force threads to race simultaneously
int counter = 0;
std::latch start_latch{4};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() {
    start_latch.arrive_and_wait();  // All threads start together
    counter++;                      // Now they actually race (data race on non-atomic)
  });
}
```

______________________________________________________________________

## Build Integration

### Build Configuration

```bash
# Debug: assertions on, optimizations off
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

# Release: assertions off, optimizations on
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
```

**Test Target Pattern:**

- Production targets follow build type (assertions off in Release)
- Test targets use `-UNDEBUG` to force assertions on in all builds
- Ensures consistent test validation regardless of build type

```cmake
# Test target with assertions always enabled
add_executable(test_example tests/test_example.cpp src/example.cpp)
target_link_libraries(test_example doctest_impl)
target_compile_options(test_example PRIVATE -UNDEBUG)  # Always enable assertions
add_test(NAME test_example COMMAND test_example)

# Production target follows build type
add_executable(example src/example.cpp src/main.cpp)
# No -UNDEBUG → assertions disabled in Release, enabled in Debug
```

### Code Generation

- Generated files go in build directory, not source
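
Relating back to the test target pattern under Build Configuration: a test source file can also check at compile time that assertions are really active, so a target that accidentally loses `-UNDEBUG` fails to build instead of silently skipping its `assert()` checks. A minimal sketch, assuming a hypothetical helper named `assertions_enabled()` that is not part of WeaselDB:

```cpp
#include <cassert>

// Hypothetical helper: reports whether assert() is active in this
// translation unit, i.e. whether NDEBUG is undefined at this point.
constexpr bool assertions_enabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

// In a test source file one could then add, for example:
//   static_assert(assertions_enabled(),
//                 "test targets must be built with -UNDEBUG");
```

Because the check is a `static_assert`, it costs nothing at runtime and follows the same "always active" rule the guide gives for compile-time validation.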