WeaselDB C++ Style Guide

This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.

Table of Contents

  1. General Principles
  2. Naming Conventions
  3. File Organization
  4. Code Structure
  5. Memory Management
  6. Error Handling
  7. Documentation
  8. Testing
  9. Build Integration
  10. Style Enforcement

General Principles

Language Standard

  • C++20 is the target standard
  • Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
  • Prefer standard library containers and algorithms over custom implementations

Data Types

  • Almost always signed - prefer int and int64_t over unsigned types except for:
    • Bit manipulation operations
    • Interfacing with APIs that require unsigned types
    • Memory sizes where overflow is impossible (size_t, uint32_t for arena block sizes)
    • Where defined unsigned overflow behavior (wraparound) is intentional and desired
  • Almost always auto - let the compiler deduce types except when:
    • The type is not obvious from context (prefer explicit for clarity)
    • Specific type requirements matter (numeric conversions, template parameters)
    • Interface contracts need explicit types (public APIs, function signatures)
  • Prefer leaving memory uninitialized over zero-initializing it when reading it before assignment would be a bug
    • Valgrind can catch reads of uninitialized memory
    • Avoid hiding logic errors with unnecessary zero-initialization
    • Unnecessary zero-initialization can mask bugs and hurt performance
  • Floating point is for metrics only - avoid float/double in core data structures and algorithms
    • Use for performance measurements, statistics, and monitoring data
    • Never use for counts, sizes, or business logic
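
To illustrate the signed-by-default rule, here is a minimal sketch of how the same subtraction behaves with signed and unsigned operands. The function names are hypothetical and are not taken from the WeaselDB sources.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.

// Signed arithmetic: going over the limit yields a negative number that a
// plain comparison can detect.
constexpr std::int64_t remaining_signed(std::int64_t limit, std::int64_t used) {
  return limit - used;
}

// Unsigned arithmetic: the same subtraction wraps around modulo 2^32, so an
// overrun looks like a huge amount of remaining space.
constexpr std::uint32_t remaining_unsigned(std::uint32_t limit,
                                           std::uint32_t used) {
  return limit - used;
}

static_assert(remaining_signed(8, 10) == -2);
static_assert(remaining_unsigned(8, 10) == 4294967294u);
```

The unsigned result is well defined, which is why the guide reserves unsigned types for cases where wraparound is intended.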

Performance Focus

  • Performance-first design - optimize for the hot path
  • Simple is fast - find exactly what's necessary, strip away everything else
  • Complexity must be justified with benchmarks - measure performance impact before adding complexity
  • Strive for 0% CPU usage when idle - avoid polling, busy waiting, or unnecessary background activity
  • Use inline functions for performance-critical code (e.g., allocate_raw)
  • Zero-copy operations with std::string_view over string copying
  • Arena allocation for efficient memory management (see Memory Management section for details)
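
As a sketch of the zero-copy point above, the helper below returns a view into the caller's buffer instead of building a new string. The name key_of and the "key=value" input format are assumptions made for this example.

```cpp
#include <string_view>

// Hypothetical helper: returns the part of `line` before the first '='.
// The result aliases the caller's buffer, so no bytes are copied. The caller
// must keep that buffer alive while the view is in use.
constexpr std::string_view key_of(std::string_view line) {
  const auto pos = line.find('=');
  return pos == std::string_view::npos ? line : line.substr(0, pos);
}

static_assert(key_of("user=alice") == "user");
static_assert(key_of("no_separator") == "no_separator");
```

The trade-off is lifetime: the view is only valid while the underlying bytes are, which is one reason ownership and lifetime semantics belong in the header documentation.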

Complexity Control

  • Encapsulation is the main tool for controlling complexity
  • Header files define the interface - they are the contract with users of your code
  • Headers should be complete - include everything needed to use the interface effectively:
    • Usage examples in comments
    • Preconditions and postconditions
    • Thread safety guarantees
    • Performance characteristics
    • Ownership and lifetime semantics
  • Do not rely on undocumented interface properties - if it's not in the header, don't depend on it

Naming Conventions

Variables and Functions

  • snake_case for all variables, functions, and member functions
size_t used_bytes() const;
void add_block(size_t size);
uint32_t initial_block_size_;

Structs

  • PascalCase for struct names
  • Always use struct - eliminates debates about complexity and maintains consistency
  • Public members first, private after - puts the interface users care about at the top, implementation details below
  • Full encapsulation still applies - use private: sections to hide implementation details and maintain deep, capable classes
  • The struct keyword doesn't mean shallow design - it means interface-first organization for human readers
struct ArenaAllocator {
  // Public interface first
  explicit ArenaAllocator(size_t initial_size = 1024);
  void* allocate_raw(size_t size);

private:
  // Private members after
  uint32_t initial_block_size_;
  Block* current_block_;
};

Enums

  • PascalCase for enum class names
  • PascalCase for enum values (not SCREAMING_SNAKE_CASE)
enum class Type {
  PointRead,
  RangeRead
};

enum class ParseState {
  Root,
  PreconditionsArray,
  OperationObject
};

Constants and Macros

  • snake_case for constants
  • Avoid macros when possible; prefer constexpr variables
static const WeaselJsonCallbacks json_callbacks;
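
A minimal sketch of preferring constexpr variables over macros. The constant names and values below are hypothetical and not taken from the WeaselDB sources.

```cpp
#include <cstddef>
#include <cstdint>

// Instead of: #define DEFAULT_BLOCK_SIZE 1024
// Hypothetical constants, for illustration only.
constexpr std::uint32_t default_block_size = 1024;
constexpr std::size_t max_key_length = 256;

// Unlike macros, constexpr constants are typed, respect scope, and can be
// checked at compile time.
static_assert(default_block_size % 64 == 0,
              "block size should be a multiple of 64");
```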

Member Variables

  • Trailing underscore for private member variables
private:
  uint32_t initial_block_size_;
  Block *current_block_;

Template Parameters

  • PascalCase for template type parameters
template <typename T, typename... Args> T *construct(Args &&...args);
template <typename T> struct rebind { using type = T*; };

File Organization

Header Files

  • Use #pragma once instead of include guards
  • Never using namespace std - always use fully qualified names for clarity and safety
  • Include order:
    1. Corresponding header file (for .cpp files)
    2. Standard library headers (alphabetical)
    3. Third-party library headers
    4. Project headers
#pragma once

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string_view>
#include <vector>

#include <simdutf.h>
#include <weaseljson/weaseljson.h>

#include "arena_allocator.hpp"
#include "commit_request.hpp"

// Never this:
// using namespace std;

// Always this:
std::vector<int> data;
std::unique_ptr<Parser> parser;

Source Files

  • Include corresponding header first
  • Follow same include order as headers (see Header Files section above)

Code Structure

Struct Design

  • Move-only semantics for resource-owning structs
  • Explicit constructors to prevent implicit conversions
  • Delete copy operations when inappropriate
struct ArenaAllocator {
  explicit ArenaAllocator(size_t initial_size = 1024);

  // Copying is not allowed
  ArenaAllocator(const ArenaAllocator &) = delete;
  ArenaAllocator &operator=(const ArenaAllocator &) = delete;

  // Move semantics
  ArenaAllocator(ArenaAllocator &&other) noexcept;
  ArenaAllocator &operator=(ArenaAllocator &&other) noexcept;

private:
  uint32_t initial_block_size_;
  Block *current_block_;
};
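
The declarations above leave the move operations undefined. As a sketch of what they might look like, here is a self-contained move-only owner of a malloc'd buffer. Buffer is a hypothetical type for illustration, not part of WeaselDB.

```cpp
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical move-only resource owner, for illustration only.
struct Buffer {
  explicit Buffer(std::size_t size) : data_(std::malloc(size)), size_(size) {
    if (!data_ && size != 0) {
      std::abort();  // Allocation failure is treated as a system failure
    }
  }
  ~Buffer() { std::free(data_); }

  // Copying is not allowed
  Buffer(const Buffer &) = delete;
  Buffer &operator=(const Buffer &) = delete;

  // Moving transfers ownership and leaves the source empty
  Buffer(Buffer &&other) noexcept
      : data_(std::exchange(other.data_, nullptr)),
        size_(std::exchange(other.size_, 0)) {}

  Buffer &operator=(Buffer &&other) noexcept {
    if (this != &other) {
      std::free(data_);
      data_ = std::exchange(other.data_, nullptr);
      size_ = std::exchange(other.size_, 0);
    }
    return *this;
  }

  std::size_t size() const { return size_; }
  bool empty() const { return data_ == nullptr; }

private:
  void *data_;
  std::size_t size_;
};

static_assert(!std::is_copy_constructible_v<Buffer>);
static_assert(std::is_nothrow_move_constructible_v<Buffer>);
```

Using std::exchange keeps each move a single expression and guarantees the moved-from object is safe to destroy.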

Function Design

  • Const correctness - mark methods const when appropriate
  • Parameter passing:
    • Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
    • Pass by const reference for types > 16 bytes (containers, large objects)
  • Return by value for small types (≤ 16 bytes), string_view for zero-copy over strings
  • noexcept specification for move operations and non-throwing functions
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view data);  // ≤ 16 bytes, pass by value
void process_request(const CommitRequest& req);  // > 16 bytes, pass by reference
ArenaAllocator(ArenaAllocator &&other) noexcept;

Template Usage

  • Template constraints using static_assert for better error messages
  • Concepts (preferred in C++20) or SFINAE for constraining templates and selecting overloads
template <typename T, typename... Args> T *construct(Args &&...args) {
  static_assert(
      std::is_trivially_destructible_v<T>,
      "ArenaAllocator::construct requires trivially destructible types.");
  // ...
}

Factory Patterns

  • Static factory methods for complex construction requiring specific initialization
  • Friend-based factories for access control when constructor should be private
  • Factory patterns ensure proper ownership semantics (shared_ptr vs unique_ptr)
// Static factory method
auto server = Server::create(config, handler);  // Returns shared_ptr

// Friend-based factory for access control
struct Connection {
  void append_message(std::string_view data);
private:
  Connection(/* args */);  // Private constructor
  friend struct Server;   // Only Server can construct
};
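
Both factory styles can be combined in one small, self-contained sketch. The Server and Connection types below are simplified stand-ins, and their members (create, accept, id, name) are assumptions for this example rather than WeaselDB's real interfaces.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Server;

// Friend-based factory: only Server can construct a Connection.
struct Connection {
  int id() const { return id_; }

private:
  explicit Connection(int id) : id_(id) {}
  friend struct Server;
  int id_;
};

struct Server {
  // Static factory: the only way to obtain a Server, always as a shared_ptr.
  // std::make_shared cannot reach the private constructor, so use new here.
  static std::shared_ptr<Server> create(std::string name) {
    return std::shared_ptr<Server>(new Server(std::move(name)));
  }

  // Each connection has exactly one owner, so hand out a unique_ptr.
  std::unique_ptr<Connection> accept() {
    return std::unique_ptr<Connection>(new Connection(next_id_++));
  }

  const std::string &name() const { return name_; }

private:
  explicit Server(std::string name) : name_(std::move(name)) {}
  std::string name_;
  int next_id_ = 0;
};
```

Keeping the constructors private makes the ownership choice (shared versus unique) part of the interface instead of leaving it to each caller.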

Control Flow

  • Early returns to reduce nesting
  • Range-based for loops when possible
if (size == 0) {
  return nullptr;
}

for (auto &precondition : preconditions_) {
  // ...
}

Memory Management

Ownership & Allocation

  • Arena allocators for request-scoped memory with STL allocator adapters (provides ~1ns allocation vs ~20-270ns for malloc)
  • String views pointing to arena-allocated memory for zero-copy operations
  • Prefer unique_ptr for exclusive ownership
  • shared_ptr only if shared ownership is necessary - most objects have single owners
  • Factory patterns for complex construction and ownership control (see Code Structure section for factory patterns)
  • STL containers with arena allocators require default construction after arena reset - clear() is not sufficient

// STL containers with arena allocators - correct reset pattern
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena_alloc);
// ... use container ...
operations = {};  // Default construct - clear() won't work correctly
arena.reset();  // Reset arena memory
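
To show why arena allocation is cheap and why views into it are tied to reset(), here is a deliberately simplified bump allocator. It is not WeaselDB's ArenaAllocator: the BumpArena name, the fixed capacity, the store() helper, and the lack of alignment handling are all simplifying assumptions.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>

// Simplified stand-in for an arena, for illustration only.
// Allocations are byte-aligned; a real arena would also handle alignment.
struct BumpArena {
  explicit BumpArena(std::size_t capacity)
      : buffer_(new char[capacity]), capacity_(capacity) {}

  // Returns nullptr for zero-sized requests or when the arena is full.
  // Allocation is just a bounds check and a pointer bump.
  void *allocate_raw(std::size_t size) {
    if (size == 0 || size > capacity_ - used_) {
      return nullptr;
    }
    void *result = buffer_.get() + used_;
    used_ += size;
    return result;
  }

  // Copies `text` into the arena once and returns a view of the copy.
  // The view stays valid until reset() is called.
  std::string_view store(std::string_view text) {
    void *memory = allocate_raw(text.size());
    if (!memory) {
      return {};
    }
    std::memcpy(memory, text.data(), text.size());
    return {static_cast<const char *>(memory), text.size()};
  }

  // Releases every allocation at once; outstanding views become dangling.
  void reset() { used_ = 0; }

  std::size_t used_bytes() const { return used_; }

private:
  std::unique_ptr<char[]> buffer_;
  std::size_t capacity_;
  std::size_t used_ = 0;
};
```

Because reset() invalidates everything at once, any container or view that still points into the arena has to be discarded before the memory is reused.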

Resource Management

  • RAII everywhere - constructors acquire, destructors release
  • Move semantics for efficient resource transfer
  • Explicit cleanup methods where appropriate
~ArenaAllocator() {
  while (current_block_) {
    Block *prev = current_block_->prev;
    std::free(current_block_);
    current_block_ = prev;
  }
}

Error Handling

Error Philosophy

  • Return codes for expected errors that can be handled
  • Abort for system failures - If we can't uphold the component's contract, perror/fprintf then abort. If recovery is possible, change the component's contract to allow returning an error code.
  • Error messages are for humans only - never parse error message strings programmatically
  • Error codes are the contract - use enums/codes for programmatic error handling

Error Boundaries

  • Expected errors: Invalid user input, network timeouts, file not found - return error codes
  • System failures: Memory allocation failure, socket creation failure - abort immediately
  • Programming errors: Assertion failures, null pointer dereference - abort immediately
enum class ParseResult { Success, InvalidJson, MissingField };

// Good: Test error codes (part of contract)
auto result = parser.parse(data);
if (result == ParseResult::InvalidJson) {
  // Handle programmatically
}

// Bad: Don't test or parse error message strings
// CHECK(parser.get_error() == "Expected '}' at line 5");  // BRITTLE!

// System resource failures: abort immediately
void ArenaAllocator::allocate(size_t size) {
  void *memory = std::malloc(size);
  if (!memory) {
    std::fprintf(stderr, "ArenaAllocator: Failed to allocate memory\n");
    std::abort();  // Process is likely in bad state
  }
  // ... use memory ...
}
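
For the expected-error side of the boundary, a runnable sketch of the return-code pattern might look like this. PortParseResult and parse_port are hypothetical names used for illustration only.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical error codes, for illustration only.
enum class PortParseResult { Success, InvalidInput, OutOfRange };

// Invalid user input is an expected error, so it is reported through the
// return code. `port` is written only on Success.
PortParseResult parse_port(std::string_view text, int &port) {
  int value = 0;
  const char *begin = text.data();
  const char *end = begin + text.size();
  const auto [ptr, ec] = std::from_chars(begin, end, value);
  if (ec == std::errc::result_out_of_range) {
    return PortParseResult::OutOfRange;
  }
  if (ec != std::errc{} || ptr != end) {
    return PortParseResult::InvalidInput;
  }
  if (value < 1 || value > 65535) {
    return PortParseResult::OutOfRange;
  }
  port = value;
  return PortParseResult::Success;
}
```

Callers branch on the enum value. Any human-readable message can be produced separately and is never part of the contract.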

Assertions

  • Use assert() for debug-time checks that validate program correctness
  • Static assertions for compile-time validation
  • Standard assert behavior: Assertions are enabled by default and disabled when NDEBUG is defined
  • Use assertions for programming errors: Null pointer checks, precondition validation, invariant checking
  • Don't use assertions for expected runtime errors: Use return codes for recoverable conditions
// Good: Programming error checks (enabled by default, disabled with NDEBUG)
assert(current_block_ && "realloc called with non-null ptr but no current block");
assert(size > 0 && "Cannot allocate zero bytes");
assert(ptr != nullptr && "Invalid pointer passed to realloc");

// Good: Compile-time validation (always enabled)
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");

// Bad: Don't use assert for expected runtime errors
// assert(file_exists(path));  // File might legitimately not exist - use return code instead

Build Configuration:

  • Debug builds: cmake -DCMAKE_BUILD_TYPE=Debug → assertions enabled (default behavior)
  • Release builds: cmake -DCMAKE_BUILD_TYPE=Release → assertions disabled (defines NDEBUG)
  • Test targets: Always have assertions enabled using -UNDEBUG pattern (see Build Integration section)
  • Testing: Test in both debug and release builds to catch assertion failures in all configurations

Documentation

Doxygen Style

  • /** ... */ for struct and public method documentation
  • @brief for short descriptions
  • @param and @return for function parameters and return values
  • @note for important implementation notes
  • @warning for critical usage warnings
/**
 * @brief Type-safe version of realloc_raw for arrays of type T.
 * @param ptr Pointer to the existing allocation
 * @param old_size Size in number of T objects
 * @param new_size Desired new size in number of T objects
 * @return Pointer to reallocated memory
 * @note Prints error to stderr and calls std::abort() if allocation fails
 */
template <typename T>
T *realloc(T *ptr, uint32_t old_size, uint32_t new_size);

Code Comments

  • Explain why, not what - code should be self-documenting
  • Performance notes for optimization decisions
  • Thread safety and ownership semantics
// Uses O(1) accumulated counters for fast retrieval
size_t total_allocated() const;

// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
           ConnectionHandler *handler, std::weak_ptr<Server> server);

Testing

Test Framework

  • doctest for unit testing
  • TEST_CASE and SUBCASE for test organization
  • CHECK for assertions (non-terminating)
  • REQUIRE for critical assertions (terminating)

Test Structure

  • Descriptive test names explaining the scenario
  • SUBCASE for related test variations
  • Fresh instances for each test to avoid state contamination
TEST_CASE("ArenaAllocator basic allocation") {
  ArenaAllocator arena;

  SUBCASE("allocate zero bytes returns nullptr") {
    void *ptr = arena.allocate_raw(0);
    CHECK(ptr == nullptr);
  }

  SUBCASE("allocate single byte") {
    void *ptr = arena.allocate_raw(1);
    CHECK(ptr != nullptr);
    CHECK(arena.used_bytes() >= 1);
  }
}

Test Design Principles

  • Prefer testing through public interfaces - focus on observable behavior rather than implementation details
  • Test the contract, not the implementation - validate what the API promises to deliver
  • Avoid testing private methods directly - if private functionality needs testing, consider if it should be public or extracted
  • Both integration and unit tests - test components in isolation and working together
  • Prefer fakes to mocks - use real implementations for internal components, fake external dependencies
  • Always enable assertions in tests - use -UNDEBUG pattern to ensure assertions are checked (see Build Integration section)
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
  auto config = Config::default_config();
  auto handler = std::make_unique<TestHandler>();
  auto server = Server::create(config, std::move(handler));

  // Test observable behavior - server can accept connections
  auto result = connect_to_server(server->get_port());
  CHECK(result.connected);
}

// Avoid: Testing internal implementation details
// TEST_CASE("Server creates epoll instance") { /* implementation detail */ }

Test Synchronization

  • NEVER use timeouts or sleep-based synchronization
  • Deterministic synchronization only:
    • Blocking I/O operations
    • condition_variable.wait() (no timeout variant)
    • std::latch, std::barrier, futures/promises
    • RAII guards and resource management
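
As a sketch of deterministic synchronization, the helper below hands a value from a worker thread to the caller using the predicate overload of condition_variable::wait, with no sleeps or timeouts. The function name and the value 42 are arbitrary choices for this example.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical test helper, for illustration only.
int wait_for_worker_result() {
  std::mutex mutex;
  std::condition_variable ready_cv;
  bool ready = false;
  int result = 0;

  std::thread worker([&] {
    {
      std::lock_guard<std::mutex> lock(mutex);
      result = 42;
      ready = true;
    }
    ready_cv.notify_one();
  });

  {
    std::unique_lock<std::mutex> lock(mutex);
    // The predicate handles spurious wakeups and the case where the worker
    // finished before this thread started waiting.
    ready_cv.wait(lock, [&] { return ready; });
  }
  worker.join();
  return result;
}
```

The test takes only as long as the worker does, and it cannot fail merely because a machine is slow.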

Multithreading Test Correctness

  • Force concurrent execution - Thread creation takes time, so work often completes sequentially before threads start
  • Use std::latch to synchronize thread startup - Ensures all threads begin racing simultaneously
// BAD: Race likely over before threads start
std::atomic<int> counter{0};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() { counter++; }); // Probably sequential
}

// GOOD: Force threads to race simultaneously
std::atomic<int> counter{0};
std::latch start_latch{4};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() {
    start_latch.arrive_and_wait(); // All threads start together
    counter++; // Now they actually race
  });
}

Build Integration

CMake Integration

  • Generated code (gperf hash tables) in build directory
  • Ninja as the preferred generator
  • Export compile commands for tooling support

Build Types:

# Debug build (assertions enabled by default, optimizations off)
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

# Release build (assertions disabled, optimizations on)
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

Testing and Development:

  • Test targets always have assertions enabled - even in release builds, test targets use -UNDEBUG to ensure assertions are checked
  • Production builds have assertions disabled - the main weaseldb executable follows standard build type behavior
  • Use Release builds for performance testing and production deployment
  • This ensures tests catch assertion failures regardless of build configuration

Test Assertion Pattern (-UNDEBUG)

Problem: Release builds define NDEBUG which disables assertions, but tests should always validate assertions to catch programming errors.

Solution: Use -UNDEBUG compiler flag for test targets to undefine NDEBUG and re-enable assertions.

CMake Implementation:

# Test target with assertions always enabled
add_executable(test_example tests/test_example.cpp src/example.cpp)
target_link_libraries(test_example doctest::doctest)
target_compile_options(test_example PRIVATE -UNDEBUG)  # Always enable assertions

# Production target follows build type
add_executable(example src/example.cpp src/main.cpp)
# No -UNDEBUG → assertions disabled in Release, enabled in Debug

Benefits:

  • Consistent test behavior: Tests validate assertions in both Debug and Release builds
  • Production performance: Production binaries maintain optimized release performance
  • Early error detection: Catch assertion failures during CI/CD regardless of build configuration
  • Build type flexibility: Can use Release builds for performance profiling while still testing assertions

Code Generation

  • gperf for perfect hash table generation
  • Build-time generation of token lookup tables
  • Include generated headers from build directory

Style Enforcement

Consistency

  • Follow existing patterns in the codebase
  • Use the same style for similar constructs
  • Maintain consistency within each translation unit

Tools

  • clang-format configuration (when available)
  • Static analysis tools for code quality
  • Address sanitizer for memory safety testing

This style guide reflects the existing codebase patterns and should be followed for all new code contributions to maintain consistency and readability.