Files
weaseldb/style.md

25 KiB

WeaselDB C++ Style Guide

This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.

Table of Contents

  1. General Principles
  2. Naming Conventions
  3. File Organization
  4. Code Structure
  5. Memory Management
  6. Error Handling
  7. Documentation
  8. Testing

General Principles

Language Standard

  • C++20 is the target standard
  • Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
  • Prefer standard library containers and algorithms over custom implementations

C Library Functions and Headers

  • Always use std:: prefixed versions of C library functions for consistency and clarity
  • Use C++ style headers (<cstring>, <cstdlib>, etc.) instead of C style headers (<string.h>, <stdlib.h>, etc.)
  • This applies to all standard libc functions: std::abort(), std::fprintf(), std::free(), std::memcpy(), std::strlen(), std::strncpy(), std::memset(), std::signal(), etc.
  • Exception: Functions with no std:: equivalent (e.g., perror(), gai_strerror()) and system-specific headers (e.g., <unistd.h>, <fcntl.h>)
// Preferred - C++ style
#include <cstring>
#include <cstdlib>
#include <csignal>
std::abort();
std::fprintf(stderr, "Error message\n");
std::free(ptr);
std::memcpy(dest, src, size);
std::strlen(str);
std::strncpy(dest, src, n);
std::memset(ptr, value, size);
std::signal(SIGTERM, handler);

// Avoid - C style
#include <string.h>
#include <stdlib.h>
#include <signal.h>
abort();
fprintf(stderr, "Error message\n");
free(ptr);
memcpy(dest, src, size);
strlen(str);
strncpy(dest, src, n);
memset(ptr, value, size);
signal(SIGTERM, handler);

Data Types

  • Almost always signed - prefer int, int64_t, ssize_t over unsigned types except for:
    • Bit manipulation operations
    • Interfacing with APIs that require unsigned types
    • Where defined unsigned overflow behavior (wraparound) is intentional and desired
  • Almost always auto - let the compiler deduce types except when:
    • The type is not obvious from context (prefer explicit for clarity)
    • Specific type requirements matter (numeric conversions, template parameters)
    • Interface contracts need explicit types (public APIs, function signatures)
  • Prefer uninitialized memory to default initialization when using before initializing would be an error
    • Valgrind will catch uninitialized memory usage bugs
    • Avoid hiding logic errors with unnecessary zero-initialization
    • Default initialization can mask bugs and hurt performance
  • Floating point is for metrics only - avoid float/double in core data structures and algorithms
    • Use for performance measurements, statistics, and monitoring data
    • Never use for counts, sizes, or business logic

Type Casting

  • Never use C-style casts - they're unsafe and can hide bugs by performing dangerous conversions
  • Use C++ cast operators for explicit type conversions with clear intent and safety checks
  • Avoid reinterpret_cast - almost always indicates poor design; redesign APIs instead
  • Prefer no casts - design APIs and use types that avoid casting entirely when possible
// Dangerous - C-style casts (NEVER DO THIS)
// int* ptr = (int*)malloc(sizeof(int));     // Unsafe
// int64_t id = (int64_t)some_pointer;       // Dangerous pointer conversion
// BaseClass* base = (BaseClass*)derived;    // Loses type safety

// Acceptable C++ cast operators (use sparingly)
auto ptr = static_cast<int*>(malloc(sizeof(int)));    // Explicit conversion
auto base = static_cast<BaseClass*>(derived_ptr);     // Safe upcast
auto derived = dynamic_cast<DerivedClass*>(base_ptr); // Runtime type checking
auto mutable_ptr = const_cast<int*>(const_ptr);       // Remove const (rare)
// reinterpret_cast can be appropriate for low-level operations (very rare)
auto addr = reinterpret_cast<uintptr_t>(ptr);         // Pointer to integer conversion

Performance Focus

  • Performance-first design - optimize for the hot path
  • Simple is fast - find exactly what's necessary, strip away everything else
  • Complexity must be justified with benchmarks - measure performance impact before adding complexity
  • Strive for 0% CPU usage when idle - avoid polling, busy waiting, or unnecessary background activity
  • Use inline functions for performance-critical code (e.g., allocate_raw)
  • String views with std::string_view to minimize unnecessary copying
  • Arena allocation for efficient memory management (~1ns vs ~20-270ns for malloc)

String Formatting

  • Always use format.hpp functions - formats directly into arena-allocated memory
  • Use static_format() for performance-sensitive code - faster but less flexible than format()
  • Use format() function with arena allocator for printf-style formatting
// Most performance-sensitive - compile-time optimized concatenation
std::string_view response = static_format(arena,
    "HTTP/1.1 ", status_code, " OK\r\n",
    "Content-Length: ", body.size(), "\r\n",
    "\r\n", body);

// Printf-style formatting - runtime flexible
Arena& arena = conn.get_arena();
std::string_view response = format(arena,
    "HTTP/1.1 %d OK\r\n"
    "Content-Length: %zu\r\n"
    "\r\n%.*s",
    status_code, body.size(),
    static_cast<int>(body.size()), body.data());

Complexity Control

  • Encapsulation is the main tool for controlling complexity
  • Header files define the interface - they are the contract with users of your code
  • Headers should be complete - include everything needed to use the interface effectively:
    • Usage examples in comments
    • Preconditions and postconditions
    • Thread safety guarantees
    • Performance characteristics
    • Ownership and lifetime semantics
  • Do not rely on undocumented interface properties - if it's not in the header, don't depend on it

Naming Conventions

Variables and Functions

  • snake_case for all variables, functions, and member functions
  • Legacy camelCase exists - the codebase currently contains mixed naming due to historical development. New code should use snake_case. Existing camelCase should be converted to snake_case during natural refactoring (not mass renaming).
int64_t used_bytes() const;
void add_block(int64_t size);
int32_t initial_block_size_;

Classes and Structs

  • PascalCase for class/struct names
  • Always use struct keyword - eliminates debates about complexity and maintains consistency
  • Public members first, private after - puts the interface users care about at the top, implementation details below
  • Full encapsulation still applies - use private: sections to hide implementation details and maintain deep, capable structs
  • The struct keyword doesn't mean shallow design - it means interface-first organization for human readers
struct Arena {
  // Public interface first
  explicit Arena(int64_t initial_size = 1024);
  void* allocate_raw(int64_t size);

private:
  // Private members after
  int32_t initial_block_size_;
  Block* current_block_;
};

Enums

  • PascalCase for enum class names
  • PascalCase for enum values (not SCREAMING_SNAKE_CASE)
enum class Type {
  PointRead,
  RangeRead
};

enum class ParseState {
  Root,
  PreconditionsArray,
  OperationObject
};

Constants and Macros

  • snake_case for constants
  • Avoid macros when possible; prefer constexpr variables
static const WeaselJsonCallbacks json_callbacks;

Member Variables

  • Trailing underscore for private member variables
private:
  int32_t initial_block_size_;
  Block *current_block_;

Template Parameters

  • PascalCase for template type parameters
template <typename T, typename... Args>
template <typename T> struct rebind { using type = T*; };

File Organization

Include Organization

  • Use #pragma once instead of include guards
  • Never using namespace std - always use fully qualified names for clarity and safety
  • Include order (applies to both headers and source files):
    1. Corresponding header file (for .cpp files only)
    2. Standard library headers (alphabetical)
    3. Third-party library headers
    4. Project headers
#pragma once

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string_view>

#include <simdutf.h>
#include <weaseljson/weaseljson.h>

#include "arena.hpp"
#include "commit_request.hpp"

// Never this:
// using namespace std;

// Always this:
std::vector<int> data;
std::unique_ptr<Parser> parser;

Code Structure

Class Design

  • Move-only semantics for resource-owning types
  • Explicit constructors to prevent implicit conversions
  • Delete copy operations when inappropriate
struct Arena {
  explicit Arena(int64_t initial_size = 1024);

  // Copy construction is not allowed
  Arena(const Arena &source) = delete;
  Arena &operator=(const Arena &source) = delete;

  // Move semantics
  Arena(Arena &&source) noexcept;
  Arena &operator=(Arena &&source) noexcept;

private:
  int32_t initial_block_size_;
  Block *current_block_;
};

Function Design

  • Const correctness - mark methods const when appropriate
  • Parameter passing:
    • Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
    • Pass by const reference for types > 16 bytes (containers, large objects)
  • Return by value for small types (≤ 16 bytes), string_view to avoid copying strings
  • noexcept specification for move operations and non-throwing functions
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view request_data);  // ≤ 16 bytes, pass by value
void process_request(const CommitRequest& commit_request);  // > 16 bytes, pass by reference
Arena(Arena &&source) noexcept;

Template Usage

  • Template constraints using static_assert for better error messages
  • SFINAE or concepts for template specialization

Factory Patterns & Ownership

  • Static factory methods for complex construction requiring shared ownership
  • Friend-based factories for access control when constructor should be private
  • Ownership guidelines:
    • unique_ptr for exclusive ownership (most common case)
    • Ref only when object logically has multiple owners (Ref is our custom std::shared_ptr variant)
    • Factory methods return appropriate smart pointer type based on ownership needs
// Shared ownership - multiple components need concurrent access
auto server = Server::create(config, handler);  // Returns Ref<Server>

// Exclusive ownership - single owner, transfer via move
auto connection = Connection::createForServer(addr, fd, connection_id, handler, server_ref);

// Friend-based factory for access control
struct Connection {
  void append_message(std::string_view message_data);
private:
  Connection(struct sockaddr_storage client_addr, int file_descriptor,
             int64_t connection_id, ConnectionHandler* request_handler,
             std::weak_ptr<Server> server_ref);
  friend struct Server;   // Only Server can construct
};

Control Flow

  • Early returns to reduce nesting
  • Range-based for loops when possible
if (size == 0) {
  return nullptr;
}

for (auto &precondition : preconditions_) {
  // ...
}

Atomic Operations

  • Never use assignment operators with std::atomic - always use explicit store() and load()
  • Always specify memory ordering explicitly for atomic operations
  • Use the least restrictive correct memory ordering - choose the weakest ordering that maintains correctness
// Preferred - explicit store/load with precise memory ordering
std::atomic<uint64_t> counter;
counter.store(42, std::memory_order_relaxed);      // Single-writer metric updates
auto value = counter.load(std::memory_order_relaxed); // Reading metrics for display

counter.store(1, std::memory_order_release);       // Publishing initialization
auto ready = counter.load(std::memory_order_acquire); // Synchronizing with publisher

counter.store(42, std::memory_order_seq_cst);      // When sequential consistency needed

// Avoid - assignment operators (implicit memory ordering)
std::atomic<uint64_t> counter;
counter = 42;        // Implicit - memory ordering not explicit
auto value = counter; // Implicit - memory ordering not explicit

Memory Management

Ownership & Allocation

  • Arena allocators for request-scoped memory with STL allocator adapters (see Performance Focus section for characteristics)
  • String views pointing to arena-allocated memory to avoid unnecessary copying
  • STL containers with arena allocators require default construction after arena reset - clear() is not sufficient
// STL containers with arena allocators - correct reset pattern
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena);
// ... use container ...
operations = {};  // Default construct - clear() won't work correctly
arena.reset();  // Reset arena memory

Resource Management

  • RAII everywhere - constructors acquire, destructors release
  • Move semantics for efficient resource transfer
  • Explicit cleanup methods where appropriate
~Arena() {
  while (current_block_) {
    Block *prev = current_block_->prev;
    std::free(current_block_);
    current_block_ = prev;
  }
}

Error Handling

Error Classification & Response

  • Expected errors (invalid input, timeouts): Return error codes for programmatic handling
  • System failures (malloc fail, socket fail): Abort immediately with error message
  • Programming errors (precondition violations, assertions): Abort immediately

Error Contract Design

  • Error codes are the API contract - use enums for programmatic decisions
  • Error messages are human-readable only - never parse message strings
  • Consistent error boundaries - each component defines what it can/cannot recover from
  • Interface precondition violations are undefined behavior - acceptable to skip checks for performance in hot paths
  • Error code types must be nodiscard - mark error code enums with [[nodiscard]] to prevent silent failures
enum class [[nodiscard]] ParseResult { Success, InvalidJson, MissingField };

// System failure - abort immediately
void* memory = std::malloc(size);
if (!memory) {
    std::fprintf(stderr, "Arena: Memory allocation failed\n");
    std::abort();
}
// ... use memory, eventually std::free(memory)

// Programming error - precondition violation (may be omitted for performance)
assert(ptr != nullptr && "Precondition violated: pointer must be non-null");

Assertions

  • Programming error detection using standard assert() macro
  • Assertion behavior follows C++ standards:
    • Debug builds: Assertions active (undefined NDEBUG)
    • Release builds: Assertions removed (defined NDEBUG)
  • Test targets override: Use -UNDEBUG to force assertions active in all builds
  • Static assertions for compile-time validation (always active)

Usage guidelines:

  • Use for programming errors: null checks, precondition validation, invariants
  • Don't use for expected runtime errors: use return codes instead
// Good: Programming error checks
assert(current_block_ && "realloc called with non-null ptr but no current block");
assert(size > 0 && "Cannot allocate zero bytes");

// Good: Compile-time validation (always enabled)
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");

// Bad: Don't use assert for expected runtime errors
// assert(file_exists(path));  // File might legitimately not exist - use return code instead

System Call Error Handling

When a system call is interrupted by a signal (EINTR), it is usually necessary to retry the call. This is especially true for "slow" system calls that can block for a long time, such as read, write, accept, connect, sem_wait, and epoll_wait.

Rule: Always wrap potentially interruptible system calls in a do-while loop that checks for EINTR.

Example:

int fd;
do {
  fd = accept(listen_fd, nullptr, nullptr);
} while (fd == -1 && errno == EINTR);

if (fd == -1) {
  // Handle other errors (perror has no std:: equivalent)
  perror("accept");
  std::abort();
}

Special case - close():

The close() system call is a special case on Linux. According to man 2 close, when close() returns EINTR on Linux, the file descriptor is still guaranteed to be closed. Therefore, close() should never be retried.

// Correct: Do not retry close() on EINTR
int result = close(fd);
if (result == -1 && errno != EINTR) {
  // Handle non-EINTR errors only (perror has no std:: equivalent)
  perror("close");
  std::abort();
}
// Note: fd is guaranteed closed even on EINTR

Non-interruptible calls:

Most system calls are not interruptible in practice. For these, it is not necessary to add a retry loop. This includes:

  • fcntl (with F_GETFL, F_SETFL, F_GETFD, F_SETFD - note: F_SETLKW and F_OFD_SETLKW CAN return EINTR)
  • epoll_ctl
  • socketpair
  • pipe
  • setsockopt
  • epoll_create1
  • close (special case: guaranteed closed even on EINTR on Linux)

When in doubt, consult the man page for the specific system call to see if it can return EINTR.


Documentation

Doxygen Style

  • / for struct and public method documentation
  • @brief for short descriptions
  • @param and @return for function parameters
  • @note for important implementation notes
  • @warning for critical usage warnings
/**
 * @brief Type-safe version of realloc_raw for arrays of type T.
 * @param existing_ptr Pointer to the existing allocation
 * @param current_size Size in number of T objects
 * @param requested_size Desired new size in number of T objects
 * @return Pointer to reallocated memory
 * @note Prints error to stderr and calls std::abort() if allocation fails
 */
template <typename T>
T *realloc(T *existing_ptr, int32_t current_size, int32_t requested_size);

Code Comments

  • Explain why, not what - code should be self-documenting
  • Performance notes for optimization decisions
  • Thread safety and ownership semantics
// Uses O(1) accumulated counters for fast retrieval
int64_t total_allocated() const;

// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
           ConnectionHandler *handler, std::weak_ptr<Server> server);

Testing

Test Framework

  • doctest for unit testing
  • TEST_CASE and SUBCASE for test organization
  • CHECK for assertions (non-terminating)
  • REQUIRE for critical assertions (terminating)

Test Structure

  • Descriptive test names explaining the scenario
  • SUBCASE for related test variations
  • Fresh instances for each test to avoid state contamination
TEST_CASE("Arena basic allocation") {
  Arena arena;

  SUBCASE("allocate zero bytes returns nullptr") {
    void *ptr = arena.allocate_raw(0);
    CHECK(ptr == nullptr);
  }

  SUBCASE("allocate single byte") {
    void *ptr = arena.allocate_raw(1);
    CHECK(ptr != nullptr);
    CHECK(arena.used_bytes() >= 1);
  }
}

Test Design Principles

  • Test the contract, not the implementation - validate what the API promises to deliver, not implementation details
  • Both integration and unit tests - test components in isolation and working together
  • Prefer fakes to mocks - use real implementations for internal components, fake external dependencies
  • Always enable assertions in tests - use -UNDEBUG pattern to ensure assertions are checked (see Build Integration section)
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
  auto config = Config::defaultConfig();
  auto handler = std::make_unique<TestHandler>();
  auto server = Server::create(config, std::move(handler));

  // Test observable behavior - server can accept connections
  auto result = connectToServer(server->getPort());
  CHECK(result.connected);
}

// Avoid: Testing internal implementation details
// TEST_CASE("Server creates epoll instance") { /* implementation detail */ }

What NOT to Test

Avoid testing language features and plumbing:

  • Don't test that virtual functions dispatch correctly
  • Don't test that standard library types work (unique_ptr, containers, etc.)
  • Don't test basic constructor/destructor calls

Test business logic instead:

  • When does your code call hooks/callbacks and why?
  • What state transitions trigger behavior changes?
  • How does your code handle error conditions?
  • What promises does your API make to users?

Ask: "Am I testing the C++ compiler or my application logic?"

Test Synchronization (Authoritative Rules)

  • ABSOLUTELY NEVER use timeouts (sleep_for, wait_for, etc.)
  • Deterministic synchronization only:
    • Blocking I/O (naturally waits for completion)
    • condition_variable.wait() without timeout
    • std::latch, std::barrier, futures/promises
  • Force concurrent execution using std::latch to synchronize thread startup

Threading Checklist for Tests/Benchmarks

Common threading principles (all concurrent code):

  • Count total threads - Include main/benchmark thread in count
  • Always assume concurrent execution needed - Tests/benchmarks require real concurrency
  • Add synchronization primitive - std::latch start_latch{N} (most common), std::barrier, or similar where N = total concurrent threads
  • Each thread synchronizes before doing work - e.g., start_latch.arrive_and_wait() or barrier.arrive_and_wait()
  • Main thread synchronizes before measurement/execution - ensures all threads start simultaneously

Test-specific:

  • Perform many operations per thread creation - amortize thread creation cost and increase chances of hitting race conditions
  • Pattern: Create test that spawns threads and runs many operations, then run that test many times - amortizes thread creation cost while providing fresh test instances
  • Run 100-10000 operations per test, and 100-10000 test iterations - maximizes chances of hitting race conditions
  • Always run with ThreadSanitizer - compile with -fsanitize=thread

Benchmark-specific:

  • NEVER create threads inside the benchmark measurement - creates thread creation/destruction overhead, not contention
  • Create background threads OUTSIDE the benchmark that run continuously during measurement
  • Use std::atomic<bool> keep_running to cleanly shut down background threads after benchmark
  • Measure only the foreground operation under real contention from background threads

Red flags to catch immediately:

  • Creating threads in a loop without std::latch
  • Background threads starting work immediately
  • Benchmark measuring before all threads synchronized
  • Any use of sleep_for, wait_for, or timeouts

Simple rule: Multiple threads = std::latch synchronization. No exceptions, even for "simple" background threads.

// BAD: Race likely over before threads start
int counter = 0;
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() { counter++; }); // Probably sequential
}

// GOOD: Force threads to race simultaneously
int counter = 0;
std::latch start_latch{4};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() {
    start_latch.count_down_and_wait(); // All threads start together
    counter++; // Now they actually race (data race on non-atomic)
  });
}

Build Integration

Build Configuration

# Debug: assertions on, optimizations off
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

# Release: assertions off, optimizations on
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

Test Target Pattern:

  • Production targets follow build type (assertions off in Release)
  • Test targets use -UNDEBUG to force assertions on in all builds
  • Ensures consistent test validation regardless of build type
# Test target with assertions always enabled
add_executable(test_example tests/test_example.cpp src/example.cpp)
target_link_libraries(test_example doctest::doctest)
target_compile_options(test_example PRIVATE -UNDEBUG)  # Always enable assertions

# Production target follows build type
add_executable(example src/example.cpp src/main.cpp)
# No -UNDEBUG → assertions disabled in Release, enabled in Debug

Code Generation

  • Generated files go in build directory, not source