2025-08-23 12:56:28 -04:00


WeaselDB C++ Style Guide

This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.

Table of Contents

  1. General Principles
  2. Naming Conventions
  3. File Organization
  4. Code Structure
  5. Memory Management
  6. Error Handling
  7. Documentation
  8. Testing
  9. Build Integration
  10. Style Enforcement

General Principles

Language Standard

  • C++20 is the target standard
  • Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
  • Prefer standard library containers and algorithms over custom implementations

Data Types

  • Almost always signed - prefer int and int64_t over unsigned types except for:
    • Bit manipulation operations
    • Interfacing with APIs that require unsigned types
    • Memory sizes where overflow is impossible (size_t, uint32_t for arena block sizes)
    • Cases where well-defined unsigned wraparound is intentional and desired
  • Almost always auto - let the compiler deduce types except when:
    • The type is not obvious from context (prefer explicit for clarity)
    • Specific type requirements matter (numeric conversions, template parameters)
    • Interface contracts need explicit types (public APIs, function signatures)
  • Prefer leaving memory uninitialized over zero-initializing it when reading it before it is written would be a bug
    • Valgrind will catch reads of uninitialized memory
    • Avoid hiding logic errors with unnecessary zero-initialization
    • Zero-initialization can mask bugs and hurt performance
  • Floating point is for metrics only - avoid float/double in core data structures and algorithms
    • Use for performance measurements, statistics, and monitoring data
    • Never use for counts, sizes, or business logic
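A minimal sketch of why the guide defaults to signed types, using two hypothetical functions: a countdown loop written with a `size_t` index never terminates because `i >= 0` is always true, while a signed index behaves as expected. Unsigned arithmetic is kept for a case the list allows, a hash (here FNV-1a, chosen only for illustration) that relies on wraparound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Signed index: counting down to zero terminates normally. With a size_t
// index, `i >= 0` would always be true and the loop would never end.
int64_t sum_reversed(const std::vector<int64_t> &values) {
  int64_t total = 0;
  for (auto i = static_cast<int64_t>(values.size()) - 1; i >= 0; --i) {
    total += values[static_cast<size_t>(i)];
  }
  return total;
}

// Deliberately unsigned: FNV-1a relies on well-defined wraparound on multiply.
uint64_t fnv1a(std::string_view bytes) {
  uint64_t hash = 14695981039346656037ull;  // 64-bit FNV offset basis
  for (unsigned char byte : bytes) {
    hash ^= byte;
    hash *= 1099511628211ull;  // 64-bit FNV prime
  }
  return hash;
}
```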

Performance Focus

  • Performance-first design - optimize for the hot path
  • Simple is fast - find exactly what's necessary, strip away everything else
  • Complexity must be justified with benchmarks - measure performance impact before adding complexity
  • Strive for 0% CPU usage when idle - avoid polling, busy waiting, or unnecessary background activity
  • Use inline functions for performance-critical code (e.g., allocate_raw)
  • Zero-copy std::string_view instead of copying strings
  • Arena allocation for efficient memory management
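As a sketch of the zero-copy point, the hypothetical `split_fields` below splits a buffer into `std::string_view` fields that point back into the caller's buffer instead of copying each field into a new `std::string`. The caller must keep the buffer alive while the views are in use.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Split `line` on `sep` without copying: each returned view aliases `line`.
// Precondition: the memory behind `line` outlives the returned views.
std::vector<std::string_view> split_fields(std::string_view line, char sep) {
  std::vector<std::string_view> fields;
  size_t start = 0;
  while (true) {
    size_t end = line.find(sep, start);
    if (end == std::string_view::npos) {
      fields.push_back(line.substr(start));  // last field
      return fields;
    }
    fields.push_back(line.substr(start, end - start));
    start = end + 1;
  }
}
```

Because each view's `data()` points into the original buffer, splitting allocates only the vector of views, never the field bytes themselves.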

Complexity Control

  • Encapsulation is the main tool for controlling complexity
  • Header files define the interface - they are the contract with users of your code
  • Headers should be complete - include everything needed to use the interface effectively:
    • Usage examples in comments
    • Preconditions and postconditions
    • Thread safety guarantees
    • Performance characteristics
    • Ownership and lifetime semantics
  • Do not rely on undocumented interface properties - if it's not in the header, don't depend on it
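A sketch of what a complete header comment might look like, using a hypothetical `RequestCounter`; the labels inside the comment (usage, preconditions, thread safety, performance, ownership) mirror the list above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

/**
 * @brief Tracks the number of requests currently being processed.
 *
 * Usage:
 *   RequestCounter counter;
 *   counter.begin();   // request accepted
 *   counter.end();     // response sent
 *
 * Preconditions: every end() is paired with an earlier begin().
 * Thread safety: all methods may be called concurrently.
 * Performance: each method is a single relaxed atomic operation.
 * Ownership: owns no resources; not copyable (std::atomic is not copyable).
 */
struct RequestCounter {
  void begin() { in_flight_.fetch_add(1, std::memory_order_relaxed); }

  void end() {
    int64_t previous = in_flight_.fetch_sub(1, std::memory_order_relaxed);
    assert(previous > 0 && "end() called without a matching begin()");
    (void)previous;  // unused when NDEBUG disables assert
  }

  int64_t in_flight() const {
    return in_flight_.load(std::memory_order_relaxed);
  }

private:
  std::atomic<int64_t> in_flight_{0};
};
```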

Naming Conventions

Variables and Functions

  • snake_case for all variables, functions, and member functions
size_t used_bytes() const;
void add_block(size_t size);
uint32_t initial_block_size_;

Structs

  • PascalCase for struct names
  • Always use struct - eliminates debates about complexity and maintains consistency
  • Public members first, private after - puts the interface users care about at the top, implementation details below
  • Full encapsulation still applies - use private: sections to hide implementation details and maintain deep, capable classes
  • The struct keyword doesn't mean shallow design - it means interface-first organization for human readers
struct ArenaAllocator {
  // Public interface first
  explicit ArenaAllocator(size_t initial_size = 1024);
  void *allocate_raw(size_t size);

private:
  // Private members after
  uint32_t initial_block_size_;
  Block *current_block_;
};

Enums

  • PascalCase for enum class names
  • PascalCase for enum values (not SCREAMING_SNAKE_CASE)
enum class Type {
  PointRead,
  RangeRead
};

enum class ParseState {
  Root,
  PreconditionsArray,
  OperationObject
};

Constants and Macros

  • snake_case for constants
  • Avoid macros when possible; prefer constexpr variables
static const WeaselJsonCallbacks json_callbacks = {/* ... */};
constexpr size_t default_initial_size = 1024;  // illustrative name

Member Variables

  • Trailing underscore for private member variables
private:
  uint32_t initial_block_size_;
  Block *current_block_;

Template Parameters

  • PascalCase for template type parameters
template <typename T, typename... Args>
template <typename U> struct rebind {};

File Organization

Header Files

  • Use #pragma once instead of include guards
  • Never write using namespace std - always use fully qualified names for clarity and safety
  • Include order:
    1. Corresponding header file (for .cpp files)
    2. Standard library headers (alphabetical)
    3. Third-party library headers
    4. Project headers
#pragma once

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string_view>
#include <vector>

#include <simdutf.h>
#include <weaseljson/weaseljson.h>

#include "arena_allocator.hpp"
#include "commit_request.hpp"

// Never this:
// using namespace std;

// Always this:
std::vector<int> data;
std::unique_ptr<Parser> parser;

Source Files

  • Include corresponding header first
  • Follow same include order as headers
#include "json_commit_request_parser.hpp"

#include <cassert>
#include <charconv>
#include <cstring>

#include <simdutf.h>

#include "json_token_enum.hpp"

Code Structure

Struct Design

  • Move-only semantics for resource-owning structs
  • Explicit constructors to prevent implicit conversions
  • Delete copy operations when inappropriate
struct ArenaAllocator {
  explicit ArenaAllocator(size_t initial_size = 1024);

  // Copy construction is not allowed
  ArenaAllocator(const ArenaAllocator &) = delete;
  ArenaAllocator &operator=(const ArenaAllocator &) = delete;

  // Move semantics
  ArenaAllocator(ArenaAllocator &&other) noexcept;
  ArenaAllocator &operator=(ArenaAllocator &&other) noexcept;

private:
  uint32_t initial_block_size_;
  Block *current_block_;
};
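The move operations declared above are commonly implemented with `std::exchange`, which takes the source's pointer and leaves the source empty so its destructor does nothing. A minimal sketch, assuming a simplified hypothetical `Buffer` that owns a single `malloc`'d block rather than the arena's block chain:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <type_traits>
#include <utility>

struct Buffer {
  // Precondition: size > 0 (malloc(0) may legitimately return nullptr).
  explicit Buffer(size_t size)
      : data_(static_cast<char *>(std::malloc(size))) {
    if (!data_) {
      std::perror("malloc");
      std::abort();
    }
  }
  ~Buffer() { std::free(data_); }  // free(nullptr) is a no-op after a move

  Buffer(const Buffer &) = delete;
  Buffer &operator=(const Buffer &) = delete;

  // Take ownership and leave `other` empty so its destructor does nothing.
  Buffer(Buffer &&other) noexcept : data_(std::exchange(other.data_, nullptr)) {}

  Buffer &operator=(Buffer &&other) noexcept {
    if (this != &other) {
      std::free(data_);
      data_ = std::exchange(other.data_, nullptr);
    }
    return *this;
  }

  char *data() const { return data_; }

private:
  char *data_;
};
```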

Function Design

  • Const correctness - mark methods const when appropriate
  • Parameter passing:
    • Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
    • Pass by const reference for types > 16 bytes (containers, large objects)
  • Return small types (≤ 16 bytes) by value; return std::string_view or std::span to expose owned data without copying
  • noexcept specification for move operations and non-throwing functions
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view data);  // ≤ 16 bytes, pass by value
void process_request(const CommitRequest &req);  // > 16 bytes, pass by reference
ArenaAllocator(ArenaAllocator &&other) noexcept;

Template Usage

  • Template constraints using static_assert for better error messages
  • SFINAE or concepts for template specialization
template <typename T, typename... Args> T *construct(Args &&...args) {
  static_assert(
      std::is_trivially_destructible_v<T>,
      "ArenaAllocator::construct requires trivially destructible types.");
  // ...
}

Control Flow

  • Early returns to reduce nesting
  • Range-based for loops when possible
if (size == 0) {
  return nullptr;
}

for (auto &precondition : preconditions_) {
  // ...
}

Memory Management

Ownership & Allocation

  • Arena allocators for request-scoped memory with STL allocator adapters
  • String views pointing to arena-allocated memory for zero-copy
  • Prefer unique_ptr for exclusive ownership
  • shared_ptr only if shared ownership is necessary - most objects have single owners
  • Factory patterns for complex construction and ownership control
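  • STL containers with arena allocators must be reset by assigning a freshly constructed, empty container before the arena is reset - clear() keeps the old capacity, so the container would keep using reclaimed arena memory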
// Static factory methods for complex objects requiring specific initialization
auto server = Server::create(config, handler);  // Ensures shared_ptr semantics
Block *block = Block::create(size, prev);       // Custom allocation + setup

// Friend-based factories for access control
struct Connection {
  // Public interface first
  void append_message(std::string_view data);
  bool write_bytes();

private:
  Connection(/* args */);  // Private constructor
  friend struct Server;   // Only Server can construct Connections
};

// Usage in Server
// std::make_unique cannot call the private constructor, so use new directly
auto conn = std::unique_ptr<Connection>(new Connection(args));

// STL containers with arena allocators - correct reset pattern
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena_alloc);
// ... use container ...
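// Assign a freshly constructed, empty container. clear() keeps the capacity,
// and so does `operations = {}`, which selects the initializer_list overload.
operations = decltype(operations)(arena_alloc);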
arena.reset();  // Reset arena memory
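A self-contained sketch of the reset pattern, built on a hypothetical fixed-buffer `Arena` and a minimal `ArenaStlAllocator` (WeaselDB's real types and signatures may differ). After `clear()` the vector still holds its arena buffer, so it would keep writing into memory the arena is about to hand out again; move-assigning a freshly constructed vector with the same allocator leaves it with zero capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical bump arena over a fixed buffer; reset() reclaims everything.
struct Arena {
  void *allocate(size_t bytes, size_t align) {
    size_t offset = (used_ + align - 1) / align * align;
    if (offset + bytes > sizeof(buffer_)) {
      std::abort();  // out of arena memory
    }
    used_ = offset + bytes;
    return buffer_ + offset;
  }
  void reset() { used_ = 0; }

private:
  alignas(std::max_align_t) std::byte buffer_[4096];
  size_t used_ = 0;
};

// Minimal STL allocator adapter: allocates from the arena, never frees.
template <typename T> struct ArenaStlAllocator {
  using value_type = T;

  explicit ArenaStlAllocator(Arena &arena) : arena_(&arena) {}
  template <typename U>
  ArenaStlAllocator(const ArenaStlAllocator<U> &other) : arena_(other.arena_) {}

  T *allocate(size_t n) {
    return static_cast<T *>(arena_->allocate(n * sizeof(T), alignof(T)));
  }
  void deallocate(T *, size_t) {}  // reclaimed in bulk by Arena::reset()

  template <typename U>
  bool operator==(const ArenaStlAllocator<U> &other) const {
    return arena_ == other.arena_;
  }

private:
  template <typename U> friend struct ArenaStlAllocator;
  Arena *arena_;
};

using IntVector = std::vector<int, ArenaStlAllocator<int>>;

// Drop the container's arena buffer before the arena is reused.
void reset_for_reuse(IntVector &values, Arena &arena) {
  values = IntVector(values.get_allocator());  // fresh, zero-capacity vector
  arena.reset();
}
```

Because both vectors share the same arena, the allocators compare equal and move assignment takes over the empty vector's state instead of copying elements.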

Resource Management

  • RAII everywhere - constructors acquire, destructors release
  • Move semantics for efficient resource transfer
  • Explicit cleanup methods where appropriate
~ArenaAllocator() {
  while (current_block_) {
    Block *prev = current_block_->prev;
    std::free(current_block_);
    current_block_ = prev;
  }
}
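Not every resource needs a hand-written destructor: for a C handle, `std::unique_ptr` with a custom deleter provides RAII and move-only semantics directly. A minimal sketch for `std::FILE *`; `FileCloser`, `FilePtr`, and `open_temp_file` are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <utility>

// Deleter invoked by unique_ptr; it is only called for non-null pointers.
struct FileCloser {
  void operator()(std::FILE *file) const { std::fclose(file); }
};

using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Returns an owning handle; the file is closed when the handle is destroyed.
FilePtr open_temp_file() {
  FilePtr file(std::tmpfile());
  if (!file) {
    std::perror("tmpfile");
    std::abort();
  }
  return file;
}
```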

Error Handling

Error Reporting

  • Return codes for expected errors
  • Avoid exceptions - if a component cannot uphold its contract, report the failure with perror/fprintf and abort. If recovery should be possible, change the component's contract so it can return an error code.
  • Error messages are for humans only - never parse error message strings programmatically
  • Error codes are the contract - use enums/codes for programmatic error handling
enum class ParseResult { Success, InvalidJson, MissingField };

// Good: Test error codes (part of contract)
auto result = parser.parse(data);
if (result == ParseResult::InvalidJson) {
  // Handle programmatically
}

// Bad: Don't test or parse error message strings
// CHECK(parser.get_error() == "Expected '}' at line 5");  // BRITTLE!

// System resource failures: abort immediately
void *ArenaAllocator::allocate(size_t size) {
  void *memory = std::malloc(size);
  if (!memory) {
    std::perror("malloc");
    std::abort();  // Process is likely in bad state
  }
  return memory;
}
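A minimal sketch of the return-code style for an expected error, using a hypothetical `ParseIntResult` enum and `parse_int64` helper built on `std::from_chars`: callers branch on the enum, and the human-readable text from `describe` exists only for logs.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

enum class ParseIntResult { Success, Empty, NotANumber, OutOfRange };

// Expected failures come back as codes; `out` is written only on Success.
ParseIntResult parse_int64(std::string_view text, int64_t &out) {
  if (text.empty()) {
    return ParseIntResult::Empty;
  }
  const char *end = text.data() + text.size();
  int64_t value;  // left uninitialized: read only after from_chars succeeds
  auto [parsed_end, error] = std::from_chars(text.data(), end, value);
  if (error == std::errc::result_out_of_range) {
    return ParseIntResult::OutOfRange;
  }
  if (error != std::errc{} || parsed_end != end) {
    return ParseIntResult::NotANumber;  // garbage or trailing characters
  }
  out = value;
  return ParseIntResult::Success;
}

// For log messages only; callers must branch on the enum, never on this text.
const char *describe(ParseIntResult result) {
  switch (result) {
  case ParseIntResult::Success:
    return "ok";
  case ParseIntResult::Empty:
    return "empty input";
  case ParseIntResult::NotANumber:
    return "not a base-10 integer";
  case ParseIntResult::OutOfRange:
    return "does not fit in int64_t";
  }
  return "unknown";
}
```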

Assertions

  • Use assert() for debug-time checks
  • Static assertions for compile-time validation
assert(current_block_ && "realloc called with non-null ptr but no current block");
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");

Documentation

Doxygen Style

  • /** */ comment blocks for struct and public method documentation
  • @brief for short descriptions
  • @param and @return for function parameters
  • @note for important implementation notes
  • @warning for critical usage warnings
/**
 * @brief Type-safe version of realloc_raw for arrays of type T.
 * @param ptr Pointer to the existing allocation
 * @param old_size Size in number of T objects
 * @param new_size Desired new size in number of T objects
 * @return Pointer to reallocated memory
 * @note Prints error to stderr and calls std::abort() if allocation fails
 */
template <typename T>
T *realloc(T *ptr, uint32_t old_size, uint32_t new_size);

Code Comments

  • Explain why, not what - code should be self-documenting
  • Performance notes for optimization decisions
  • Thread safety and ownership semantics
// Uses O(1) accumulated counters for fast retrieval
size_t total_allocated() const;

// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
           ConnectionHandler *handler, std::weak_ptr<Server> server);

Testing

Test Framework

  • doctest for unit testing
  • TEST_CASE and SUBCASE for test organization
  • CHECK for assertions (non-terminating)
  • REQUIRE for critical assertions (terminating)

Test Structure

  • Descriptive test names explaining the scenario
  • SUBCASE for related test variations
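  • Fresh instances for each test to avoid state contamination - doctest re-runs the TEST_CASE body from the top for every SUBCASE, so objects declared before a SUBCASE are recreated each time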
TEST_CASE("ArenaAllocator basic allocation") {
  ArenaAllocator arena;

  SUBCASE("allocate zero bytes returns nullptr") {
    void *ptr = arena.allocate_raw(0);
    CHECK(ptr == nullptr);
  }

  SUBCASE("allocate single byte") {
    void *ptr = arena.allocate_raw(1);
    CHECK(ptr != nullptr);
    CHECK(arena.used_bytes() >= 1);
  }
}

Test Design Principles

  • Prefer testing through public interfaces - focus on observable behavior rather than implementation details
  • Test the contract, not the implementation - validate what the API promises to deliver
  • Avoid testing private methods directly - if private functionality needs testing, consider if it should be public or extracted
  • Both integration and unit tests - test components in isolation and working together
  • Prefer fakes to mocks - use real implementations for internal components, fake external dependencies
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
  auto config = Config::default_config();
  auto handler = std::make_unique<TestHandler>();
  auto server = Server::create(config, std::move(handler));

  // Test observable behavior - server can accept connections
  auto result = connect_to_server(server->get_port());
  CHECK(result.connected);
}

// Avoid: Testing internal implementation details
// TEST_CASE("Server creates epoll instance") { /* implementation detail */ }

Test Synchronization

  • NEVER use timeouts or sleep-based synchronization
  • Deterministic synchronization only:
    • Blocking I/O operations
    • std::condition_variable::wait() (never wait_for or wait_until)
    • std::latch, std::barrier, futures/promises
    • RAII guards and resource management
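A sketch of deterministic hand-off between a test and a worker thread with `std::promise`/`std::future`: the caller blocks until the worker has actually produced its value, with no sleep or timeout. `run_worker_and_wait` is a hypothetical helper.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Starts a worker that publishes a result; get() blocks until set_value()
// has run, so there are no sleeps, polling loops, or timeouts.
int run_worker_and_wait(int input) {
  std::promise<int> result_promise;
  std::future<int> result = result_promise.get_future();
  std::thread worker([&result_promise, input] {
    result_promise.set_value(input * 2);
  });
  int value = result.get();  // blocks until the worker calls set_value
  worker.join();             // promise outlives the worker's use of it
  return value;
}
```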

Multithreading Test Correctness

  • Force concurrent execution - thread creation is slow compared to short work, so each thread often finishes before the next one starts and the test silently runs sequentially
  • Use std::latch to synchronize thread startup - Ensures all threads begin racing simultaneously
// BAD: Race likely over before threads start
std::atomic<int> counter{0};
std::vector<std::thread> threads;
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() { counter++; }); // Probably sequential
}

// GOOD: Force threads to race simultaneously
std::atomic<int> counter{0};
std::vector<std::thread> threads;
std::latch start_latch{4};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() {
    start_latch.arrive_and_wait(); // All threads start together
    counter++; // Now they actually race
  });
}
for (auto &thread : threads) {
  thread.join();
}

Build Integration

CMake Integration

  • Generated code (gperf hash tables) in build directory
  • Ninja as the preferred generator
  • Export compile commands for tooling support
cmake .. -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

Code Generation

  • gperf for perfect hash table generation
  • Build-time generation of token lookup tables
  • Include generated headers from build directory

Style Enforcement

Consistency

  • Follow existing patterns in the codebase
  • Use the same style for similar constructs
  • Maintain consistency within each translation unit

Tools

  • clang-format configuration (when available)
  • Static analysis tools for code quality
  • AddressSanitizer for memory safety testing

This style guide reflects the existing codebase patterns and should be followed for all new code contributions to maintain consistency and readability.