Files
weaseldb/style.md
2025-08-22 21:52:34 -04:00

12 KiB

WeaselDB C++ Style Guide

This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.

Table of Contents

  1. General Principles
  2. Naming Conventions
  3. File Organization
  4. Code Structure
  5. Memory Management
  6. Error Handling
  7. Documentation
  8. Testing

General Principles

Language Standard

  • C++20 is the target standard
  • Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
  • Prefer standard library containers and algorithms over custom implementations

Data Types

  • Almost always signed - prefer int, int64_t, size_t over unsigned types except for:
    • Bit manipulation operations
    • Interfacing with APIs that require unsigned types
    • Memory sizes where overflow is impossible (size_t, uint32_t for arena block sizes)
    • Where defined unsigned overflow behavior (wraparound) is intentional and desired
  • Almost always auto - let the compiler deduce types except when:
    • The type is not obvious from context (prefer explicit for clarity)
    • Specific type requirements matter (numeric conversions, template parameters)
    • Interface contracts need explicit types (public APIs, function signatures)
  • Prefer uninitialized memory to default initialization when using before initializing would be an error
    • Valgrind will catch uninitialized memory usage bugs
    • Avoid hiding logic errors with unnecessary zero-initialization
    • Default initialization can mask bugs and hurt performance
  • Floating point is for metrics only - avoid float/double in core data structures and algorithms
    • Use for performance measurements, statistics, and monitoring data
    • Never use for counts, sizes, or business logic

Performance Focus

  • Performance-first design - optimize for the hot path
  • Simple is fast - find exactly what's necessary, strip away everything else
  • Complexity must be justified with benchmarks - measure performance impact before adding complexity
  • Strive for 0% CPU usage when idle - avoid polling, busy waiting, or unnecessary background activity
  • Use inline functions for performance-critical code (e.g., allocate_raw)
  • Zero-copy operations with std::string_view over string copying
  • Arena allocation for efficient memory management

Complexity Control

  • Encapsulation is the main tool for controlling complexity
  • Header files define the interface - they are the contract with users of your code
  • Headers should be complete - include everything needed to use the interface effectively:
    • Usage examples in comments
    • Preconditions and postconditions
    • Thread safety guarantees
    • Performance characteristics
    • Ownership and lifetime semantics
  • Do not rely on undocumented interface properties - if it's not in the header, don't depend on it

Naming Conventions

Variables and Functions

  • snake_case for all variables, functions, and member functions
size_t used_bytes() const;
void add_block(size_t size);
uint32_t initial_block_size_;

Classes and Structs

  • PascalCase for class and struct names
  • Prefer struct over class - public members at the top, avoid class { public: pattern
struct ArenaAllocator {};
struct CommitRequest {};
struct JsonCommitRequestParser {};

Enums

  • PascalCase for enum class names
  • PascalCase for enum values (not SCREAMING_SNAKE_CASE)
enum class Type {
  PointRead,
  RangeRead
};

enum class ParseState {
  Root,
  PreconditionsArray,
  OperationObject
};

Constants and Macros

  • snake_case for constants
  • Avoid macros when possible; prefer constexpr variables
static const WeaselJsonCallbacks json_callbacks;

Member Variables

  • Trailing underscore for private member variables
private:
  uint32_t initial_block_size_;
  Block *current_block_;

Template Parameters

  • PascalCase for template type parameters
template <typename T, typename... Args>
template <typename U> struct rebind {};

File Organization

Header Files

  • Use #pragma once instead of include guards
  • Never using namespace std - always use fully qualified names for clarity and safety
  • Include order:
    1. Corresponding header file (for .cpp files)
    2. Standard library headers (alphabetical)
    3. Third-party library headers
    4. Project headers
#pragma once

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string_view>

#include <simdutf.h>
#include <weaseljson/weaseljson.h>

#include "arena_allocator.hpp"
#include "commit_request.hpp"

// Never this:
// using namespace std;

// Always this:
std::vector<int> data;
std::unique_ptr<Parser> parser;

Source Files

  • Include corresponding header first
  • Follow same include order as headers
#include "json_commit_request_parser.hpp"

#include <cassert>
#include <charconv>
#include <cstring>

#include <simdutf.h>

#include "json_token_enum.hpp"

Code Structure

Class Design

  • Move-only semantics for resource-owning classes
  • Explicit constructors to prevent implicit conversions
  • Delete copy operations when inappropriate
struct ArenaAllocator {
  explicit ArenaAllocator(size_t initial_size = 1024);

  // Copy construction is not allowed
  ArenaAllocator(const ArenaAllocator &) = delete;
  ArenaAllocator &operator=(const ArenaAllocator &) = delete;

  // Move semantics
  ArenaAllocator(ArenaAllocator &&other) noexcept;
  ArenaAllocator &operator=(ArenaAllocator &&other) noexcept;

private:
  uint32_t initial_block_size_;
  Block *current_block_;
};

Function Design

  • Const correctness - mark methods const when appropriate
  • Parameter passing:
    • Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
    • Pass by const reference for types > 16 bytes (containers, large objects)
  • Return by value for small types (≤ 16 bytes), string_view for zero-copy over strings
  • noexcept specification for move operations and non-throwing functions
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view data);  // ≤ 16 bytes, pass by value
void process_request(const CommitRequest& req);  // > 16 bytes, pass by reference
ArenaAllocator(ArenaAllocator &&other) noexcept;

Template Usage

  • Template constraints using static_assert for better error messages
  • SFINAE or concepts for template specialization
template <typename T> T *construct(Args &&...args) {
  static_assert(
      std::is_trivially_destructible_v<T>,
      "ArenaAllocator::construct requires trivially destructible types.");
  // ...
}

Control Flow

  • Early returns to reduce nesting
  • Range-based for loops when possible
if (size == 0) {
  return nullptr;
}

for (auto &precondition : preconditions_) {
  // ...
}

Memory Management

Ownership & Allocation

  • Arena allocators for request-scoped memory with STL allocator adapters
  • String views pointing to arena-allocated memory for zero-copy
  • Prefer unique_ptr for exclusive ownership
  • shared_ptr only if shared ownership is necessary - most objects have single owners
  • Factory patterns for complex construction and ownership control
// Static factory methods for complex objects requiring specific initialization
auto server = Server::create(config, handler);  // Ensures shared_ptr semantics
Block *block = Block::create(size, prev);       // Custom allocation + setup

// Friend-based factories for access control
struct Connection {
  // Public interface first
  void appendMessage(std::string_view data);
  bool writeBytes();

private:
  Connection(/* args */);  // Private constructor
  friend struct Server;   // Only Server can construct Connections
};

// Usage in Server
auto conn = std::unique_ptr<Connection>(new Connection(args));

Resource Management

  • RAII everywhere - constructors acquire, destructors release
  • Move semantics for efficient resource transfer
  • Explicit cleanup methods where appropriate
~ArenaAllocator() {
  while (current_block_) {
    Block *prev = current_block_->prev;
    std::free(current_block_);
    current_block_ = prev;
  }
}

Error Handling

Error Reporting

  • Return codes for expected errors
  • Exceptions only for exceptional circumstances
  • fprintf + abort() for unrecoverable errors
enum class ParseResult { Success, InvalidJson, MissingField };

if (!memory) {
  std::fprintf(stderr, "ArenaAllocator: Failed to allocate memory\n");
  std::abort();
}

Assertions

  • Use assert() for debug-time checks
  • Static assertions for compile-time validation
assert(current_block_ && "realloc called with non-null ptr but no current block");
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");

Documentation

Doxygen Style

  • / for class and public method documentation
  • @brief for short descriptions
  • @param and @return for function parameters
  • @note for important implementation notes
  • @warning for critical usage warnings
/**
 * @brief Type-safe version of realloc_raw for arrays of type T.
 * @param ptr Pointer to the existing allocation
 * @param old_size Size in number of T objects
 * @param new_size Desired new size in number of T objects
 * @return Pointer to reallocated memory
 * @note Prints error to stderr and calls std::abort() if allocation fails
 */
template <typename T>
T *realloc(T *ptr, uint32_t old_size, uint32_t new_size);

Code Comments

  • Explain why, not what - code should be self-documenting
  • Performance notes for optimization decisions
  • Thread safety and ownership semantics
// Uses O(1) accumulated counters for fast retrieval
size_t total_allocated() const;

// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
           ConnectionHandler *handler, std::weak_ptr<Server> server);

Testing

Test Framework

  • doctest for unit testing
  • TEST_CASE and SUBCASE for test organization
  • CHECK for assertions (non-terminating)
  • REQUIRE for critical assertions (terminating)

Test Structure

  • Descriptive test names explaining the scenario
  • SUBCASE for related test variations
  • Fresh instances for each test to avoid state contamination
TEST_CASE("ArenaAllocator basic allocation") {
  ArenaAllocator arena;

  SUBCASE("allocate zero bytes returns nullptr") {
    void *ptr = arena.allocate_raw(0);
    CHECK(ptr == nullptr);
  }

  SUBCASE("allocate single byte") {
    void *ptr = arena.allocate_raw(1);
    CHECK(ptr != nullptr);
    CHECK(arena.used_bytes() >= 1);
  }
}

Test Synchronization

  • NEVER use timeouts or sleep-based synchronization
  • Deterministic synchronization only:
    • Blocking I/O operations
    • condition_variable.wait() (no timeout variant)
    • std::latch, std::barrier, futures/promises
    • RAII guards and resource management

Build Integration

CMake Integration

  • Generated code (gperf hash tables) in build directory
  • Ninja as the preferred generator
  • Export compile commands for tooling support
cmake .. -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

Code Generation

  • gperf for perfect hash table generation
  • Build-time generation of token lookup tables
  • Include generated headers from build directory

Style Enforcement

Consistency

  • Follow existing patterns in the codebase
  • Use the same style for similar constructs
  • Maintain consistency within each translation unit

Tools

  • clang-format configuration (when available)
  • Static analysis tools for code quality
  • Address sanitizer for memory safety testing

This style guide reflects the existing codebase patterns and should be followed for all new code contributions to maintain consistency and readability.