WeaselDB C++ Style Guide
This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.
Table of Contents
- General Principles
- Naming Conventions
- File Organization
- Code Structure
- Memory Management
- Error Handling
- Documentation
- Testing
- Build Integration
General Principles
Language Standard
- C++20 is the target standard
- Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
- Prefer standard library containers and algorithms over custom implementations
C Library Functions and Headers
- Always use std:: prefixed versions of C library functions for consistency and clarity
- Use C++ style headers (<cstring>, <cstdlib>, etc.) instead of C style headers (<string.h>, <stdlib.h>, etc.)
- This applies to all standard libc functions: std::abort(), std::fprintf(), std::free(), std::memcpy(), std::strlen(), std::strncpy(), std::memset(), std::signal(), std::perror(), etc.
- Exception: Functions with no std:: equivalent (e.g., POSIX functions such as gai_strerror()) and system-specific headers (e.g., <unistd.h>, <fcntl.h>)
// Preferred - C++ style
#include <cstring>
#include <cstdlib>
#include <csignal>
std::abort();
std::fprintf(stderr, "Error message\n");
std::free(ptr);
std::memcpy(dest, src, size);
std::strlen(str);
std::strncpy(dest, src, n);
std::memset(ptr, value, size);
std::signal(SIGTERM, handler);
// Avoid - C style
#include <string.h>
#include <stdlib.h>
#include <signal.h>
abort();
fprintf(stderr, "Error message\n");
free(ptr);
memcpy(dest, src, size);
strlen(str);
strncpy(dest, src, n);
memset(ptr, value, size);
signal(SIGTERM, handler);
Data Types
- Almost always signed - prefer int, int64_t, ssize_t over unsigned types except for:
- Bit manipulation operations
- Interfacing with APIs that require unsigned types
- Where defined unsigned overflow behavior (wraparound) is intentional and desired
- Almost always auto - let the compiler deduce types except when:
- The type is not obvious from context (prefer explicit for clarity)
- Specific type requirements matter (numeric conversions, template parameters)
- Interface contracts need explicit types (public APIs, function signatures)
- Prefer leaving memory uninitialized over zero-initializing it when reading it before it is written would be a bug
- Valgrind will catch uses of uninitialized memory
- Avoid hiding logic errors with unnecessary zero-initialization
- Zero-initialization can mask bugs and hurt performance
- Floating point is for metrics only - avoid float/double in core data structures and algorithms
- Use for performance measurements, statistics, and monitoring data
- Never use for counts, sizes, or business logic
Type Casting
- Never use C-style casts - they're unsafe and can hide bugs by performing dangerous conversions
- Use C++ cast operators for explicit type conversions with clear intent and safety checks
- Avoid reinterpret_cast - almost always indicates poor design; redesign APIs instead
- Prefer no casts - design APIs and use types that avoid casting entirely when possible
// Dangerous - C-style casts (NEVER DO THIS)
// int* ptr = (int*)malloc(sizeof(int)); // Unsafe
// int64_t id = (int64_t)some_pointer; // Dangerous pointer conversion
// BaseClass* base = (BaseClass*)derived; // Loses type safety
// Acceptable C++ cast operators (use sparingly)
auto ptr = static_cast<int*>(std::malloc(sizeof(int))); // Explicit conversion from void*
auto base = static_cast<BaseClass*>(derived_ptr); // Safe upcast
auto derived = dynamic_cast<DerivedClass*>(base_ptr); // Runtime type checking
auto mutable_ptr = const_cast<int*>(const_ptr); // Remove const (rare)
// reinterpret_cast can be appropriate for low-level operations (very rare)
auto addr = reinterpret_cast<std::uintptr_t>(ptr); // Pointer to integer conversion
Performance Focus
- Performance-first design - optimize for the hot path
- Simple is fast - find exactly what's necessary, strip away everything else
- Complexity must be justified with benchmarks - measure performance impact before adding complexity
- Strive for 0% CPU usage when idle - avoid polling, busy waiting, or unnecessary background activity
- Use inline functions for performance-critical code (e.g., allocate_raw)
- String views with std::string_view to minimize unnecessary copying
- Arena allocation for efficient memory management (~1ns vs ~20-270ns for malloc)
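To show where the arena's speed advantage comes from, here is a minimal bump-pointer sketch. `BumpArena` is a hypothetical fixed-capacity type written for illustration; it is not WeaselDB's `Arena`, which grows by chaining blocks, and its interface is only loosely modeled on `allocate_raw`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical fixed-capacity bump allocator (illustration only).
// Allocating is an align-up, a bounds check, and a pointer bump; there is no
// per-object free, so all memory is released at once by reset().
struct BumpArena {
  explicit BumpArena(int64_t capacity)
      : base_(static_cast<char *>(std::malloc(capacity))),
        capacity_(capacity), used_(0) {
    assert(capacity > 0 && "capacity must be positive");
    if (!base_) {
      std::fprintf(stderr, "BumpArena: Memory allocation failed\n");
      std::abort();
    }
  }
  ~BumpArena() { std::free(base_); }

  BumpArena(const BumpArena &source) = delete;
  BumpArena &operator=(const BumpArena &source) = delete;

  // Returns `size` bytes aligned to `alignment`, or nullptr when full.
  // Preconditions: size > 0; alignment is a power of two no larger than
  // alignof(std::max_align_t), which std::malloc guarantees for base_.
  void *allocate_raw(int64_t size,
                     int64_t alignment = alignof(std::max_align_t)) {
    assert(size > 0 && "Cannot allocate zero bytes");
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    assert(alignment <= static_cast<int64_t>(alignof(std::max_align_t)));
    int64_t offset = (used_ + alignment - 1) & ~(alignment - 1);
    if (offset + size > capacity_) {
      return nullptr;
    }
    used_ = offset + size;
    return base_ + offset;
  }

  void reset() { used_ = 0; }
  int64_t used_bytes() const { return used_; }

private:
  char *base_;
  int64_t capacity_;
  int64_t used_;
};
```

The hot path touches no shared state and never calls into the system allocator, which is the property the timing comparison above reflects; a real arena adds a slow path that allocates a new block instead of returning nullptr.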
String Formatting
- Always use format.hpp functions - they format directly into arena-allocated memory
- Use static_format() for performance-sensitive code - faster but less flexible than format()
- Use format() with an arena allocator for printf-style formatting
Arena& arena = conn.get_arena();
// Most performance-sensitive - compile-time optimized concatenation
std::string_view response = static_format(arena,
"HTTP/1.1 ", status_code, " OK\r\n",
"Content-Length: ", body.size(), "\r\n",
"\r\n", body);
// Printf-style formatting - runtime flexible
std::string_view response = format(arena,
"HTTP/1.1 %d OK\r\n"
"Content-Length: %zu\r\n"
"\r\n%.*s",
status_code, body.size(),
static_cast<int>(body.size()), body.data());
Complexity Control
- Encapsulation is the main tool for controlling complexity
- Header files define the interface - they are the contract with users of your code
- Headers should be complete - include everything needed to use the interface effectively:
- Usage examples in comments
- Preconditions and postconditions
- Thread safety guarantees
- Performance characteristics
- Ownership and lifetime semantics
- Do not rely on undocumented interface properties - if it's not in the header, don't depend on it
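A small sketch of what a complete header interface can look like, covering each item in the checklist above (usage example, pre/postconditions, thread safety, performance, ownership). `LineSplitter` is a hypothetical type invented for this illustration.

```cpp
#include <cassert>
#include <string_view>

/**
 * @brief Splits a buffer into lines without copying (hypothetical example type).
 *
 * Usage:
 *   LineSplitter lines("a\nbc");
 *   std::string_view line;
 *   while (lines.next(line)) { ... }  // yields "a", then "bc"
 *
 * Ownership: does not own the input. The caller must keep the underlying
 *   buffer alive while the splitter or any returned view is in use.
 * Thread safety: not thread-safe; use each instance from one thread.
 * Performance: no allocation; each call scans only up to the next '\n'.
 */
struct LineSplitter {
  explicit LineSplitter(std::string_view input) : remaining_(input) {}

  /**
   * @brief Produces the next line, excluding the '\n' terminator.
   * @param out Receives the line on success
   * @return true if a line was produced, false once the input is exhausted
   * @note Postcondition: when false is returned, out is left unchanged.
   */
  bool next(std::string_view &out) {
    if (done_) {
      return false;
    }
    auto pos = remaining_.find('\n');
    if (pos == std::string_view::npos) {
      out = remaining_;
      done_ = true;
      return true;
    }
    out = remaining_.substr(0, pos);
    remaining_.remove_prefix(pos + 1);
    return true;
  }

private:
  std::string_view remaining_;
  bool done_ = false;
};
```

A caller can use this type correctly from the header alone; nothing about lifetime or threading has to be inferred from the implementation.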
Naming Conventions
Variables and Functions
- snake_case for all variables, functions, and member functions
- Legacy camelCase exists - the codebase currently contains mixed naming due to historical development. New code should use snake_case. Existing camelCase should be converted to snake_case during natural refactoring (not mass renaming).
int64_t used_bytes() const;
void add_block(int64_t size);
int32_t initial_block_size_;
Classes and Structs
- PascalCase for class/struct names
- Always use struct keyword - eliminates debates about complexity and maintains consistency
- Public members first, private after - puts the interface users care about at the top, implementation details below
- Full encapsulation still applies - use private: sections to hide implementation details and maintain deep, capable structs
- The struct keyword doesn't mean shallow design - it means interface-first organization for human readers
struct Arena {
// Public interface first
explicit Arena(int64_t initial_size = 1024);
void* allocate_raw(int64_t size);
private:
// Private members after
int32_t initial_block_size_;
Block* current_block_;
};
Enums
- PascalCase for enum class names
- PascalCase for enum values (not SCREAMING_SNAKE_CASE)
enum class Type {
PointRead,
RangeRead
};
enum class ParseState {
Root,
PreconditionsArray,
OperationObject
};
Constants and Macros
- snake_case for constants
- Avoid macros when possible; prefer constexpr variables
static const WeaselJsonCallbacks json_callbacks;
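A short sketch of the macro-versus-constexpr preference; the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Avoid: #define DEFAULT_BLOCK_SIZE 1024
// A macro is untyped and ignores namespaces and scopes.

// Prefer: typed, scoped constexpr constants in snake_case (hypothetical names).
constexpr int64_t default_block_size = 1024;
constexpr int64_t max_block_size = default_block_size * 64;

// constexpr values can be validated at compile time.
static_assert(max_block_size == 65536, "max_block_size must be 64 KiB");
```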
Member Variables
- Trailing underscore for private member variables
private:
int32_t initial_block_size_;
Block *current_block_;
Template Parameters
- PascalCase for template type parameters
template <typename T, typename... Args>
template <typename T> struct rebind { using type = T*; };
File Organization
Include Organization
- Use #pragma once instead of include guards
- Never write using namespace std - always use fully qualified names for clarity and safety
- Include order (applies to both headers and source files):
- Corresponding header file (for .cpp files only)
- Standard library headers (alphabetical)
- Third-party library headers
- Project headers
#pragma once
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string_view>
#include <vector>
#include <simdutf.h>
#include <weaseljson/weaseljson.h>
#include "arena.hpp"
#include "commit_request.hpp"
// Never this:
// using namespace std;
// Always this:
std::vector<int> data;
std::unique_ptr<Parser> parser;
Code Structure
Class Design
- Move-only semantics for resource-owning types
- Explicit constructors to prevent implicit conversions
- Delete copy operations when inappropriate
struct Arena {
explicit Arena(int64_t initial_size = 1024);
// Copy construction is not allowed
Arena(const Arena &source) = delete;
Arena &operator=(const Arena &source) = delete;
// Move semantics
Arena(Arena &&source) noexcept;
Arena &operator=(Arena &&source) noexcept;
private:
int32_t initial_block_size_;
Block *current_block_;
};
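The Arena declaration above shows the shape of a move-only type but not the move logic itself. As a self-contained sketch of the same rules (RAII, explicit constructor, deleted copies, noexcept moves), here is a hypothetical owner of a POSIX file descriptor; `UniqueFd` is not an existing WeaselDB type.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>
#include <utility>

// Hypothetical move-only owner of a POSIX file descriptor (illustration only).
struct UniqueFd {
  // explicit: a plain int must not silently become an owning handle.
  explicit UniqueFd(int fd) : fd_(fd) {}
  ~UniqueFd() { reset(); }

  // Copying would lead to a double close, so it is not allowed.
  UniqueFd(const UniqueFd &source) = delete;
  UniqueFd &operator=(const UniqueFd &source) = delete;

  // Moving transfers ownership and leaves the source empty (-1).
  UniqueFd(UniqueFd &&source) noexcept : fd_(std::exchange(source.fd_, -1)) {}
  UniqueFd &operator=(UniqueFd &&source) noexcept {
    if (this != &source) {
      reset();
      fd_ = std::exchange(source.fd_, -1);
    }
    return *this;
  }

  int get() const { return fd_; }

  // Closes the owned descriptor, if any. close() is not retried on EINTR
  // because Linux releases the descriptor even in that case.
  void reset() {
    if (fd_ == -1) {
      return;
    }
    int result = close(fd_);
    fd_ = -1;
    if (result == -1 && errno != EINTR) {
      std::perror("close");
      std::abort();
    }
  }

private:
  int fd_;
};
```

std::exchange keeps each move operation to a single expression and guarantees the moved-from object can be destroyed safely.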
Function Design
- Const correctness - mark methods const when appropriate
- Parameter passing:
- Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
- Pass by const reference for types > 16 bytes (containers, large objects)
- Return by value for small types (≤ 16 bytes); return std::string_view to avoid copying strings
- noexcept specification for move operations and non-throwing functions
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view request_data); // ≤ 16 bytes, pass by value
void process_request(const CommitRequest& commit_request); // > 16 bytes, pass by reference
Arena(Arena &&source) noexcept;
Template Usage
- Template constraints using static_assert for better error messages
- SFINAE or concepts for template specialization
Factory Patterns & Ownership
- Static factory methods for complex construction requiring shared ownership
- Friend-based factories for access control when constructor should be private
- Ownership guidelines:
- unique_ptr for exclusive ownership (most common case)
- shared_ptr only when multiple owners need concurrent access to same object
- Factory methods return appropriate smart pointer type based on ownership needs
// Shared ownership - multiple components need concurrent access
auto server = Server::create(config, handler); // Returns shared_ptr
// Exclusive ownership - single owner, transfer via move
auto connection = Connection::createForServer(addr, fd, connection_id, handler, server_ref);
// Friend-based factory for access control
struct Connection {
void append_message(std::string_view message_data);
private:
Connection(struct sockaddr_storage client_addr, int file_descriptor,
int64_t connection_id, ConnectionHandler* request_handler,
std::weak_ptr<Server> server_ref);
friend struct Server; // Only Server can construct
};
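A self-contained sketch of both factory styles. `Registry` and `Session` are hypothetical stand-ins for types like Server and Connection. One practical consequence of a private constructor is that std::make_shared and std::make_unique cannot call it, so the factory uses new directly.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>

struct Registry;

// Hypothetical type: can only be constructed by Registry.
struct Session {
  std::string_view name() const { return name_; }

private:
  explicit Session(std::string_view name) : name_(name) {}
  friend struct Registry;  // Only Registry can construct
  std::string name_;
};

// Hypothetical factory type.
struct Registry {
  // Shared ownership: several components may hold the registry at once.
  // std::make_shared cannot reach the private constructor, so call new here.
  static std::shared_ptr<Registry> create() {
    return std::shared_ptr<Registry>(new Registry());
  }

  // Exclusive ownership: the caller becomes the only owner of the Session.
  // std::make_unique is not a friend of Session, so call new here as well.
  std::unique_ptr<Session> open_session(std::string_view name) {
    return std::unique_ptr<Session>(new Session(name));
  }

private:
  Registry() = default;
};
```

Outside Registry, `Session s("x");` does not compile, which is the access control the friend declaration is meant to enforce.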
Control Flow
- Early returns to reduce nesting
- Range-based for loops when possible
if (size == 0) {
return nullptr;
}
for (auto &precondition : preconditions_) {
// ...
}
Atomic Operations
- Never use assignment operators with std::atomic - always use explicit store() and load()
- Always specify memory ordering explicitly for atomic operations
- Use the least restrictive correct memory ordering - choose the weakest ordering that maintains correctness
// Preferred - explicit store/load with precise memory ordering
std::atomic<uint64_t> counter;
counter.store(42, std::memory_order_relaxed); // Single-writer metric updates
auto value = counter.load(std::memory_order_relaxed); // Reading metrics for display
counter.store(1, std::memory_order_release); // Publishing initialization
auto ready = counter.load(std::memory_order_acquire); // Synchronizing with publisher
counter.store(42, std::memory_order_seq_cst); // When sequential consistency needed
// Avoid - assignment operators (implicit memory ordering)
std::atomic<uint64_t> counter;
counter = 42; // Implicit - memory ordering not explicit
uint64_t value = counter; // Implicit load - memory ordering not explicit
Memory Management
Ownership & Allocation
- Arena allocators for request-scoped memory with STL allocator adapters (see Performance Focus section for characteristics)
- String views pointing to arena-allocated memory to avoid unnecessary copying
- STL containers with arena allocators must be replaced with a freshly constructed container when the arena is reset - clear() is not sufficient, because the container keeps its capacity in arena memory that the reset reclaims
// STL containers with arena allocators - correct reset pattern
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena);
// ... use container ...
operations = decltype(operations)(arena); // Fresh empty container; clear() and `= {}` keep the old capacity
arena.reset(); // Reset arena memory
Resource Management
- RAII everywhere - constructors acquire, destructors release
- Move semantics for efficient resource transfer
- Explicit cleanup methods where appropriate
~Arena() {
while (current_block_) {
Block *prev = current_block_->prev;
std::free(current_block_);
current_block_ = prev;
}
}
Error Handling
Error Classification & Response
- Expected errors (invalid input, timeouts): Return error codes for programmatic handling
- System failures (malloc fail, socket fail): Abort immediately with error message
- Programming errors (precondition violations, assertions): Abort immediately
Error Contract Design
- Error codes are the API contract - use enums for programmatic decisions
- Error messages are human-readable only - never parse message strings
- Consistent error boundaries - each component defines what it can/cannot recover from
- Interface precondition violations are undefined behavior - acceptable to skip checks for performance in hot paths
- Error code types must be nodiscard - mark error code enums with [[nodiscard]] to prevent silent failures
enum class [[nodiscard]] ParseResult { Success, InvalidJson, MissingField };
// System failure - abort immediately
void* memory = std::malloc(size);
if (!memory) {
std::fprintf(stderr, "Arena: Memory allocation failed\n");
std::abort();
}
// ... use memory, eventually std::free(memory)
// Programming error - precondition violation (may be omitted for performance)
assert(ptr != nullptr && "Precondition violated: pointer must be non-null");
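The expected-error path can be sketched with a [[nodiscard]] error enum. `ParsePortResult` and `parse_port` are hypothetical; the parsing uses the standard `std::from_chars`.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical error-code contract: callers branch on the enum, never on text.
enum class [[nodiscard]] ParsePortResult { Success, Empty, NotANumber, OutOfRange };

// Parses a TCP port. Bad input is an expected error, reported through the
// return value; `out` is written only on Success.
ParsePortResult parse_port(std::string_view text, int32_t &out) {
  if (text.empty()) {
    return ParsePortResult::Empty;
  }
  int32_t value;  // Left uninitialized: read only after a successful parse
  auto [end, ec] = std::from_chars(text.data(), text.data() + text.size(), value);
  if (ec == std::errc::result_out_of_range) {
    return ParsePortResult::OutOfRange;
  }
  if (ec != std::errc{} || end != text.data() + text.size()) {
    return ParsePortResult::NotANumber;
  }
  if (value < 1 || value > 65535) {
    return ParsePortResult::OutOfRange;
  }
  out = value;
  return ParsePortResult::Success;
}
```

Because the enum type is [[nodiscard]], a bare statement like `parse_port(text, port);` draws a compiler warning without any annotation on the function itself.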
Assertions
- Programming error detection using the standard assert() macro
- Assertion behavior follows the standard NDEBUG convention:
- Debug builds: Assertions active (NDEBUG not defined)
- Release builds: Assertions removed (NDEBUG defined)
- Test targets override: Use -UNDEBUG to force assertions active in all builds
- Static assertions for compile-time validation (always active)
Usage guidelines:
- Use for programming errors: null checks, precondition validation, invariants
- Don't use for expected runtime errors: use return codes instead
// Good: Programming error checks
assert(current_block_ && "realloc called with non-null ptr but no current block");
assert(size > 0 && "Cannot allocate zero bytes");
// Good: Compile-time validation (always enabled)
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");
// Bad: Don't use assert for expected runtime errors
// assert(file_exists(path)); // File might legitimately not exist - use return code instead
System Call Error Handling
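When a system call is interrupted by a signal (EINTR), it is usually necessary to retry the call. This is especially true for "slow" system calls that can block for a long time, such as read, write, accept, sem_wait, and epoll_wait. connect() is an exception: per POSIX, an interrupted connect() fails with EINTR but the connection attempt continues asynchronously, so calling connect() again is not a clean retry (it can fail with EALREADY or EISCONN). Instead, wait for the socket to become writable (e.g., with poll()) and read the outcome with getsockopt(SO_ERROR).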
Rule: Always wrap potentially interruptible system calls in a do-while loop that checks for EINTR.
Example:
int fd;
do {
fd = accept(listen_fd, nullptr, nullptr);
} while (fd == -1 && errno == EINTR);
if (fd == -1) {
// Handle other errors (std::perror is declared in <cstdio>)
std::perror("accept");
std::abort();
}
Special case - close():
The close() system call is a special case on Linux. According to man 2 close, when close() returns EINTR on Linux, the file descriptor is still guaranteed to be closed. Therefore, close() should never be retried.
// Correct: Do not retry close() on EINTR
int result = close(fd);
if (result == -1 && errno != EINTR) {
// Handle non-EINTR errors only
std::perror("close");
std::abort();
}
// Note: fd is guaranteed closed even on EINTR
Non-interruptible calls:
Most system calls are not interruptible in practice. For these, it is not necessary to add a retry loop. This includes:
- fcntl (with F_GETFL, F_SETFL, F_GETFD, F_SETFD - note: F_SETLKW and F_OFD_SETLKW CAN return EINTR)
- epoll_ctl
- socketpair
- pipe
- setsockopt
- epoll_create1
- close (special case: guaranteed closed even on EINTR on Linux)
When in doubt, consult the man page for the specific system call to see if it can return EINTR.
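Writing the do-while loop at every call site is repetitive, so one option is a small generic wrapper. This is a sketch under the assumption that the wrapped call returns -1 on failure and sets errno; `retry_on_eintr` is a hypothetical helper, not an existing WeaselDB API, and it should not wrap close(), per the special case above.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical helper: invokes `call` until it succeeds or fails with an
// errno other than EINTR. Intended for calls that return -1 on error,
// such as read(), write(), and accept(). Do not use it for close().
template <typename F>
auto retry_on_eintr(F &&call) -> decltype(call()) {
  decltype(call()) result;
  do {
    result = call();
  } while (result == -1 && errno == EINTR);
  return result;
}
```

With this helper, the accept() example becomes `int fd = retry_on_eintr([&] { return accept(listen_fd, nullptr, nullptr); });`, followed by the same error check as before.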
Documentation
Doxygen Style
- /** */ for struct and public method documentation
- @brief for short descriptions
- @param and @return for function parameters
- @note for important implementation notes
- @warning for critical usage warnings
/**
* @brief Type-safe version of realloc_raw for arrays of type T.
* @param existing_ptr Pointer to the existing allocation
* @param current_size Size in number of T objects
* @param requested_size Desired new size in number of T objects
* @return Pointer to reallocated memory
* @note Prints error to stderr and calls std::abort() if allocation fails
*/
template <typename T>
T *realloc(T *existing_ptr, int32_t current_size, int32_t requested_size);
Code Comments
- Explain why, not what - code should be self-documenting
- Performance notes for optimization decisions
- Thread safety and ownership semantics
// Uses O(1) accumulated counters for fast retrieval
int64_t total_allocated() const;
// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
ConnectionHandler *handler, std::weak_ptr<Server> server);
Testing
Test Framework
- doctest for unit testing
- TEST_CASE and SUBCASE for test organization
- CHECK for assertions (non-terminating)
- REQUIRE for critical assertions (terminating)
Test Structure
- Descriptive test names explaining the scenario
- SUBCASE for related test variations
- Fresh instances for each test to avoid state contamination
TEST_CASE("Arena basic allocation") {
Arena arena;
SUBCASE("allocate zero bytes returns nullptr") {
void *ptr = arena.allocate_raw(0);
CHECK(ptr == nullptr);
}
SUBCASE("allocate single byte") {
void *ptr = arena.allocate_raw(1);
CHECK(ptr != nullptr);
CHECK(arena.used_bytes() >= 1);
}
}
Test Design Principles
- Test the contract, not the implementation - validate what the API promises to deliver, not implementation details
- Both integration and unit tests - test components in isolation and working together
- Prefer fakes to mocks - use real implementations for internal components, fake external dependencies
- Always enable assertions in tests - use the -UNDEBUG pattern to ensure assertions are checked (see Build Integration section)
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
auto config = Config::defaultConfig();
auto handler = std::make_unique<TestHandler>();
auto server = Server::create(config, std::move(handler));
// Test observable behavior - server can accept connections
auto result = connectToServer(server->getPort());
CHECK(result.connected);
}
// Avoid: Testing internal implementation details
// TEST_CASE("Server creates epoll instance") { /* implementation detail */ }
What NOT to Test
Avoid testing language features and plumbing:
- Don't test that virtual functions dispatch correctly
- Don't test that standard library types work (unique_ptr, containers, etc.)
- Don't test basic constructor/destructor calls
Test business logic instead:
- When does your code call hooks/callbacks and why?
- What state transitions trigger behavior changes?
- How does your code handle error conditions?
- What promises does your API make to users?
Ask: "Am I testing the C++ compiler or my application logic?"
Test Synchronization (Authoritative Rules)
- ABSOLUTELY NEVER use timeouts (sleep_for, wait_for, etc.)
- Deterministic synchronization only:
- Blocking I/O (naturally waits for completion)
- condition_variable.wait() without timeout
- std::latch, std::barrier, futures/promises
- Force concurrent execution using std::latch to synchronize thread startup
Threading Checklist for Tests/Benchmarks
Common threading principles (all concurrent code):
- Count total threads - Include main/benchmark thread in count
- Always assume concurrent execution needed - Tests/benchmarks require real concurrency
- Add synchronization primitive - std::latch start_latch{N} (most common), std::barrier, or similar, where N = total concurrent threads
- Each thread synchronizes before doing work - e.g., start_latch.arrive_and_wait() or barrier.arrive_and_wait()
- Main thread synchronizes before measurement/execution - ensures all threads start simultaneously
Test-specific:
- Perform many operations per thread creation - amortize thread creation cost and increase chances of hitting race conditions
- Pattern: Create test that spawns threads and runs many operations, then run that test many times - amortizes thread creation cost while providing fresh test instances
- Run 100-10000 operations per test, and 100-10000 test iterations - maximizes chances of hitting race conditions
- Always run with ThreadSanitizer - compile with -fsanitize=thread
Benchmark-specific:
- NEVER create threads inside the benchmark measurement - creates thread creation/destruction overhead, not contention
- Create background threads OUTSIDE the benchmark that run continuously during measurement
- Use std::atomic<bool> keep_running to cleanly shut down background threads after the benchmark
- Measure only the foreground operation under real contention from background threads
Red flags to catch immediately:
- ❌ Creating threads in a loop without std::latch
- ❌ Background threads starting work immediately
- ❌ Benchmark measuring before all threads synchronized
- ❌ Any use of sleep_for, wait_for, or timeouts
Simple rule: Multiple threads = std::latch synchronization. No exceptions, even for "simple" background threads.
// BAD: Race likely over before threads start
std::vector<std::thread> threads;
int counter = 0;
for (int i = 0; i < 4; ++i) {
threads.emplace_back([&]() { counter++; }); // Probably sequential
}
for (auto &t : threads) {
t.join();
}
// GOOD: Force threads to race simultaneously
std::vector<std::thread> threads;
int counter = 0;
std::latch start_latch{4};
for (int i = 0; i < 4; ++i) {
threads.emplace_back([&]() {
start_latch.arrive_and_wait(); // All threads start together
counter++; // Now they actually race (data race on non-atomic)
});
}
for (auto &t : threads) {
t.join();
}
Build Integration
Build Configuration
# Debug: assertions on, optimizations off
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
# Release: assertions off, optimizations on
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
Test Target Pattern:
- Production targets follow build type (assertions off in Release)
- Test targets use -UNDEBUG to force assertions on in all builds
- Ensures consistent test validation regardless of build type
# Test target with assertions always enabled
add_executable(test_example tests/test_example.cpp src/example.cpp)
target_link_libraries(test_example doctest::doctest)
target_compile_options(test_example PRIVATE -UNDEBUG) # Always enable assertions
# Production target follows build type
add_executable(example src/example.cpp src/main.cpp)
# No -UNDEBUG → assertions disabled in Release, enabled in Debug
Code Generation
- Generated files go in build directory, not source