# WeaselDB C++ Style Guide
This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.
## Table of Contents
1. [General Principles](#general-principles)
2. [Naming Conventions](#naming-conventions)
3. [File Organization](#file-organization)
4. [Code Structure](#code-structure)
5. [Memory Management](#memory-management)
6. [Error Handling](#error-handling)
7. [Documentation](#documentation)
8. [Testing](#testing)
9. [Build Integration](#build-integration)
10. [Style Enforcement](#style-enforcement)
---
## General Principles
### Language Standard
- **C++20** is the target standard
- Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
- Prefer standard library containers and algorithms over custom implementations
### Data Types
- **Almost always signed** - prefer `int`, `int64_t`, `ptrdiff_t` over unsigned types except for:
  - Bit manipulation operations
  - Interfacing with APIs that require unsigned types
  - Memory sizes where overflow is impossible (`size_t`, `uint32_t` for arena block sizes)
  - Where defined unsigned overflow behavior (wraparound) is intentional and desired
- **Almost always auto** - let the compiler deduce types except when:
  - The type is not obvious from context (prefer explicit for clarity)
  - Specific type requirements matter (numeric conversions, template parameters)
  - Interface contracts need explicit types (public APIs, function signatures)
- **Prefer uninitialized memory to zero-initialization** when reading a value before assigning it would be an error
  - Valgrind can catch reads of uninitialized memory
  - Avoid hiding logic errors with unnecessary zero-initialization
  - Unnecessary initialization can mask bugs and hurt performance
- **Floating point is for metrics only** - avoid `float`/`double` in core data structures and algorithms
  - Use for performance measurements, statistics, and monitoring data
  - Never use for counts, sizes, or business logic
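
The signed-by-default rule is easiest to see in a reverse loop and in a subtraction. The sketch below is illustrative only; `last_index_of`, `signed_gap`, and `unsigned_gap` are hypothetical helpers, not WeaselDB functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Signed index: `i >= 0` is a meaningful loop condition and the loop ends.
// With an unsigned index the same condition would always be true.
inline int64_t last_index_of(const std::vector<int> &values, int target) {
  for (auto i = static_cast<int64_t>(values.size()) - 1; i >= 0; --i) {
    if (values[static_cast<size_t>(i)] == target) {
      return i;
    }
  }
  return -1; // not found
}

// A signed difference can go negative, which is easy to detect.
inline int64_t signed_gap(int64_t a, int64_t b) { return a - b; }

// An unsigned difference wraps around modulo 2^64 instead.
inline uint64_t unsigned_gap(uint64_t a, uint64_t b) { return a - b; }
```

Here `signed_gap(2, 5)` is `-3`, while `unsigned_gap(2, 5)` wraps to a value close to `UINT64_MAX`, which is only useful when wraparound is intended.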
### Performance Focus
- **Performance-first design** - optimize for the hot path
- **Simple is fast** - find exactly what's necessary, strip away everything else
- **Complexity must be justified with benchmarks** - measure performance impact before adding complexity
- **Strive for 0% CPU usage when idle** - avoid polling, busy waiting, or unnecessary background activity
- Use **inline functions** for performance-critical code (e.g., `allocate_raw`)
- **Zero-copy operations** with `std::string_view` over string copying
- **Arena allocation** for efficient memory management (see Memory Management section for details)
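
As a minimal sketch of the zero-copy point, a parser can hand back `std::string_view` slices of its input instead of new strings. `split_key_value` is a hypothetical helper invented for this example; the returned views are only valid while the original buffer is alive.

```cpp
#include <string_view>
#include <utility>

// Splits "key=value" into two views that point into `line`.
// No characters are copied; the caller must keep `line`'s buffer alive.
inline std::pair<std::string_view, std::string_view>
split_key_value(std::string_view line) {
  auto pos = line.find('=');
  if (pos == std::string_view::npos) {
    return {line, std::string_view{}}; // no '=': whole line is the key
  }
  return {line.substr(0, pos), line.substr(pos + 1)};
}
```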
### Complexity Control
- **Encapsulation is the main tool for controlling complexity**
- **Header files define the interface** - they are the contract with users of your code
- **Headers should be complete** - include everything needed to use the interface effectively:
  - Usage examples in comments
  - Preconditions and postconditions
  - Thread safety guarantees
  - Performance characteristics
  - Ownership and lifetime semantics
- **Do not rely on undocumented interface properties** - if it's not in the header, don't depend on it
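
One way a "complete" header might look is sketched below. `ByteCounter` is a hypothetical type made up for illustration; the point is that usage, preconditions, thread safety, performance, and ownership are all stated next to the declaration.

```cpp
#include <cassert>
#include <cstdint>

/**
 * @brief Counts bytes written for one request (hypothetical example type).
 *
 * Usage:
 *   ByteCounter counter;
 *   counter.add(128);
 *   int64_t total = counter.total();
 *
 * Thread safety: not thread-safe; confine each instance to one thread.
 * Performance: add() and total() are O(1) and never allocate.
 * Ownership: plain value type; owns no external resources.
 */
struct ByteCounter {
  /**
   * @brief Adds `bytes` to the running total.
   * @pre bytes >= 0
   * @post total() has increased by exactly `bytes`
   */
  void add(int64_t bytes) {
    assert(bytes >= 0 && "ByteCounter::add requires a non-negative count");
    total_ += bytes;
  }

  /** @return Sum of all values passed to add() so far. */
  int64_t total() const { return total_; }

private:
  int64_t total_ = 0;
};
```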
---
## Naming Conventions
### Variables and Functions
- **snake_case** for all variables, functions, and member functions
```cpp
size_t used_bytes() const;
void add_block(size_t size);
uint32_t initial_block_size_;
```
### Structs
- **PascalCase** for struct names
- **Always use struct** - eliminates debates about complexity and maintains consistency
- **Public members first, private after** - puts the interface users care about at the top, implementation details below
- **Full encapsulation still applies** - use `private:` sections to hide implementation details and maintain deep, capable classes
- The struct keyword doesn't mean shallow design - it means interface-first organization for human readers
```cpp
struct ArenaAllocator {
  // Public interface first
  explicit ArenaAllocator(size_t initial_size = 1024);
  void *allocate_raw(size_t size);

private:
  // Private members after
  uint32_t initial_block_size_;
  Block *current_block_;
};
```
### Enums
- **PascalCase** for enum class names
- **PascalCase** for enum values (not SCREAMING_SNAKE_CASE)
```cpp
enum class Type {
  PointRead,
  RangeRead
};

enum class ParseState {
  Root,
  PreconditionsArray,
  OperationObject
};
```
### Constants and Macros
- **snake_case** for constants
- Avoid macros when possible; prefer `constexpr` variables
```cpp
static const WeaselJsonCallbacks json_callbacks;
```
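
A short sketch of `constexpr` constants replacing macros follows. The names `default_block_size`, `max_block_size`, and `next_block_size` are assumptions chosen for the example, not values taken from the codebase.

```cpp
#include <cstdint>

// Instead of: #define DEFAULT_BLOCK_SIZE 1024
// A constexpr variable is typed, scoped, and visible to the debugger.
inline constexpr int64_t default_block_size = 1024;
inline constexpr int64_t max_block_size = default_block_size * 1024;

// constexpr functions can replace function-like macros as well.
constexpr int64_t next_block_size(int64_t current) {
  return current >= max_block_size / 2 ? max_block_size : current * 2;
}

// Evaluated entirely at compile time.
static_assert(next_block_size(default_block_size) == 2048);
```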
### Member Variables
- **Trailing underscore** for private member variables
```cpp
private:
uint32_t initial_block_size_;
Block *current_block_;
```
### Template Parameters
- **PascalCase** for template type parameters
```cpp
template <typename T, typename... Args>
template <typename T> struct rebind { using type = T*; };
```
---
## File Organization
### Header Files
- Use **`#pragma once`** instead of include guards
- **Never `using namespace std`** - always use fully qualified names for clarity and safety
- **Include order:**
  1. Corresponding header file (for .cpp files)
  2. Standard library headers (alphabetical)
  3. Third-party library headers
  4. Project headers
```cpp
#pragma once

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string_view>
#include <vector>

#include <simdutf.h>
#include <weaseljson/weaseljson.h>

#include "arena_allocator.hpp"
#include "commit_request.hpp"

// Never this:
// using namespace std;

// Always this:
std::vector<int> data;
std::unique_ptr<Parser> parser;
```
### Source Files
- Include corresponding header first
- Follow same include order as headers (see Header Files section above)
---
## Code Structure
### Struct Design
- **Move-only semantics** for resource-owning structs
- **Explicit constructors** to prevent implicit conversions
- **Delete copy operations** when inappropriate
```cpp
struct ArenaAllocator {
  explicit ArenaAllocator(size_t initial_size = 1024);

  // Copy construction is not allowed
  ArenaAllocator(const ArenaAllocator &) = delete;
  ArenaAllocator &operator=(const ArenaAllocator &) = delete;

  // Move semantics
  ArenaAllocator(ArenaAllocator &&other) noexcept;
  ArenaAllocator &operator=(ArenaAllocator &&other) noexcept;

private:
  uint32_t initial_block_size_;
  Block *current_block_;
};
```
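
The declarations above leave the move operations undefined. A minimal sketch of how a move-only, resource-owning struct could implement them is shown here with a hypothetical `OwnedBuffer`; it is not the actual `ArenaAllocator` implementation.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <type_traits>
#include <utility>

struct OwnedBuffer {
  // Precondition: size > 0.
  explicit OwnedBuffer(size_t size) : data_(std::malloc(size)), size_(size) {
    if (!data_) {
      std::fprintf(stderr, "OwnedBuffer: failed to allocate memory\n");
      std::abort();
    }
  }
  ~OwnedBuffer() { std::free(data_); } // free(nullptr) is a no-op

  OwnedBuffer(const OwnedBuffer &) = delete;
  OwnedBuffer &operator=(const OwnedBuffer &) = delete;

  // Moving steals the pointer and leaves the source empty.
  OwnedBuffer(OwnedBuffer &&other) noexcept
      : data_(std::exchange(other.data_, nullptr)),
        size_(std::exchange(other.size_, 0)) {}

  OwnedBuffer &operator=(OwnedBuffer &&other) noexcept {
    if (this != &other) {
      std::free(data_); // release what we currently own
      data_ = std::exchange(other.data_, nullptr);
      size_ = std::exchange(other.size_, 0);
    }
    return *this;
  }

  void *data() const { return data_; }
  size_t size() const { return size_; }

private:
  void *data_;
  size_t size_;
};

static_assert(!std::is_copy_constructible_v<OwnedBuffer>);
static_assert(std::is_nothrow_move_constructible_v<OwnedBuffer>);
```

Using `std::exchange` keeps the moved-from object in a state where its destructor is still safe to run.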
### Function Design
- **Const correctness** - mark methods const when appropriate
- **Parameter passing:**
  - Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
  - Pass by const reference for types > 16 bytes (containers, large objects)
- **Return by value** for small types (≤ 16 bytes), **string_view** for zero-copy over strings
- **noexcept specification** for move operations and non-throwing functions
```cpp
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view data); // ≤ 16 bytes, pass by value
void process_request(const CommitRequest& req); // > 16 bytes, pass by reference
ArenaAllocator(ArenaAllocator &&other) noexcept;
```
### Template Usage
- **Template constraints** using static_assert for better error messages
- **SFINAE** or concepts for template specialization
```cpp
template <typename T, typename... Args> T *construct(Args &&...args) {
  static_assert(
      std::is_trivially_destructible_v<T>,
      "ArenaAllocator::construct requires trivially destructible types.");
  // ...
}
```
### Factory Patterns
- **Static factory methods** for complex construction requiring specific initialization
- **Friend-based factories** for access control when constructor should be private
- **Factory patterns ensure proper ownership semantics** (shared_ptr vs unique_ptr)
```cpp
// Static factory method
auto server = Server::create(config, handler); // Returns shared_ptr

// Friend-based factory for access control
struct Connection {
  void appendMessage(std::string_view data);

private:
  Connection(/* args */); // Private constructor
  friend struct Server;   // Only Server can construct
};
```
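
The static factory variant can be sketched in full with a hypothetical `Session` type. The private constructor forces callers through `create`, which validates its input and fixes the ownership model (`unique_ptr` here, since a single owner is assumed).

```cpp
#include <memory>
#include <string>
#include <utility>

struct Session {
  // Returns nullptr for an empty name; otherwise a uniquely owned Session.
  static std::unique_ptr<Session> create(std::string name) {
    if (name.empty()) {
      return nullptr;
    }
    // std::make_unique cannot reach the private constructor, so use new.
    return std::unique_ptr<Session>(new Session(std::move(name)));
  }

  const std::string &name() const { return name_; }

private:
  explicit Session(std::string name) : name_(std::move(name)) {}

  std::string name_;
};
```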
### Control Flow
- **Early returns** to reduce nesting
- **Range-based for loops** when possible
```cpp
if (size == 0) {
  return nullptr;
}

for (auto &precondition : preconditions_) {
  // ...
}
```
---
## Memory Management
### Ownership & Allocation
- **Arena allocators** for request-scoped memory with **STL allocator adapters** (provides ~1ns allocation vs ~20-270ns for malloc)
- **String views** pointing to arena-allocated memory for zero-copy operations
- **Prefer unique_ptr** for exclusive ownership
- **shared_ptr only if shared ownership is necessary** - most objects have single owners
- **Factory patterns** for complex construction and ownership control (see Code Structure section for factory patterns)
- **STL containers with arena allocators require default construction after arena reset** - `clear()` is not sufficient
```cpp
// STL containers with arena allocators - correct reset pattern
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena_alloc);
// ... use container ...
operations = {}; // Default construct - clear() won't work correctly
arena.reset(); // Reset arena memory
```
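
To show why arena allocation is cheap, here is a minimal bump-pointer sketch. `BumpArena` is a hypothetical single-block arena written for this example; the real `ArenaAllocator` chains blocks and is not reproduced here. Allocation only rounds up an offset and advances it, and `reset()` releases everything at once.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct BumpArena {
  // Precondition: capacity > 0.
  explicit BumpArena(size_t capacity)
      : buffer_(static_cast<char *>(std::malloc(capacity))),
        capacity_(capacity) {
    if (!buffer_) {
      std::fprintf(stderr, "BumpArena: failed to allocate memory\n");
      std::abort();
    }
  }
  ~BumpArena() { std::free(buffer_); }

  BumpArena(const BumpArena &) = delete;
  BumpArena &operator=(const BumpArena &) = delete;

  // Rounds the current offset up to `alignment` (a power of two) and
  // advances it by `size`. This sketch aborts when the block is full.
  void *allocate_raw(size_t size,
                     size_t alignment = alignof(std::max_align_t)) {
    if (size == 0) {
      return nullptr;
    }
    size_t start = (used_ + alignment - 1) & ~(alignment - 1);
    if (start > capacity_ || size > capacity_ - start) {
      std::fprintf(stderr, "BumpArena: out of space\n");
      std::abort();
    }
    used_ = start + size;
    return buffer_ + start;
  }

  // Invalidates every pointer handed out so far.
  void reset() { used_ = 0; }

  size_t used_bytes() const { return used_; }

private:
  char *buffer_;
  size_t capacity_;
  size_t used_ = 0;
};
```

Because `reset()` invalidates all outstanding pointers, any container still holding arena memory has to be discarded before the arena is reused.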
### Resource Management
- **RAII** everywhere - constructors acquire, destructors release
- **Move semantics** for efficient resource transfer
- **Explicit cleanup** methods where appropriate
```cpp
~ArenaAllocator() {
  while (current_block_) {
    Block *prev = current_block_->prev;
    std::free(current_block_);
    current_block_ = prev;
  }
}
```
---
## Error Handling
### Error Philosophy
- **Return codes** for expected errors that can be handled
- **Abort for system failures** - If we can't uphold the component's contract, perror/fprintf then abort. If recovery is possible, change the component's contract to allow returning an error code.
- **Error messages are for humans only** - never parse error message strings programmatically
- **Error codes are the contract** - use enums/codes for programmatic error handling
### Error Boundaries
- **Expected errors**: Invalid user input, network timeouts, file not found - return error codes
- **System failures**: Memory allocation failure, socket creation failure - abort immediately
- **Programming errors**: Assertion failures, null pointer dereference - abort immediately
```cpp
enum class ParseResult { Success, InvalidJson, MissingField };

// Good: Test error codes (part of contract)
auto result = parser.parse(data);
if (result == ParseResult::InvalidJson) {
  // Handle programmatically
}

// Bad: Don't test or parse error message strings
// CHECK(parser.get_error() == "Expected '}' at line 5"); // BRITTLE!

// System resource failures: abort immediately
void ArenaAllocator::add_block(size_t size) {
  void *memory = std::malloc(size);
  if (!memory) {
    std::fprintf(stderr, "ArenaAllocator: Failed to allocate memory\n");
    std::abort(); // Process is likely in bad state
  }
  // ...
}
```
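
A self-contained sketch of the "expected errors return codes" rule follows. `parse_port` and `ParsePortResult` are hypothetical names; invalid user input is reported through an enum, and the output parameter is written only on success.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

enum class ParsePortResult { Success, Empty, NotANumber, OutOfRange };

// Parses a TCP port in [1, 65535]. `port_out` is untouched on failure.
inline ParsePortResult parse_port(std::string_view text, int &port_out) {
  if (text.empty()) {
    return ParsePortResult::Empty;
  }
  int value = 0;
  const char *end = text.data() + text.size();
  auto [ptr, ec] = std::from_chars(text.data(), end, value);
  if (ec == std::errc::result_out_of_range) {
    return ParsePortResult::OutOfRange;
  }
  if (ec != std::errc{} || ptr != end) {
    return ParsePortResult::NotANumber; // no digits, or trailing garbage
  }
  if (value < 1 || value > 65535) {
    return ParsePortResult::OutOfRange;
  }
  port_out = value;
  return ParsePortResult::Success;
}
```

Callers branch on the enum value; any human-readable message can be derived from it separately.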
### Assertions
- Use **assert()** for debug-time checks that validate program correctness
- **Static assertions** for compile-time validation
- **Standard assert behavior**: Assertions are **enabled by default** and **disabled when `NDEBUG` is defined**
- **Use assertions for programming errors**: Null pointer checks, precondition validation, invariant checking
- **Don't use assertions for expected runtime errors**: Use return codes for recoverable conditions
```cpp
// Good: Programming error checks (enabled by default, disabled with NDEBUG)
assert(current_block_ && "realloc called with non-null ptr but no current block");
assert(size > 0 && "Cannot allocate zero bytes");
assert(ptr != nullptr && "Invalid pointer passed to realloc");
// Good: Compile-time validation (always enabled)
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");
// Bad: Don't use assert for expected runtime errors
// assert(file_exists(path)); // File might legitimately not exist - use return code instead
```
**Build Configuration:**
- **Debug builds**: `cmake -DCMAKE_BUILD_TYPE=Debug` → assertions **enabled** (default behavior)
- **Release builds**: `cmake -DCMAKE_BUILD_TYPE=Release` → assertions **disabled** (defines `NDEBUG`)
- **Test targets**: Always have assertions **enabled** using `-UNDEBUG` pattern (see Build Integration section)
- **Testing**: Test in both debug and release builds to catch assertion failures in all configurations
---
## Documentation
### Doxygen Style
- **`/** ... */`** blocks for struct and public method documentation
- **@brief** for short descriptions
- **@param** and **@return** for function parameters and return values
- **@note** for important implementation notes
- **@warning** for critical usage warnings
```cpp
/**
* @brief Type-safe version of realloc_raw for arrays of type T.
* @param ptr Pointer to the existing allocation
* @param old_size Size in number of T objects
* @param new_size Desired new size in number of T objects
* @return Pointer to reallocated memory
* @note Prints error to stderr and calls std::abort() if allocation fails
*/
template <typename T>
T *realloc(T *ptr, uint32_t old_size, uint32_t new_size);
```
### Code Comments
- **Explain why, not what** - code should be self-documenting
- **Performance notes** for optimization decisions
- **Thread safety** and ownership semantics
```cpp
// Uses O(1) accumulated counters for fast retrieval
size_t total_allocated() const;
// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
ConnectionHandler *handler, std::weak_ptr<Server> server);
```
---
## Testing
### Test Framework
- **doctest** for unit testing
- **TEST_CASE** and **SUBCASE** for test organization
- **CHECK** for assertions (non-terminating)
- **REQUIRE** for critical assertions (terminating)
### Test Structure
- **Descriptive test names** explaining the scenario
- **SUBCASE** for related test variations
- **Fresh instances** for each test to avoid state contamination
```cpp
TEST_CASE("ArenaAllocator basic allocation") {
  ArenaAllocator arena;

  SUBCASE("allocate zero bytes returns nullptr") {
    void *ptr = arena.allocate_raw(0);
    CHECK(ptr == nullptr);
  }

  SUBCASE("allocate single byte") {
    void *ptr = arena.allocate_raw(1);
    CHECK(ptr != nullptr);
    CHECK(arena.used_bytes() >= 1);
  }
}
```
### Test Design Principles
- **Prefer testing through public interfaces** - focus on observable behavior rather than implementation details
- **Test the contract, not the implementation** - validate what the API promises to deliver
- **Avoid testing private methods directly** - if private functionality needs testing, consider if it should be public or extracted
- **Both integration and unit tests** - test components in isolation and working together
- **Prefer fakes to mocks** - use real implementations for internal components, fake external dependencies
- **Always enable assertions in tests** - use `-UNDEBUG` pattern to ensure assertions are checked (see Build Integration section)
```cpp
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
auto config = Config::defaultConfig();
auto handler = std::make_unique<TestHandler>();
auto server = Server::create(config, std::move(handler));
// Test observable behavior - server can accept connections
auto result = connectToServer(server->getPort());
CHECK(result.connected);
}
// Avoid: Testing internal implementation details
// TEST_CASE("Server creates epoll instance") { /* implementation detail */ }
```
### Test Synchronization
- **NEVER use timeouts** or sleep-based synchronization
- **Deterministic synchronization only:**
  - Blocking I/O operations
  - `condition_variable.wait()` (no timeout variant)
  - `std::latch`, `std::barrier`, futures/promises
  - RAII guards and resource management
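
As a sketch of deterministic synchronization, the helper below waits on a condition variable with a predicate and no timeout. `wait_for_worker` is a hypothetical function written for this example; the waiting thread proceeds exactly when the worker signals completion, however long that takes.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Starts a worker, blocks until it reports completion, returns its result.
inline int wait_for_worker() {
  std::mutex mutex;
  std::condition_variable cv;
  bool done = false;
  int result = 0;

  std::thread worker([&] {
    {
      std::lock_guard<std::mutex> lock(mutex);
      result = 42; // stand-in for real work
      done = true;
    }
    cv.notify_one();
  });

  {
    std::unique_lock<std::mutex> lock(mutex);
    // The predicate guards against spurious wakeups; no sleep, no timeout.
    cv.wait(lock, [&] { return done; });
  }

  worker.join();
  return result;
}
```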
### Multithreading Test Correctness
- **Force concurrent execution** - Thread creation takes time, so work often completes sequentially before threads start
- **Use std::latch to synchronize thread startup** - Ensures all threads begin racing simultaneously
```cpp
// BAD: Race likely over before threads start
std::atomic<int> counter{0};
std::vector<std::thread> threads;
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() { counter++; }); // Probably sequential
}

// GOOD: Force threads to race simultaneously
std::atomic<int> counter{0};
std::latch start_latch{4};
std::vector<std::thread> threads;
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() {
    start_latch.arrive_and_wait(); // All threads start together
    counter++;                     // Now they actually race
  });
}
for (auto &thread : threads) {
  thread.join();
}
```
---
## Build Integration
### CMake Integration
- **Generated code** (gperf hash tables) in build directory
- **Ninja** as the preferred generator
- **Export compile commands** for tooling support

**Build Types:**
```bash
# Debug build (assertions enabled by default, optimizations off)
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
# Release build (assertions disabled, optimizations on)
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
```
**Testing and Development:**
- **Test targets always have assertions enabled** - even in release builds, test targets use `-UNDEBUG` to ensure assertions are checked
- **Production builds have assertions disabled** - the main `weaseldb` executable follows standard build type behavior
- **Use Release builds for performance testing** and production deployment
- **This ensures tests catch assertion failures** regardless of build configuration
### Test Assertion Pattern (-UNDEBUG)
**Problem**: Release builds define `NDEBUG` which disables assertions, but tests should always validate assertions to catch programming errors.

**Solution**: Use `-UNDEBUG` compiler flag for test targets to undefine `NDEBUG` and re-enable assertions.

**CMake Implementation:**
```cmake
# Test target with assertions always enabled
add_executable(test_example tests/test_example.cpp src/example.cpp)
target_link_libraries(test_example doctest::doctest)
target_compile_options(test_example PRIVATE -UNDEBUG) # Always enable assertions
# Production target follows build type
add_executable(example src/example.cpp src/main.cpp)
# No -UNDEBUG → assertions disabled in Release, enabled in Debug
```
**Benefits:**
- **Consistent test behavior**: Tests validate assertions in both Debug and Release builds
- **Production performance**: Production binaries maintain optimized release performance
- **Early error detection**: Catch assertion failures during CI/CD regardless of build configuration
- **Build type flexibility**: Can use Release builds for performance profiling while still testing assertions
### Code Generation
- **gperf** for perfect hash table generation
- **Build-time generation** of token lookup tables
- **Include generated headers** from build directory
---
## Style Enforcement
### Consistency
- Follow existing patterns in the codebase
- Use the same style for similar constructs
- Maintain consistency within each translation unit
### Tools
- **clang-format** configuration (when available)
- **Static analysis** tools for code quality
- **AddressSanitizer** for memory safety testing

This style guide reflects the existing codebase patterns and should be followed for all new code contributions to maintain consistency and readability.