# WeaselDB C++ Style Guide

This document describes the C++ coding style used in the WeaselDB project. These conventions ensure consistency, readability, and maintainability across the codebase.

## Table of Contents

1. [General Principles](#general-principles)
2. [Naming Conventions](#naming-conventions)
3. [File Organization](#file-organization)
4. [Code Structure](#code-structure)
5. [Memory Management](#memory-management)
6. [Error Handling](#error-handling)
7. [Documentation](#documentation)
8. [Testing](#testing)
9. [Build Integration](#build-integration)
10. [Style Enforcement](#style-enforcement)

---

## General Principles

### Language Standard

- **C++20** is the target standard
- Use modern C++ features: RAII, move semantics, `constexpr`, and concepts where appropriate
- Prefer standard library containers and algorithms over custom implementations

### Data Types

- **Almost always signed** - prefer signed types such as `int` and `int64_t` over unsigned types, except for:
  - Bit manipulation operations
  - Interfacing with APIs that require unsigned types
  - Memory sizes where overflow is impossible (`size_t`, `uint32_t` for arena block sizes)
  - Cases where defined unsigned overflow behavior (wraparound) is intentional and desired
- **Almost always auto** - let the compiler deduce types except when:
  - The type is not obvious from context (prefer an explicit type for clarity)
  - Specific type requirements matter (numeric conversions, template parameters)
  - Interface contracts need explicit types (public APIs, function signatures)
- **Prefer leaving memory uninitialized over zero-initializing it** when reading it before writing would be a bug
  - Valgrind (Memcheck) reports uses of uninitialized values, so such bugs get caught
  - Unnecessary zero-initialization hides these logic errors from the tools
  - Zero-initialization can also cost performance on hot paths
- **Floating point is for metrics only** - avoid `float`/`double` in core data structures and algorithms
  - Use it for performance measurements, statistics, and monitoring data
  - Never use it for counts, sizes, or business logic

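As a sketch of why signed indices are the default (`sum_reversed` is a hypothetical helper): a reverse loop over a `size_t` index never terminates, because `i >= 0` is always true for an unsigned type and `--i` wraps around to `SIZE_MAX`. With an empty vector, `values.size() - 1` would also wrap. The signed version below terminates naturally in both cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sums the elements in reverse order using a signed index.
// With `size_t i`, the condition `i >= 0` would always be true and the loop
// would never terminate.
int64_t sum_reversed(const std::vector<int64_t> &values) {
  int64_t total = 0;
  // `auto` is fine here: the cast makes the deduced type obvious.
  for (auto i = static_cast<int64_t>(values.size()) - 1; i >= 0; --i) {
    total += values[static_cast<size_t>(i)];
  }
  return total;
}
```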
### Performance Focus

- **Performance-first design** - optimize for the hot path
- **Simple is fast** - find exactly what's necessary and strip away everything else
- **Complexity must be justified with benchmarks** - measure the performance impact before adding complexity
- **Strive for 0% CPU usage when idle** - avoid polling, busy waiting, and unnecessary background activity
- Use **inline functions** for performance-critical code (e.g., `allocate_raw`)
- **Zero-copy operations** - prefer `std::string_view` to copying strings
- **Arena allocation** for efficient memory management (see the Memory Management section for details)

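A minimal sketch of a zero-copy operation, assuming a hypothetical `split_key` helper that splits `"namespace/key"` input: both returned views point into the caller's buffer, so nothing is allocated or copied. The views are only valid while that buffer (for example, arena-allocated request memory) stays alive.

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Hypothetical helper: splits "namespace/key" at the first '/' without copying.
// Both returned views refer to the memory behind `input`.
std::pair<std::string_view, std::string_view> split_key(std::string_view input) {
  auto slash = input.find('/');
  if (slash == std::string_view::npos) {
    return {input, std::string_view{}};
  }
  return {input.substr(0, slash), input.substr(slash + 1)};
}
```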
### Complexity Control

- **Encapsulation is the main tool for controlling complexity**
- **Header files define the interface** - they are the contract with users of your code
- **Headers should be complete** - include everything needed to use the interface effectively:
  - Usage examples in comments
  - Preconditions and postconditions
  - Thread safety guarantees
  - Performance characteristics
  - Ownership and lifetime semantics
- **Do not rely on undocumented interface properties** - if it's not in the header, don't depend on it

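A sketch of what a "complete" header might look like, using a hypothetical `RequestIdGenerator`: the comments cover a usage example, preconditions, thread safety, performance, and ownership, so callers never need to read the implementation. A real header would also start with `#pragma once`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

/**
 * @brief Hands out unique, increasing request ids (hypothetical example).
 *
 * Usage:
 *   RequestIdGenerator ids;
 *   int64_t first = ids.next();  // 1
 *   int64_t second = ids.next(); // 2
 *
 * @note Thread safety: next() may be called concurrently from any thread.
 * @note Performance: next() is a single relaxed atomic increment, O(1).
 * @note Ownership: holds no external resources; neither copyable nor movable
 *       because std::atomic members are neither copyable nor movable.
 */
struct RequestIdGenerator {
  /**
   * @brief Returns the next id.
   * @return A value that no other call has returned, larger than all earlier ones.
   * @note Precondition: fewer than INT64_MAX ids have been issued.
   */
  int64_t next() { return counter_.fetch_add(1, std::memory_order_relaxed) + 1; }

private:
  std::atomic<int64_t> counter_{0};
};
```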
---

## Naming Conventions

### Variables and Functions

- **snake_case** for all variables, functions, and member functions

```cpp
size_t used_bytes() const;
void add_block(size_t size);
uint32_t initial_block_size_;
```

### Structs

- **PascalCase** for struct names
- **Always use `struct`** - eliminates debates about whether a type is complex enough to be a `class` and keeps the codebase consistent
- **Public members first, private after** - puts the interface users care about at the top and implementation details below
- **Full encapsulation still applies** - use `private:` sections to hide implementation details and keep types deep and capable
- The `struct` keyword doesn't imply shallow design - it means interface-first organization for human readers

```cpp
struct ArenaAllocator {
  // Public interface first
  explicit ArenaAllocator(size_t initial_size = 1024);
  void *allocate_raw(size_t size);

private:
  // Private members after
  uint32_t initial_block_size_;
  Block *current_block_;
};
```

### Enums

- **PascalCase** for enum class names
- **PascalCase** for enum values (not SCREAMING_SNAKE_CASE)

```cpp
enum class Type {
  PointRead,
  RangeRead
};

enum class ParseState {
  Root,
  PreconditionsArray,
  OperationObject
};
```

### Constants and Macros

- **snake_case** for constants
- Avoid macros when possible; prefer `constexpr` variables

```cpp
static const WeaselJsonCallbacks json_callbacks;
```

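A small sketch contrasting a macro with the preferred `constexpr` constant; `default_block_size` is a hypothetical name. Unlike the macro, the constant has a type and a scope, and it can be validated at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Avoid: a macro has no type, ignores namespaces, and is usually not
// visible in the debugger.
// #define DEFAULT_BLOCK_SIZE 1024

// Prefer: a typed, scoped, snake_case constexpr constant (hypothetical name).
constexpr size_t default_block_size = 1024;

// constexpr values can be checked at compile time.
static_assert((default_block_size & (default_block_size - 1)) == 0,
              "default_block_size must be a power of two");
```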
### Member Variables

- **Trailing underscore** for private member variables

```cpp
private:
  uint32_t initial_block_size_;
  Block *current_block_;
```

### Template Parameters

- **PascalCase** for template type parameters

```cpp
template <typename T, typename... Args>
template <typename T> struct rebind { using type = T *; };
```

---

## File Organization

### Include Organization

- Use **`#pragma once`** instead of include guards
- **Never `using namespace std`** - always use fully qualified names for clarity and safety
- **Include order** (applies to both headers and source files):
  1. Corresponding header file (for .cpp files only)
  2. Standard library headers (alphabetical)
  3. Third-party library headers
  4. Project headers

```cpp
#pragma once

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string_view>
#include <vector>

#include <simdutf.h>
#include <weaseljson/weaseljson.h>

#include "arena_allocator.hpp"
#include "commit_request.hpp"

// Never this:
// using namespace std;

// Always this:
std::vector<int> data;
std::unique_ptr<Parser> parser;
```

---

## Code Structure

### Struct Design

- **Move-only semantics** for resource-owning structs
- **Explicit constructors** to prevent implicit conversions
- **Delete copy operations** when copying would be incorrect (e.g., two copies would release the same resource)

```cpp
struct ArenaAllocator {
  explicit ArenaAllocator(size_t initial_size = 1024);

  // Copy construction is not allowed
  ArenaAllocator(const ArenaAllocator &) = delete;
  ArenaAllocator &operator=(const ArenaAllocator &) = delete;

  // Move semantics
  ArenaAllocator(ArenaAllocator &&other) noexcept;
  ArenaAllocator &operator=(ArenaAllocator &&other) noexcept;

private:
  uint32_t initial_block_size_;
  Block *current_block_;
};
```

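The rules above can be sketched with a hypothetical `UniqueFd` that owns a POSIX file descriptor: an explicit constructor, deleted copies (two owners would close the same descriptor), and `noexcept` moves that leave the source empty. Errors from `close()` are ignored here for brevity.

```cpp
#include <cassert>
#include <unistd.h>
#include <utility>

// Hypothetical move-only owner of a file descriptor.
struct UniqueFd {
  explicit UniqueFd(int fd) : fd_(fd) {}

  // Copying would create two owners that both close the same descriptor.
  UniqueFd(const UniqueFd &) = delete;
  UniqueFd &operator=(const UniqueFd &) = delete;

  // Moving transfers ownership and leaves the source empty (-1).
  UniqueFd(UniqueFd &&other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
  UniqueFd &operator=(UniqueFd &&other) noexcept {
    if (this != &other) {
      reset();
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }

  ~UniqueFd() { reset(); }

  int get() const { return fd_; }

private:
  void reset() {
    if (fd_ != -1) {
      close(fd_); // Never retried on EINTR; the result is ignored in this sketch
      fd_ = -1;
    }
  }

  int fd_;
};
```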
### Function Design

- **Const correctness** - mark methods `const` when they do not modify observable state
- **Parameter passing:**
  - Pass by value for types ≤ 16 bytes (`int`, pointers, `std::string_view`, small structs)
  - Pass by const reference for types > 16 bytes (containers, large objects)
- **Return by value** for small types (≤ 16 bytes); return `std::string_view` for zero-copy access to strings, provided the referenced data outlives every use of the view
- **`noexcept` specification** for move operations and other non-throwing functions

```cpp
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view data);        // ≤ 16 bytes, pass by value
void process_request(const CommitRequest &req); // > 16 bytes, pass by reference
ArenaAllocator(ArenaAllocator &&other) noexcept;
```

### Template Usage

- **Template constraints** using `static_assert` for clear error messages
- **Concepts** (preferred in C++20) or **SFINAE** to constrain templates and select between overloads

### Factory Patterns & Ownership

- **Static factory methods** for complex construction that requires shared ownership
- **Friend-based factories** for access control when the constructor should be private
- **Ownership guidelines:**
  - **`std::unique_ptr`** for exclusive ownership (the most common case)
  - **`std::shared_ptr`** only when multiple owners need concurrent access to the same object
  - **Factory methods return the smart pointer type** that matches the ownership needs

```cpp
// Shared ownership - multiple components need concurrent access
auto server = Server::create(config, handler); // Returns shared_ptr

// Exclusive ownership - single owner, transfer via move
auto connection = Connection::createForServer(...); // Returns unique_ptr

// Friend-based factory for access control
struct Connection {
  void appendMessage(std::string_view data);

private:
  Connection(/* args */); // Private constructor
  friend struct Server;   // Only Server can construct
};
```

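A self-contained sketch of a friend-based factory, using hypothetical `Server` and `Connection` types rather than WeaselDB's. One detail worth knowing: `std::make_unique` cannot call a private constructor, because it is not a friend, so the factory constructs with `new` inside the friend and wraps the result immediately.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical type whose constructor only Server may call.
struct Connection {
  int64_t id() const { return id_; }

private:
  explicit Connection(int64_t id) : id_(id) {}
  friend struct Server; // Only Server can construct a Connection

  int64_t id_;
};

// Hypothetical factory owner: returns exclusive ownership via unique_ptr.
struct Server {
  std::unique_ptr<Connection> accept_connection() {
    // std::make_unique<Connection>(...) would not compile here: it is not a
    // friend, so it cannot access the private constructor.
    return std::unique_ptr<Connection>(new Connection(next_id_++));
  }

private:
  int64_t next_id_ = 1;
};
```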
### Control Flow

- **Early returns** to reduce nesting
- **Range-based for loops** when possible

```cpp
if (size == 0) {
  return nullptr;
}

for (auto &precondition : preconditions_) {
  // ...
}
```

---

## Memory Management

### Ownership & Allocation

- **Arena allocators** with **STL allocator adapters** for request-scoped memory (~1ns per allocation vs. ~20-270ns for `malloc`)
- **String views** pointing into arena-allocated memory for zero-copy operations
- **Replace STL containers that use arena allocators with a freshly constructed container when the arena is reset** - `clear()` is not sufficient, and neither is `= {}` (it selects the `std::initializer_list` assignment overload); both keep the existing capacity, which still points into arena memory that `reset()` will hand out again. Move-assigning a fresh container releases that storage, assuming `ArenaStlAllocator` instances that share an arena compare equal.

```cpp
// STL containers with arena allocators - correct reset pattern
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena_alloc);
// ... use container ...
operations = decltype(operations)(arena_alloc); // Fresh, empty container
arena.reset();                                  // Reset arena memory
```

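To make the reset rule concrete, here is a deliberately minimal bump-pointer arena with an STL allocator adapter. `MiniArena` and `MiniArenaAllocator` are hypothetical stand-ins, not WeaselDB's `ArenaAllocator` or `ArenaStlAllocator`: the arena has a fixed 4 KiB buffer and never grows, and `deallocate` is a no-op because memory is reclaimed all at once by `reset()`. Allocators sharing an arena compare equal, which is what lets move assignment adopt the fresh container's empty storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump-pointer arena over a fixed buffer (no growth).
struct MiniArena {
  void *allocate(size_t size, size_t align) {
    size_t offset = (used_ + align - 1) / align * align; // Round up to align
    assert(offset + size <= sizeof(buffer_) && "MiniArena exhausted");
    used_ = offset + size;
    return buffer_ + offset;
  }
  void reset() { used_ = 0; } // Every previous allocation becomes invalid

private:
  alignas(std::max_align_t) std::byte buffer_[4096];
  size_t used_ = 0;
};

// Hypothetical STL allocator adapter: allocate() bumps, deallocate() is a no-op.
template <typename T> struct MiniArenaAllocator {
  using value_type = T;

  explicit MiniArenaAllocator(MiniArena *arena) : arena_(arena) {}
  template <typename U>
  MiniArenaAllocator(const MiniArenaAllocator<U> &other) : arena_(other.arena_) {}

  T *allocate(size_t n) {
    return static_cast<T *>(arena_->allocate(n * sizeof(T), alignof(T)));
  }
  void deallocate(T *, size_t) {} // Reclaimed by MiniArena::reset()

  // Allocators that share an arena are interchangeable.
  friend bool operator==(const MiniArenaAllocator &a, const MiniArenaAllocator &b) {
    return a.arena_ == b.arena_;
  }

  MiniArena *arena_;
};
```

In the usage below, `clear()` leaves the arena-backed buffer in place, while move-assigning a freshly constructed vector drops it before `reset()`.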
### Resource Management

- **RAII** everywhere - constructors acquire, destructors release
- **Move semantics** for efficient resource transfer
- **Explicit cleanup** methods where appropriate

```cpp
~ArenaAllocator() {
  while (current_block_) {
    Block *prev = current_block_->prev;
    std::free(current_block_);
    current_block_ = prev;
  }
}
```

---

## Error Handling

### Error Classification & Response

- **Expected errors** (invalid input, timeouts): return error codes for programmatic handling
- **System failures** (`malloc` failure, socket failure): abort immediately with an error message
- **Programming errors** (precondition violations, failed assertions): abort immediately

### Error Contract Design

- **Error codes are the API contract** - use enums for programmatic decisions
- **Error messages are for humans only** - never parse message strings
- **Consistent error boundaries** - each component defines what it can and cannot recover from
- **Interface precondition violations are undefined behavior** - skipping precondition checks in hot paths for performance is acceptable

```cpp
enum class ParseResult { Success, InvalidJson, MissingField };

// System failure - abort immediately
void *memory = std::malloc(size);
if (!memory) {
  std::fprintf(stderr, "ArenaAllocator: Memory allocation failed\n");
  std::abort();
}
// ... use memory, eventually free it

// Programming error - precondition violation (check may be omitted for performance)
assert(ptr != nullptr && "Precondition violated: pointer must be non-null");
```

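For expected errors, a sketch of the error-code contract: the hypothetical `parse_int64` reports invalid input through the hypothetical `ParseIntResult` enum, so callers branch on values rather than message text. It uses the standard `std::from_chars` and writes its output only on success.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical error codes: callers branch on these values, never on text.
enum class ParseIntResult { Success, Empty, NotANumber, OutOfRange };

// Hypothetical helper: parses a decimal int64_t from `text` into `out`.
// Invalid input is an expected error, so it is reported via the return code.
// `out` is written only on Success.
ParseIntResult parse_int64(std::string_view text, int64_t &out) {
  if (text.empty()) {
    return ParseIntResult::Empty;
  }
  int64_t value;
  auto [end, ec] = std::from_chars(text.data(), text.data() + text.size(), value);
  if (ec == std::errc::result_out_of_range) {
    return ParseIntResult::OutOfRange;
  }
  if (ec != std::errc{} || end != text.data() + text.size()) {
    return ParseIntResult::NotANumber;
  }
  out = value;
  return ParseIntResult::Success;
}
```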
### Assertions

- **Programming error detection** using the standard `assert()` macro
- **Assertion behavior follows the C++ standard:**
  - **Debug builds**: assertions active (`NDEBUG` not defined)
  - **Release builds**: assertions removed (`NDEBUG` defined)
  - **Test targets override**: use `-UNDEBUG` to force assertions on in all builds
- **Static assertions** for compile-time validation (always active)

**Usage guidelines:**

- Use for programming errors: null checks, precondition validation, invariants
- Don't use for expected runtime errors: use return codes instead

```cpp
// Good: Programming error checks
assert(current_block_ && "realloc called with non-null ptr but no current block");
assert(size > 0 && "Cannot allocate zero bytes");

// Good: Compile-time validation (always enabled)
static_assert(std::is_trivially_destructible_v<T>, "Arena requires trivially destructible types");

// Bad: Don't use assert for expected runtime errors
// assert(file_exists(path)); // File might legitimately not exist - use return code instead
```

### System Call Error Handling

When a signal handler interrupts a system call, the call can fail with `EINTR`, and it usually needs to be retried. This is especially true for "slow" system calls that can block for a long time, such as `read`, `write`, `accept`, `sem_wait`, and `epoll_wait`.

**Rule:** Always wrap potentially interruptible system calls in a `do-while` loop that checks for `EINTR`.

One exception: do not call `connect` again after `EINTR`. POSIX specifies that the interrupted connection attempt continues asynchronously, so a second call fails (typically with `EALREADY`); wait for the socket to become writable and check `SO_ERROR` instead.

**Example:**

```cpp
int fd;
do {
  fd = accept(listen_fd, nullptr, nullptr);
} while (fd == -1 && errno == EINTR);

if (fd == -1) {
  // Handle other errors
  perror("accept");
  std::abort();
}
```

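The `do-while` pattern can be factored into a small helper. `retry_on_eintr` is a hypothetical name, and the sketch assumes the common convention of returning `-1` and setting `errno` on failure.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper: calls `call` until it succeeds or fails with an errno
// other than EINTR. Assumes the -1-on-error convention used by accept(),
// read(), write(), epoll_wait(), sem_wait(), etc.
// Never use it for close(), which must not be retried, or for connect(),
// whose interrupted attempt continues asynchronously.
template <typename Fn> auto retry_on_eintr(Fn &&call) {
  decltype(call()) result;
  do {
    result = call();
  } while (result == -1 && errno == EINTR);
  return result;
}
```

Usage looks like `int fd = retry_on_eintr([&] { return accept(listen_fd, nullptr, nullptr); });`, followed by the same error check as above.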
**Special case - close():**

The `close()` system call is a special case on Linux. According to `man 2 close`, when `close()` returns `EINTR` on Linux, the file descriptor has still been closed. Therefore, `close()` should **never** be retried: by the time of the retry, the same descriptor number may already have been reused (for example, by another thread's `open()` or `accept()`), and the retry would close that unrelated file.

```cpp
// Correct: Do not retry close() on EINTR
int e = close(fd);
if (e == -1 && errno != EINTR) {
  // Handle non-EINTR errors only
  perror("close");
  std::abort();
}
// Note: fd is guaranteed closed even on EINTR
```

**Non-interruptible calls:**

Many system calls do not block and, in practice, do not fail with `EINTR`. They do not need a retry loop. These include:

* `fcntl` (with `F_GETFL`, `F_SETFL`, `F_GETFD`, `F_SETFD`; note that `F_SETLKW` and `F_OFD_SETLKW` can return `EINTR`)
* `epoll_ctl`
* `socketpair`
* `pipe`
* `setsockopt`
* `epoll_create1`
* `close` (special case: it can report `EINTR`, but on Linux the descriptor is still closed, so it is never retried)

When in doubt, consult the `man` page for the specific system call to see whether it can return `EINTR`.

---

## Documentation

### Doxygen Style

- `/** ... */` comment blocks for struct and public method documentation
- **@brief** for short descriptions
- **@param** and **@return** for function parameters and return values
- **@note** for important implementation notes
- **@warning** for critical usage warnings

```cpp
/**
 * @brief Type-safe version of realloc_raw for arrays of type T.
 * @param ptr Pointer to the existing allocation
 * @param old_size Size in number of T objects
 * @param new_size Desired new size in number of T objects
 * @return Pointer to reallocated memory
 * @note Prints error to stderr and calls std::abort() if allocation fails
 */
template <typename T>
T *realloc(T *ptr, uint32_t old_size, uint32_t new_size);
```

### Code Comments

- **Explain why, not what** - code should be self-documenting
- **Performance notes** for optimization decisions
- **Thread safety** and ownership semantics

```cpp
// Uses O(1) accumulated counters for fast retrieval
size_t total_allocated() const;

// Only Server can create connections - no public constructor
Connection(struct sockaddr_storage addr, int fd, int64_t id,
           ConnectionHandler *handler, std::weak_ptr<Server> server);
```

---

## Testing

### Test Framework

- **doctest** for unit testing
- **TEST_CASE** and **SUBCASE** for test organization
- **CHECK** for non-fatal assertions (the test case continues after a failure)
- **REQUIRE** for critical assertions (a failure ends the current test case)

### Test Structure

- **Descriptive test names** explaining the scenario
- **SUBCASE** for related test variations
- **Fresh instances** for each test to avoid state contamination; doctest re-runs the `TEST_CASE` body from the top for each `SUBCASE`, so objects declared before the subcases are constructed anew for each one

```cpp
TEST_CASE("ArenaAllocator basic allocation") {
  ArenaAllocator arena;

  SUBCASE("allocate zero bytes returns nullptr") {
    void *ptr = arena.allocate_raw(0);
    CHECK(ptr == nullptr);
  }

  SUBCASE("allocate single byte") {
    void *ptr = arena.allocate_raw(1);
    CHECK(ptr != nullptr);
    CHECK(arena.used_bytes() >= 1);
  }
}
```

### Test Design Principles

- **Prefer testing through public interfaces** - focus on observable behavior rather than implementation details
- **Test the contract, not the implementation** - validate what the API promises to deliver
- **Avoid testing private methods directly** - if private functionality needs testing, consider whether it should be public or extracted
- **Write both integration and unit tests** - test components in isolation and working together
- **Prefer fakes to mocks** - use real implementations for internal components and fakes for external dependencies
- **Always enable assertions in tests** - use the `-UNDEBUG` pattern to ensure assertions are checked (see the Build Integration section)

```cpp
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
  auto config = Config::defaultConfig();
  auto handler = std::make_unique<TestHandler>();
  auto server = Server::create(config, std::move(handler));

  // Test observable behavior - server can accept connections
  auto result = connectToServer(server->getPort());
  CHECK(result.connected);
}

// Avoid: Testing internal implementation details
// TEST_CASE("Server creates epoll instance") { /* implementation detail */ }
```

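A sketch of "prefer fakes to mocks", with hypothetical `Clock`, `FakeClock`, and `Deadline` types and plain `assert` instead of doctest so it stands alone. The fake is a small working implementation of an external dependency that the test drives directly, which also keeps the test deterministic with no real waiting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interface for an external dependency (wall-clock time).
struct Clock {
  virtual ~Clock() = default;
  virtual int64_t now_ns() const = 0;
};

// Fake: a real, controllable implementation rather than a scripted mock.
struct FakeClock : Clock {
  int64_t now_ns() const override { return now_ns_; }
  void advance(int64_t delta_ns) { now_ns_ += delta_ns; }

private:
  int64_t now_ns_ = 0;
};

// Hypothetical component under test: reports whether a deadline has passed.
struct Deadline {
  Deadline(const Clock &clock, int64_t timeout_ns)
      : clock_(clock), expires_at_ns_(clock.now_ns() + timeout_ns) {}
  bool expired() const { return clock_.now_ns() >= expires_at_ns_; }

private:
  const Clock &clock_;
  int64_t expires_at_ns_;
};
```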
### Test Synchronization (Authoritative Rules)

- **ABSOLUTELY NEVER use timeouts** (`sleep_for`, `wait_for`, etc.)
- **Deterministic synchronization only:**
  - Blocking I/O (naturally waits for completion)
  - `condition_variable.wait()` without timeout
  - `std::latch`, `std::barrier`, futures/promises
- **Force concurrent execution** using `std::latch` to synchronize thread startup
- **Tests should pass or hang** - no flaky timeout behavior

```cpp
// BAD: Each thread may finish before the next one starts, so there is probably no real race
std::atomic<int> counter{0};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() { counter++; }); // Probably sequential
}

// GOOD: Force threads to race simultaneously
std::atomic<int> counter{0};
std::latch start_latch{4};
for (int i = 0; i < 4; ++i) {
  threads.emplace_back([&]() {
    start_latch.arrive_and_wait(); // All threads start together
    counter++;                     // Now they actually race
  });
}
```

---

## Build Integration

### Build Configuration

```bash
# Debug: assertions on, optimizations off
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

# Release: assertions off, optimizations on
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
```

**Test Target Pattern:**

- Production targets follow the build type (assertions off in Release)
- Test targets use `-UNDEBUG` to force assertions on in all builds
- This ensures consistent test validation regardless of build type

```cmake
# Test target with assertions always enabled
add_executable(test_example tests/test_example.cpp src/example.cpp)
target_link_libraries(test_example doctest::doctest)
target_compile_options(test_example PRIVATE -UNDEBUG) # Always enable assertions

# Production target follows build type
add_executable(example src/example.cpp src/main.cpp)
# No -UNDEBUG → assertions disabled in Release, enabled in Debug
```

### Code Generation

- **gperf** for perfect hash table generation
- **Build-time generation** of token lookup tables
- **Include generated headers** from the build directory

---

## Style Enforcement

### Consistency

- Follow existing patterns in the codebase
- Use the same style for similar constructs
- Maintain consistency within each translation unit

### Tools

- **clang-format** configuration (when available)
- **Static analysis** tools for code quality
- **AddressSanitizer** for memory safety testing

This style guide reflects the existing codebase patterns and should be followed for all new code contributions to maintain consistency and readability.