Add mdformat pre-commit hook

This commit is contained in:
2025-09-12 11:24:16 -04:00
parent 9d48caca76
commit bf90b8856a
9 changed files with 286 additions and 120 deletions

112
style.md
View File

@@ -5,28 +5,31 @@ This document describes the C++ coding style used in the WeaselDB project. These
## Table of Contents
1. [General Principles](#general-principles)
2. [Naming Conventions](#naming-conventions)
3. [File Organization](#file-organization)
4. [Code Structure](#code-structure)
5. [Memory Management](#memory-management)
6. [Error Handling](#error-handling)
7. [Documentation](#documentation)
8. [Testing](#testing)
1. [Naming Conventions](#naming-conventions)
1. [File Organization](#file-organization)
1. [Code Structure](#code-structure)
1. [Memory Management](#memory-management)
1. [Error Handling](#error-handling)
1. [Documentation](#documentation)
1. [Testing](#testing)
---
______________________________________________________________________
## General Principles
### Language Standard
- **C++20** is the target standard
- Use modern C++ features: RAII, move semantics, constexpr, concepts where appropriate
- Prefer standard library containers and algorithms over custom implementations
### C Library Functions and Headers
- **Always use std:: prefixed versions** of C library functions for consistency and clarity
- **Use C++ style headers** (`<cstring>`, `<cstdlib>`, etc.) instead of C style headers (`<string.h>`, `<stdlib.h>`, etc.)
- This applies to all standard libc functions: `std::abort()`, `std::fprintf()`, `std::free()`, `std::memcpy()`, `std::strlen()`, `std::strncpy()`, `std::memset()`, `std::signal()`, etc.
- **Exception:** Functions with no std:: equivalent (e.g., `perror()`, `gai_strerror()`) and system-specific headers (e.g., `<unistd.h>`, `<fcntl.h>`)
```cpp
// Preferred - C++ style
#include <cstring>
@@ -56,6 +59,7 @@ signal(SIGTERM, handler);
```
### Data Types
- **Almost always signed** - prefer `int`, `int64_t`, `ssize_t` over unsigned types except for:
- Bit manipulation operations
- Interfacing with APIs that require unsigned types
@@ -73,6 +77,7 @@ signal(SIGTERM, handler);
- Never use for counts, sizes, or business logic
### Type Casting
- **Never use C-style casts** - they're unsafe and can hide bugs by performing dangerous conversions
- **Use C++ cast operators** for explicit type conversions with clear intent and safety checks
- **Avoid `reinterpret_cast`** - almost always indicates poor design; redesign APIs instead
@@ -94,6 +99,7 @@ auto addr = reinterpret_cast<uintptr_t>(ptr); // Pointer to integer conv
```
### Performance Focus
- **Performance-first design** - optimize for the hot path
- **Simple is fast** - find exactly what's necessary, strip away everything else
- **Complexity must be justified with benchmarks** - measure performance impact before adding complexity
@@ -103,9 +109,11 @@ auto addr = reinterpret_cast<uintptr_t>(ptr); // Pointer to integer conv
- **Arena allocation** for efficient memory management (~1ns vs ~20-270ns for malloc)
### String Formatting
- **Always use `format.hpp` functions** - formats directly into arena-allocated memory
- **Use `static_format()` for performance-sensitive code** - faster but less flexible than `format()`
- **Use `format()` function with arena allocator** for printf-style formatting
```cpp
// Most performance-sensitive - compile-time optimized concatenation
std::string_view response = static_format(arena,
@@ -124,6 +132,7 @@ std::string_view response = format(arena,
```
### Complexity Control
- **Encapsulation is the main tool for controlling complexity**
- **Header files define the interface** - they are the contract with users of your code
- **Headers should be complete** - include everything needed to use the interface effectively:
@@ -134,13 +143,15 @@ std::string_view response = format(arena,
- Ownership and lifetime semantics
- **Do not rely on undocumented interface properties** - if it's not in the header, don't depend on it
---
______________________________________________________________________
## Naming Conventions
### Variables and Functions
- **snake_case** for all variables, functions, and member functions
- **Legacy camelCase exists** - the codebase currently contains mixed naming due to historical development. New code should use snake_case. Existing camelCase should be converted to snake_case during natural refactoring (not mass renaming).
```cpp
int64_t used_bytes() const;
void add_block(int64_t size);
@@ -148,11 +159,13 @@ int32_t initial_block_size_;
```
### Classes and Structs
- **PascalCase** for class/struct names
- **Always use struct keyword** - eliminates debates about complexity and maintains consistency
- **Public members first, private after** - puts the interface users care about at the top, implementation details below
- **Full encapsulation still applies** - use `private:` sections to hide implementation details and maintain deep, capable structs
- The struct keyword doesn't mean shallow design - it means interface-first organization for human readers
```cpp
struct Arena {
// Public interface first
@@ -167,8 +180,10 @@ private:
```
### Enums
- **PascalCase** for enum class names
- **PascalCase** for enum values (not SCREAMING_SNAKE_CASE)
```cpp
enum class Type {
PointRead,
@@ -183,14 +198,18 @@ enum class ParseState {
```
### Constants and Macros
- **snake_case** for constants
- Avoid macros when possible; prefer `constexpr` variables
```cpp
static const WeaselJsonCallbacks json_callbacks;
```
### Member Variables
- **Trailing underscore** for private member variables
```cpp
private:
int32_t initial_block_size_;
@@ -198,24 +217,28 @@ private:
```
### Template Parameters
- **PascalCase** for template type parameters
```cpp
template <typename T, typename... Args>
template <typename T> struct rebind { using type = T*; };
```
---
______________________________________________________________________
## File Organization
### Include Organization
- Use **`#pragma once`** instead of include guards
- **Never `using namespace std`** - always use fully qualified names for clarity and safety
- **Include order** (applies to both headers and source files):
1. Corresponding header file (for .cpp files only)
2. Standard library headers (alphabetical)
3. Third-party library headers
4. Project headers
1. Standard library headers (alphabetical)
1. Third-party library headers
1. Project headers
```cpp
#pragma once
@@ -239,14 +262,16 @@ std::vector<int> data;
std::unique_ptr<Parser> parser;
```
---
______________________________________________________________________
## Code Structure
### Class Design
- **Move-only semantics** for resource-owning types
- **Explicit constructors** to prevent implicit conversions
- **Delete copy operations** when inappropriate
```cpp
struct Arena {
explicit Arena(int64_t initial_size = 1024);
@@ -266,12 +291,14 @@ private:
```
### Function Design
- **Const correctness** - mark methods const when appropriate
- **Parameter passing:**
- Pass by value for types ≤ 16 bytes (int, pointers, string_view, small structs)
- Pass by const reference for types > 16 bytes (containers, large objects)
- **Return by value** for small types (≤ 16 bytes), **string_view** to avoid copying strings
- **noexcept specification** for move operations and non-throwing functions
```cpp
std::span<const Operation> operations() const { return operations_; }
void process_data(std::string_view request_data); // ≤ 16 bytes, pass by value
@@ -280,10 +307,12 @@ Arena(Arena &&source) noexcept;
```
### Template Usage
- **Template constraints** using static_assert for better error messages
- **SFINAE** or concepts for template specialization
### Factory Patterns & Ownership
- **Static factory methods** for complex construction requiring shared ownership
- **Friend-based factories** for access control when constructor should be private
- **Ownership guidelines:**
@@ -310,8 +339,10 @@ private:
```
### Control Flow
- **Early returns** to reduce nesting
- **Range-based for loops** when possible
```cpp
if (size == 0) {
return nullptr;
@@ -323,9 +354,11 @@ for (auto &precondition : preconditions_) {
```
### Atomic Operations
- **Never use assignment operators** with `std::atomic` - always use explicit `store()` and `load()`
- **Always specify memory ordering** explicitly for atomic operations
- **Use the least restrictive correct memory ordering** - choose the weakest ordering that maintains correctness
```cpp
// Preferred - explicit store/load with precise memory ordering
std::atomic<uint64_t> counter;
@@ -343,14 +376,16 @@ counter = 42; // Implicit - memory ordering not explicit
auto value = counter; // Implicit - memory ordering not explicit
```
---
______________________________________________________________________
## Memory Management
### Ownership & Allocation
- **Arena allocators** for request-scoped memory with **STL allocator adapters** (see Performance Focus section for characteristics)
- **String views** pointing to arena-allocated memory to avoid unnecessary copying
- **STL containers with arena allocators require default construction after arena reset** - `clear()` is not sufficient
```cpp
// STL containers with arena allocators - correct reset pattern
std::vector<Operation, ArenaStlAllocator<Operation>> operations(arena);
@@ -360,9 +395,11 @@ arena.reset(); // Reset arena memory
```
### Resource Management
- **RAII** everywhere - constructors acquire, destructors release
- **Move semantics** for efficient resource transfer
- **Explicit cleanup** methods where appropriate
```cpp
~Arena() {
while (current_block_) {
@@ -373,16 +410,18 @@ arena.reset(); // Reset arena memory
}
```
---
______________________________________________________________________
## Error Handling
### Error Classification & Response
- **Expected errors** (invalid input, timeouts): Return error codes for programmatic handling
- **System failures** (malloc fail, socket fail): Abort immediately with error message
- **Programming errors** (precondition violations, assertions): Abort immediately
### Error Contract Design
- **Error codes are the API contract** - use enums for programmatic decisions
- **Error messages are human-readable only** - never parse message strings
- **Consistent error boundaries** - each component defines what it can/cannot recover from
@@ -405,6 +444,7 @@ assert(ptr != nullptr && "Precondition violated: pointer must be non-null");
```
### Assertions
- **Programming error detection** using standard `assert()` macro
- **Assertion behavior follows C++ standards:**
- **Debug builds**: Assertions active (undefined `NDEBUG`)
@@ -413,6 +453,7 @@ assert(ptr != nullptr && "Precondition violated: pointer must be non-null");
- **Static assertions** for compile-time validation (always active)
**Usage guidelines:**
- Use for programming errors: null checks, precondition validation, invariants
- Don't use for expected runtime errors: use return codes instead
@@ -468,26 +509,28 @@ if (result == -1 && errno != EINTR) {
Most system calls are not interruptible in practice. For these, it is not necessary to add a retry loop. This includes:
* `fcntl` (with `F_GETFL`, `F_SETFL`, `F_GETFD`, `F_SETFD` - note: `F_SETLKW` and `F_OFD_SETLKW` CAN return EINTR)
* `epoll_ctl`
* `socketpair`
* `pipe`
* `setsockopt`
* `epoll_create1`
* `close` (special case: guaranteed closed even on EINTR on Linux)
- `fcntl` (with `F_GETFL`, `F_SETFL`, `F_GETFD`, `F_SETFD` - note: `F_SETLKW` and `F_OFD_SETLKW` CAN return EINTR)
- `epoll_ctl`
- `socketpair`
- `pipe`
- `setsockopt`
- `epoll_create1`
- `close` (special case: guaranteed closed even on EINTR on Linux)
When in doubt, consult the `man` page for the specific system call to see if it can return `EINTR`.
---
______________________________________________________________________
## Documentation
### Doxygen Style
- **/** for struct and public method documentation
- **@brief** for short descriptions
- **@param** and **@return** for function parameters
- **@note** for important implementation notes
- **@warning** for critical usage warnings
```cpp
/**
* @brief Type-safe version of realloc_raw for arrays of type T.
@@ -502,9 +545,11 @@ T *realloc(T *existing_ptr, int32_t current_size, int32_t requested_size);
```
### Code Comments
- **Explain why, not what** - code should be self-documenting
- **Performance notes** for optimization decisions
- **Thread safety** and ownership semantics
```cpp
// Uses O(1) accumulated counters for fast retrieval
int64_t total_allocated() const;
@@ -514,20 +559,23 @@ Connection(struct sockaddr_storage addr, int fd, int64_t id,
ConnectionHandler *handler, std::weak_ptr<Server> server);
```
---
______________________________________________________________________
## Testing
### Test Framework
- **doctest** for unit testing
- **TEST_CASE** and **SUBCASE** for test organization
- **CHECK** for assertions (non-terminating)
- **REQUIRE** for critical assertions (terminating)
### Test Structure
- **Descriptive test names** explaining the scenario
- **SUBCASE** for related test variations
- **Fresh instances** for each test to avoid state contamination
```cpp
TEST_CASE("Arena basic allocation") {
Arena arena;
@@ -546,10 +594,12 @@ TEST_CASE("Arena basic allocation") {
```
### Test Design Principles
- **Test the contract, not the implementation** - validate what the API promises to deliver, not implementation details
- **Both integration and unit tests** - test components in isolation and working together
- **Prefer fakes to mocks** - use real implementations for internal components, fake external dependencies
- **Always enable assertions in tests** - use `-UNDEBUG` pattern to ensure assertions are checked (see Build Integration section)
```cpp
// Good: Testing through public API
TEST_CASE("Server accepts connections") {
@@ -569,11 +619,13 @@ TEST_CASE("Server accepts connections") {
### What NOT to Test
**Avoid testing language features and plumbing:**
- Don't test that virtual functions dispatch correctly
- Don't test that standard library types work (unique_ptr, containers, etc.)
- Don't test basic constructor/destructor calls
**Test business logic instead:**
- When does your code call hooks/callbacks and why?
- What state transitions trigger behavior changes?
- How does your code handle error conditions?
@@ -582,6 +634,7 @@ TEST_CASE("Server accepts connections") {
**Ask: "Am I testing the C++ compiler or my application logic?"**
### Test Synchronization (Authoritative Rules)
- **ABSOLUTELY NEVER use timeouts** (`sleep_for`, `wait_for`, etc.)
- **Deterministic synchronization only:**
- Blocking I/O (naturally waits for completion)
@@ -592,6 +645,7 @@ TEST_CASE("Server accepts connections") {
#### Threading Checklist for Tests/Benchmarks
**Common threading principles (all concurrent code):**
- **Count total threads** - Include main/benchmark thread in count
- **Always assume concurrent execution needed** - Tests/benchmarks require real concurrency
- **Add synchronization primitive** - `std::latch start_latch{N}` (most common), `std::barrier`, or similar where N = total concurrent threads
@@ -599,18 +653,21 @@ TEST_CASE("Server accepts connections") {
- **Main thread synchronizes before measurement/execution** - ensures all threads start simultaneously
**Test-specific:**
- **Perform many operations per thread creation** - amortize thread creation cost and increase chances of hitting race conditions
- **Pattern: Create test that spawns threads and runs many operations, then run that test many times** - amortizes thread creation cost while providing fresh test instances
- **Run 100-10000 operations per test, and 100-10000 test iterations** - maximizes chances of hitting race conditions
- **Always run with ThreadSanitizer** - compile with `-fsanitize=thread`
**Benchmark-specific:**
- **NEVER create threads inside the benchmark measurement** - creates thread creation/destruction overhead, not contention
- **Create background threads OUTSIDE the benchmark** that run continuously during measurement
- **Use `std::atomic<bool> keep_running` to cleanly shut down background threads after benchmark**
- **Measure only the foreground operation under real contention from background threads**
**Red flags to catch immediately:**
- ❌ Creating threads in a loop without `std::latch`
- ❌ Background threads starting work immediately
- ❌ Benchmark measuring before all threads synchronized
@@ -636,11 +693,12 @@ for (int i = 0; i < 4; ++i) {
}
```
---
______________________________________________________________________
## Build Integration
### Build Configuration
```bash
# Debug: assertions on, optimizations off
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
@@ -650,6 +708,7 @@ cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
```
**Test Target Pattern:**
- Production targets follow build type (assertions off in Release)
- Test targets use `-UNDEBUG` to force assertions on in all builds
- Ensures consistent test validation regardless of build type
@@ -666,4 +725,5 @@ add_executable(example src/example.cpp src/main.cpp)
```
### Code Generation
- Generated files go in build directory, not source