conflict-set/README.md

A data structure for optimistic concurrency control on ranges of bitwise-lexicographically-ordered keys.

Intended to replace FoundationDB's skip list.
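
To make the problem concrete, here is a minimal reference model of what a conflict set answers. It is a sketch under stated assumptions, not this library's API: the names `NaiveConflictSet`, `add_write`, `check_read`, and `point` are hypothetical, key ranges are assumed to be half-open, and a read is assumed to conflict when an overlapping range was written at a version newer than the read version. It scans every recorded write linearly, which is the cost that a skip list or radix tree is meant to avoid.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveConflictSet:
    """Reference model only: a linear scan over every recorded write range."""

    # Each entry is (begin, end, version) for the half-open key range [begin, end).
    writes: list = field(default_factory=list)

    def add_write(self, begin: bytes, end: bytes, version: int) -> None:
        """Record that keys in [begin, end) were written at `version`."""
        self.writes.append((begin, end, version))

    def check_read(self, begin: bytes, end: bytes, read_version: int) -> bool:
        """Return True if a read of [begin, end) at `read_version` conflicts."""
        for w_begin, w_end, w_version in self.writes:
            # Python compares bytes lexicographically by unsigned byte value,
            # which is the key ordering assumed here.
            overlaps = w_begin < end and begin < w_end
            if overlaps and w_version > read_version:
                return True
        return False


def point(key: bytes):
    """Half-open range that contains only `key`."""
    return key, key + b"\x00"


cs = NaiveConflictSet()
cs.add_write(*point(b"apple"), version=10)
cs.add_write(b"b", b"c", version=20)  # every key starting with "b"

print(cs.check_read(*point(b"apple"), read_version=5))    # True: written after the read
print(cs.check_read(*point(b"apple"), read_version=10))   # False: write is not newer
print(cs.check_read(b"a", b"bz", read_version=15))        # True: overlaps the "b" range
print(cs.check_read(*point(b"cherry"), read_version=0))   # False: never written
```

A real implementation also has to discard writes older than the oldest version still being checked, which this model leaves out.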

All benchmarks were run on a 2020 M1 Mac.

# FoundationDB's benchmark

## Skip list

```
New conflict set: 1.957 sec
                  0.639 Mtransactions/sec
                  2.555 Mkeys/sec
Detect only:      1.845 sec
                  0.678 Mtransactions/sec
                  2.710 Mkeys/sec
Skiplist only:    1.263 sec
                  0.990 Mtransactions/sec
                  3.960 Mkeys/sec
Performance counters:
               Build: 0.0546
                 Add: 0.0563
              Detect: 1.84
              D.Sort: 0.412
           D.Combine: 0.0141
         D.CheckRead: 0.671
   D.CheckIntraBatch: 0.0068
        D.MergeWrite: 0.592
      D.RemoveBefore: 0.146
```

## Radix tree (this implementation)

```
New conflict set: 1.366 sec
                  0.915 Mtransactions/sec
                  3.660 Mkeys/sec
Detect only:      1.248 sec
                  1.002 Mtransactions/sec
                  4.007 Mkeys/sec
Skiplist only:    0.573 sec
                  2.182 Mtransactions/sec
                  8.730 Mkeys/sec
Performance counters:
               Build: 0.0594
                 Add: 0.0572
              Detect: 1.25
              D.Sort: 0.418
           D.Combine: 0.0149
         D.CheckRead: 0.232
   D.CheckIntraBatch: 0.0067
        D.MergeWrite: 0.341
      D.RemoveBefore: 0.232
```

# Our benchmark

## Skip list

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              253.76 |        3,940,735.01 |    0.2% |      0.01 | `point reads`
|              270.83 |        3,692,307.69 |    0.2% |      0.01 | `prefix reads`
|              355.98 |        2,809,136.40 |    0.6% |      0.01 | `range reads`
|              455.77 |        2,194,104.53 |    0.3% |      0.01 | `point writes`
|              448.53 |        2,229,492.31 |    1.8% |      0.01 | `prefix writes`
|              248.34 |        4,026,737.54 |    1.4% |      0.02 | `range writes`
|              561.21 |        1,781,878.13 |    0.9% |      0.01 | `monotonic increasing point writes`
|          149,791.67 |            6,675.94 |    2.7% |      0.01 | `worst case for radix tree`

## Radix tree (this implementation)

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               19.52 |       51,239,417.90 |    0.2% |      0.01 | `point reads`
|               56.74 |       17,623,200.20 |    1.0% |      0.01 | `prefix reads`
|              111.36 |        8,979,743.73 |    0.6% |      0.01 | `range reads`
|               28.63 |       34,931,089.16 |    0.2% |      0.01 | `point writes`
|               41.82 |       23,913,916.86 |    0.2% |      0.01 | `prefix writes`
|               48.75 |       20,512,820.51 |    0.8% |      0.01 | `range writes`
|               93.72 |       10,670,548.15 |    3.2% |      0.01 | `monotonic increasing point writes`
|        2,467,542.00 |              405.26 |    0.4% |      0.03 | `worst case for radix tree`
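
The two tables are easier to compare as ratios. The short script below is only a convenience for the reader: it divides the skip list's ns/op by the radix tree's ns/op for each benchmark, using the figures copied from the tables above. The names `NS_PER_OP` and `speedup` are illustrative.

```python
# ns/op figures copied from the two tables above: (skip list, radix tree).
NS_PER_OP = {
    "point reads": (253.76, 19.52),
    "prefix reads": (270.83, 56.74),
    "range reads": (355.98, 111.36),
    "point writes": (455.77, 28.63),
    "prefix writes": (448.53, 41.82),
    "range writes": (248.34, 48.75),
    "monotonic increasing point writes": (561.21, 93.72),
    "worst case for radix tree": (149_791.67, 2_467_542.00),
}


def speedup(skip_list_ns: float, radix_tree_ns: float) -> float:
    """How many times faster the radix tree is; below 1.0 means it is slower."""
    return skip_list_ns / radix_tree_ns


for name, (skip_ns, radix_ns) in NS_PER_OP.items():
    print(f"{name:<36} {speedup(skip_ns, radix_ns):6.2f}x")
```

A ratio above 1.0 favors the radix tree. The `worst case for radix tree` row is the only one that comes out below 1.0.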

# "Real data" test

Point queries only; best of three runs. Gc ratio is the time spent on garbage collection divided by the combined time spent adding writes and garbage collecting. Lower is better.
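
As a sketch of that definition, the ratio can be computed as below. The function name `gc_ratio` and the timings are hypothetical; this assumes the two times are measured separately, so the example does not reproduce any figure reported in this README.

```python
def gc_ratio(gc_seconds: float, add_seconds: float) -> float:
    """Fraction of write-path time (adding writes plus GC) spent on GC."""
    total = gc_seconds + add_seconds
    if total == 0:
        return 0.0  # nothing was measured, so report no GC overhead
    return gc_seconds / total


# Hypothetical timings, not measurements from this repository.
print(f"Gc ratio: {gc_ratio(gc_seconds=1.0, add_seconds=3.0):.2%}")  # Gc ratio: 25.00%
```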

## Skip list

```
Check: 11.3385 seconds, 329.718 MB/s, Add: 5.35612 seconds, 131.072 MB/s, Gc ratio: 45.7173%
```

## Radix tree

```
Check: 2.48583 seconds, 1503.93 MB/s, Add: 2.12768 seconds, 329.954 MB/s, Gc ratio: 41.7943%
```

## Hash table

The hash table implementation does not support range queries. It is included only to give an idea of how fast point queries can be.

```
Check: 1.83386 seconds, 2038.6 MB/s, Add: 0.601411 seconds, 1167.32 MB/s, Gc ratio: 48.9776%
```
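
The point-only restriction follows from how a hash table stores keys. The sketch below is an illustration, not the hash table used for the numbers above: `PointConflictSet` and its methods are hypothetical names, and it assumes a read conflicts when the key was written at a version newer than the read version. Because a hash table keeps no key order, answering a range read would require scanning every entry.

```python
class PointConflictSet:
    """Point-only model: maps each key to the newest version that wrote it."""

    def __init__(self) -> None:
        self.last_write = {}  # key (bytes) -> newest write version (int)

    def add_write(self, key: bytes, version: int) -> None:
        """Record a write, keeping only the newest version per key."""
        previous = self.last_write.get(key)
        if previous is None or version > previous:
            self.last_write[key] = version

    def check_read(self, key: bytes, read_version: int) -> bool:
        """Return True if `key` was written after `read_version`."""
        version = self.last_write.get(key)
        return version is not None and version > read_version


cs = PointConflictSet()
cs.add_write(b"apple", 10)
cs.add_write(b"apple", 7)                 # older write does not lower the version
print(cs.check_read(b"apple", 9))         # True: written at version 10
print(cs.check_read(b"apple", 10))        # False: no newer write
print(cs.check_read(b"banana", 0))        # False: never written
```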