Node members are not initialized with dummy default values. This has performance and code-size benefits, and improves debugging when running under valgrind. Unfortunately, it also makes it easy to write code that uses uninitialized memory, so if valgrind coverage is poor, some uninitialized uses might sneak through. We plan to have good valgrind coverage, so I think this is acceptable. If writing correct code becomes too tedious, we can go back to initializing Node fields with dummy default values.
A data structure for optimistic concurrency control on ranges of bitwise-lexicographically-ordered keys.
Intended to replace FoundationDB's skip list.
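
To make the description above concrete, here is a minimal sketch of what a conflict set for optimistic concurrency control has to answer: given the ranges written so far and the versions at which they were committed, does a read of some key range at a given read version conflict? Every name below (`keyLess`, `KeyRange`, `NaiveConflictSet`, and so on) is hypothetical and chosen for illustration; this is a naive linear scan, not this repository's API and not the radix tree itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the repository's API.

// Bitwise-lexicographic order: compare bytes as unsigned values, and treat a
// strict prefix as smaller than the longer key.
inline bool keyLess(const std::string& a, const std::string& b) {
  const std::size_t n = std::min(a.size(), b.size());
  const int c = n == 0 ? 0 : std::memcmp(a.data(), b.data(), n);
  return c != 0 ? c < 0 : a.size() < b.size();
}

struct KeyRange {
  std::string begin;  // inclusive
  std::string end;    // exclusive
};

struct WriteRecord {
  KeyRange range;
  std::int64_t version;  // version at which the write committed
};

// Naive O(n) conflict set: stores every write and scans them all on each read.
class NaiveConflictSet {
 public:
  // Record that `range` was written at `version`.
  void addWrite(const KeyRange& range, std::int64_t version) {
    writes_.push_back(WriteRecord{range, version});
  }

  // A read conflicts if some overlapping write committed after `readVersion`.
  bool conflicts(const KeyRange& read, std::int64_t readVersion) const {
    for (const WriteRecord& w : writes_) {
      const bool overlaps =
          keyLess(w.range.begin, read.end) && keyLess(read.begin, w.range.end);
      if (overlaps && w.version > readVersion) return true;
    }
    return false;
  }

  // Garbage-collect writes that can no longer conflict with any read whose
  // version is at least `oldestVersion`.
  void removeBefore(std::int64_t oldestVersion) {
    writes_.erase(std::remove_if(writes_.begin(), writes_.end(),
                                 [oldestVersion](const WriteRecord& w) {
                                   return w.version <= oldestVersion;
                                 }),
                  writes_.end());
  }

  std::size_t size() const { return writes_.size(); }

 private:
  std::vector<WriteRecord> writes_;
};
```

For example, after `addWrite(KeyRange{"b", "d"}, 10)`, a read of `["a", "c")` at read version 5 conflicts, while the same read at version 10 does not. A skip list or radix tree answers the same question without scanning every recorded write.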
FoundationDB's benchmark
Skip list
measurement | sec | Mtransactions/sec | Mkeys/sec |
---|---|---|---|
New conflict set | 2.404 | 0.520 | 2.080 |
Detect only | 2.266 | 0.552 | 2.207 |
Skiplist only | 1.594 | 0.784 | 3.137 |

Performance counters:

counter | value |
---|---|
Build | 0.071 |
Add | 0.0641 |
Detect | 2.27 |
D.Sort | 0.44 |
D.Combine | 0.018 |
D.CheckRead | 0.855 |
D.CheckIntraBatch | 0.00903 |
D.MergeWrite | 0.739 |
D.RemoveBefore | 0.201 |
Radix tree (this implementation)
measurement | sec | Mtransactions/sec | Mkeys/sec |
---|---|---|---|
New conflict set | 1.743 | 0.717 | 2.869 |
Detect only | 1.611 | 0.776 | 3.103 |
Skiplist only | 0.919 | 1.360 | 5.440 |

Performance counters:

counter | value |
---|---|
Build | 0.0657 |
Add | 0.0628 |
Detect | 1.61 |
D.Sort | 0.442 |
D.Combine | 0.0178 |
D.CheckRead | 0.395 |
D.CheckIntraBatch | 0.00776 |
D.MergeWrite | 0.524 |
D.RemoveBefore | 0.221 |
Our benchmark
Skip list
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
270.07 | 3,702,706.03 | 0.4% | 0.01 | point reads |
285.76 | 3,499,437.03 | 1.5% | 0.01 | prefix reads |
532.54 | 1,877,794.90 | 0.7% | 0.01 | range reads |
528.50 | 1,892,132.94 | 0.7% | 0.01 | point writes |
516.53 | 1,935,978.22 | 0.9% | 0.01 | prefix writes |
303.34 | 3,296,630.84 | 3.6% | 0.05 | range writes |
502.88 | 1,988,553.24 | 2.0% | 0.01 | monotonic increasing point writes |
Radix tree (this implementation)
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
14.52 | 68,850,842.99 | 1.2% | 0.01 | point reads |
60.89 | 16,422,538.22 | 1.5% | 0.01 | prefix reads |
226.89 | 4,407,362.98 | 0.5% | 0.01 | range reads |
22.99 | 43,498,198.49 | 0.2% | 0.01 | point writes |
50.51 | 19,799,864.54 | 1.0% | 0.01 | prefix writes |
82.50 | 12,121,212.12 | 2.6% | 0.03 | range writes |
119.94 | 8,337,354.54 | 2.1% | 0.01 | monotonic increasing point writes |
"Real data" test
Point queries only, best of three runs. Gc ratio is the time spent doing garbage collection divided by the total time spent adding writes and doing garbage collection. Lower is better.
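
Written as a formula, the Gc ratio defined above looks like the following. The symbols $t_{\mathrm{gc}}$ (time spent doing garbage collection) and $t_{\mathrm{add}}$ (time spent adding writes) are notation introduced here for illustration and do not appear in the benchmark output.

$$
\text{Gc ratio} = \frac{t_{\mathrm{gc}}}{t_{\mathrm{add}} + t_{\mathrm{gc}}}
$$

With made-up numbers, $t_{\mathrm{gc}} = 1$ s and $t_{\mathrm{add}} = 3$ s would give a Gc ratio of $1 / (3 + 1) = 25\%$.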
Skip list
Check: 12.7863 seconds, 292.384 MB/s, Add: 19.8276 seconds, 35.4071 MB/s, Gc ratio: 23.5314%
Radix tree
Check: 3.60187 seconds, 1037.94 MB/s, Add: 3.03958 seconds, 230.966 MB/s, Gc ratio: 52.3876%
Hash table
(The hash table implementation does not support range queries. It is included only to give an idea of how fast point queries can be.)
Check: 2.15925 seconds, 1731.4 MB/s, Add: 1.08519 seconds, 646.926 MB/s, Gc ratio: 52.1526%