# WeaselJSON: A Streaming JSON Parser

## What is WeaselJSON?

WeaselJSON is a high-performance, streaming JSON parser that uses callbacks instead of building an in-memory document. The catch:

- The callback API is a pain to use correctly
- Requires a lot of boilerplate for simple tasks
- Most people don't actually need streaming JSON parsing
- Some features only work if you control how the JSON is structured
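
Here is a rough illustration of that boilerplate: a minimal C++ sketch of a callback consumer that only wants the top-level `"name"` string. The `JsonCallbacks` interface, its method names, and `TopLevelNameExtractor` are hypothetical and were invented for this example. They are not WeaselJSON's actual API. Each callback sees one event in isolation, so even this small task needs explicit state for nesting depth and the most recent key.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical SAX-style callback interface (NOT WeaselJSON's real API).
// Boolean and null callbacks are omitted for brevity.
struct JsonCallbacks {
    virtual ~JsonCallbacks() = default;
    virtual void on_begin_object() = 0;
    virtual void on_end_object() = 0;
    virtual void on_begin_array() = 0;
    virtual void on_end_array() = 0;
    virtual void on_key(std::string_view key) = 0;
    virtual void on_string(std::string_view value) = 0;
    virtual void on_number(std::string_view text) = 0;
};

// Captures the string value of the "name" key in the root object only.
// All context lives in member state because events arrive one at a time.
class TopLevelNameExtractor : public JsonCallbacks {
public:
    void on_begin_object() override { enter(); }
    void on_end_object() override { --depth_; }
    void on_begin_array() override { enter(); }
    void on_end_array() override { --depth_; }

    void on_key(std::string_view key) override {
        // Only keys directly inside the root object (depth 1) are relevant.
        expecting_name_ = (depth_ == 1 && key == "name");
    }
    void on_string(std::string_view value) override {
        if (expecting_name_) {
            name_ = std::string(value);
            found_ = true;
        }
        expecting_name_ = false;
    }
    void on_number(std::string_view) override { expecting_name_ = false; }

    bool found() const { return found_; }
    const std::string& name() const { return name_; }

private:
    void enter() {
        // A container value consumes the pending key, so stop expecting "name".
        expecting_name_ = false;
        ++depth_;
    }

    int depth_ = 0;
    bool expecting_name_ = false;
    bool found_ = false;
    std::string name_;
};
```

A DOM-style API would reduce this to a single lookup such as `obj["name"]`. With callbacks, that bookkeeping moves into your handler.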

## JSON Parser Decision Guide

```mermaid
flowchart TD
    A[Need to parse JSON?] --> B{Memory or size constraints?}
    B -->|Yes| C[WeaselJSON<br/>Streaming parser]
    B -->|No| D{Need maximum speed?}
    D -->|No| E[SimdJSON DOM<br/>Easy to use]
    D -->|Yes| F[SimdJSON On-Demand<br/>Fastest option]
```

## When to Use What

### Use **SimdJSON DOM** when:

- You want to write `obj["key"]` and have it just work
- JSON files are reasonably sized (but you still want decent performance)
- You care about getting things done
- You need full document validation upfront

### Use **SimdJSON On-Demand** when:

- Performance is critical and your data fits in memory
- You understand that some access patterns can accidentally become very slow
- You can work with forward-only traversal (no random access or backtracking)
- You need maximum speed but still want a usable API
- You're okay with partial validation (only the parts you actually access are validated)
- Your JSON keys don't contain escape sequences (On-Demand matches raw keys without unescaping them)
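
The "accidentally slow" access patterns come from forward-only traversal. The sketch below is a toy model and not simdjson's implementation. It assumes a lookup scans forward from the current position and wraps around to the start of the object when the key is behind it, which loosely mirrors how an unordered field lookup has to behave under forward-only parsing. `ForwardOnlyObject` is a hypothetical name. Reading fields in document order touches each field once. Reading them in reverse makes nearly every lookup rescan most of the object.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model of forward-only field lookup (an assumption for illustration,
// not simdjson's actual code). Fields are stored in document order; each
// lookup starts at the cursor and wraps to the beginning if necessary.
class ForwardOnlyObject {
public:
    explicit ForwardOnlyObject(std::vector<std::pair<std::string, int>> fields)
        : fields_(std::move(fields)) {}

    // Finds `key`, counting every field inspected along the way.
    std::optional<int> find(const std::string& key) {
        const std::size_t n = fields_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t idx = (cursor_ + i) % n;
            ++fields_scanned_;
            if (fields_[idx].first == key) {
                cursor_ = idx + 1;  // the next search resumes after the match
                return fields_[idx].second;
            }
        }
        return std::nullopt;
    }

    std::size_t fields_scanned() const { return fields_scanned_; }

private:
    std::vector<std::pair<std::string, int>> fields_;
    std::size_t cursor_ = 0;
    std::size_t fields_scanned_ = 0;
};
```

In this model, five in-order lookups on a five-field object inspect 5 fields, while the same five lookups in reverse order inspect 21. For this reverse pattern the cost grows roughly quadratically with the number of fields, which is why reading fields in the order they appear in the document is the safe habit with On-Demand.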

### Use **WeaselJSON** when:

- You absolutely cannot load the whole JSON into memory
- You're processing huge JSON files (multiple gigabytes)
- You want to parse just part of a JSON file without reading the rest
- You need to convert JSON into some other format as you parse it
- You can't risk accidentally slow performance
- You need predictable performance characteristics
- Writing stateful callbacks is your idea of fun
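
Constant-memory processing of multi-gigabyte input depends on carrying a small, fixed amount of parser state from one chunk to the next. The sketch below is hypothetical and much simpler than a real parser. It is not WeaselJSON's code or API. `StreamingDepthTracker` only tracks string and escape state plus nesting depth, but it shows the key property: input can be split at any byte, even in the middle of an escape sequence, and the result does not change.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical illustration (not WeaselJSON's API): a scanner whose state
// survives between feed() calls, so input may be split at any byte.
// Its memory use is a few scalars, independent of input size.
class StreamingDepthTracker {
public:
    // Consumes the next chunk. Returns false if a closing bracket appears
    // at depth 0 (bracket kinds are not matched in this sketch).
    bool feed(std::string_view chunk) {
        for (char c : chunk) {
            if (in_string_) {
                if (escaped_) {
                    escaped_ = false;  // the byte after '\' is never special
                } else if (c == '\\') {
                    escaped_ = true;
                } else if (c == '"') {
                    in_string_ = false;
                }
                continue;
            }
            switch (c) {
                case '"':
                    in_string_ = true;
                    break;
                case '{':
                case '[':
                    ++depth_;
                    if (depth_ > max_depth_) max_depth_ = depth_;
                    break;
                case '}':
                case ']':
                    if (depth_ == 0) return false;
                    if (--depth_ == 0) ++completed_values_;
                    break;
                default:
                    break;  // numbers, literals, whitespace, ':' and ','
            }
        }
        return true;
    }

    int depth() const { return depth_; }
    int max_depth() const { return max_depth_; }
    int completed_values() const { return completed_values_; }
    bool in_string() const { return in_string_; }

private:
    bool in_string_ = false;
    bool escaped_ = false;
    int depth_ = 0;
    int max_depth_ = 0;
    int completed_values_ = 0;
};
```

A full streaming parser carries more state than this (partial numbers, partial literals, position in the grammar). The principle is the same, though: the state stays a handful of values regardless of how much input has already been consumed.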

## The Reality

WeaselJSON solves real problems that other parsers can't handle.

The parser represents excellent engineering work and occupies a useful niche. It's the kind of tool you're very glad exists when you need it, though most people never will.

**Note**: WeaselJSON assumes modern CPU features (SIMD), so you probably can't use it for embedded development.

## Technical Reference

### Features

- No memory allocations during parsing
- O(1) memory usage regardless of input size
- Streaming API: no need to buffer the entire document in memory; parsing resumes when more data is available
- By default, strings are unescaped in place before they're presented (this modifies your input buffer). No Unicode normalization is performed
- Robust against crashes on untrusted input
- SIMD optimizations for string scanning and validation
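
Unescaping in place is safe because every JSON escape sequence is at least as long as the bytes it decodes to. `\n` is two bytes for one, and `\u00e9` is six bytes for a two-byte UTF-8 sequence, so the write position never overtakes the read position. The function below, `unescape_in_place`, is a hypothetical sketch of the technique and not WeaselJSON's implementation. It assumes the buffer holds a string body without the surrounding quotes. To stay short, it rejects `\u` escapes in the UTF-16 surrogate range, which valid JSON may use in pairs, and it does not check for unescaped control characters.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: unescapes a JSON string body in place and returns
// the new length, or std::nullopt on a malformed or unsupported escape.
// Writes never overtake reads because each escape is at least as long as
// its decoded form. Surrogate escapes (\uD800-\uDFFF) are rejected here.
std::optional<std::size_t> unescape_in_place(char* buf, std::size_t len) {
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::size_t r = 0;  // read position
    std::size_t w = 0;  // write position, always <= r
    while (r < len) {
        const char c = buf[r++];
        if (c != '\\') {
            buf[w++] = c;
            continue;
        }
        if (r >= len) return std::nullopt;  // dangling backslash
        const char e = buf[r++];
        switch (e) {
            case '"': case '\\': case '/': buf[w++] = e; break;
            case 'b': buf[w++] = '\b'; break;
            case 'f': buf[w++] = '\f'; break;
            case 'n': buf[w++] = '\n'; break;
            case 'r': buf[w++] = '\r'; break;
            case 't': buf[w++] = '\t'; break;
            case 'u': {
                if (len - r < 4) return std::nullopt;
                unsigned cp = 0;
                for (int i = 0; i < 4; ++i) {
                    const int d = hex(buf[r++]);
                    if (d < 0) return std::nullopt;
                    cp = cp * 16 + static_cast<unsigned>(d);
                }
                if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;
                // UTF-8 output is 1 to 3 bytes, never more than the 6 consumed.
                if (cp < 0x80) {
                    buf[w++] = static_cast<char>(cp);
                } else if (cp < 0x800) {
                    buf[w++] = static_cast<char>(0xC0 | (cp >> 6));
                    buf[w++] = static_cast<char>(0x80 | (cp & 0x3F));
                } else {
                    buf[w++] = static_cast<char>(0xE0 | (cp >> 12));
                    buf[w++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                    buf[w++] = static_cast<char>(0x80 | (cp & 0x3F));
                }
                break;
            }
            default:
                return std::nullopt;  // unknown escape
        }
    }
    return w;
}
```

This is also why in-place unescaping modifies your input buffer: the decoded bytes overwrite the escaped ones, and the string can only shrink.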