Compare commits

8 Commits

SHA1 Message Date
995ddf329f Fixes for gcc 15 2025-11-06 12:49:38 -05:00
5fc823b392 This didn't belong here and isn't really necessary 2025-08-25 16:50:43 -04:00
490b49e55d Remove redundancy 2025-08-25 16:49:21 -04:00
6fc263074e Simplify flowchart 2025-08-25 16:46:46 -04:00
befb464619 More README tinkers 2025-08-25 16:39:01 -04:00
f2bb72b3dc More clarifications in README 2025-08-25 16:19:34 -04:00
81814fa590 Simplify flow chart 2025-08-25 16:00:58 -04:00
611bccfb9b Update README 2025-08-25 15:57:07 -04:00
4 changed files with 18 additions and 25 deletions

@@ -1,4 +1,4 @@
# WeaselJSON: A Streaming JSON Parser Review
# WeaselJSON: A Streaming JSON Parser
## What is WeaselJSON?
@@ -17,28 +17,20 @@ WeaselJSON is a high-performance, streaming JSON parser that uses callbacks inst
- The callback API is a pain to use correctly
- Requires a lot of boilerplate for simple tasks
- Most people don't actually need streaming JSON parsing
- Some features only work if you control how the JSON is structured
## JSON Parser Decision Guide
```mermaid
flowchart TD
A[Need to parse JSON?] --> B{Do you have memory constraints<br/>or truly streaming data?}
A[Need to parse JSON?] --> B{Memory or size constraints?}
B -->|No| C{Is performance critical<br/>and data fits in memory?}
B -->|Yes| D{Can you control<br/>how the JSON is laid out?}
B -->|Yes| C[WeaselJSON<br/>Streaming parser]
C -->|No| E[SimdJSON DOM API<br/>Fast and easy to use<br/>Wait why are you using C++?]
C -->|Yes| F{Are you OK with potential<br/>performance traps?}
B -->|No| D{Need maximum speed?}
F -->|Yes| G[SimdJSON On-Demand<br/>Very fast<br/>Nice API<br/>Easy to use wrong]
F -->|No| H[Consider weaseljson<br/>if you can deal with callbacks]
D -->|No| E[SimdJSON DOM<br/>Easy to use]
D -->|No| I{Can you preprocess data<br/>or make multiple requests?}
D -->|Yes| J[WeaselJSON<br/>Streaming performance<br/>Constant memory usage<br/>Harder to use]
I -->|No| K[Use SimdJSON DOM<br/>Deal with the tradeoffs]
I -->|Yes| L[WeaselJSON with<br/>data preprocessing]
D -->|Yes| F[SimdJSON On-Demand<br/>Fastest option]
```
## When to Use What
@@ -47,18 +39,19 @@ flowchart TD
- You want to write `obj["key"]` and have it just work
- JSON files are reasonably sized (but you still want decent performance)
- You care about getting things done
- You need full document validation upfront
### Use **SimdJSON On-Demand** when:
- Performance is critical and your data fits in memory
- You understand that some access patterns can accidentally become very slow
- You can work with forward-only traversal (no random access or backtracking)
- You need maximum speed but still want a usable API
- You're okay with partial validation (only validates parts you actually access)
- Your JSON keys don't contain escape sequences (OnDemand matches raw keys without unescaping)
### Use **WeaselJSON** when:
- You absolutely cannot load the whole JSON into memory
- You're processing huge JSON files (multiple gigabytes)
- You want to parse just part of a JSON file without reading the rest
- You need to convert JSON into some other format as you parse it
- You can't risk accidentally slow performance
- You need predictable performance characteristics
- Writing stateful callbacks is your idea of fun
## The Reality
@@ -67,8 +60,6 @@ WeaselJSON solves real problems that other parsers can't handle, but most people
The parser represents excellent engineering work and occupies a useful niche. It's the kind of tool you're very glad exists when you actually need it, but most people will never need it.
**Note**: WeaselJSON assumes modern CPU features (SIMD), so you probably can't use it for embedded development.
## Technical Reference
### Features
@@ -76,7 +67,7 @@ The parser represents excellent engineering work and occupies a useful niche. It
- No memory allocations during parsing
- O(1) memory usage regardless of input size
- Streaming API - no need to buffer the entire document in memory. Parsing is resumed when more data is available
- Strings are unescaped in place before they're presented. No unicode normalization is performed
- By default, strings are unescaped in place before they're presented (modifies your input buffer). No unicode normalization is performed
- Robust to crashes with untrusted input
- SIMD optimizations for string scanning and validation
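The updated feature line notes that unescaping happens in place and therefore modifies the input buffer. A minimal sketch of why that is so, assuming only the simple two-character JSON escapes: `unescape_in_place` is a hypothetical helper written for illustration, not WeaselJSON's implementation, and it leaves `\uXXXX` sequences untouched.

```cpp
#include <cstddef>

// Hypothetical illustration, not WeaselJSON code: rewrites the simple
// two-character JSON escapes in buf[0..len) in place and returns the new
// length. Unknown escapes (including \uXXXX) are copied through unchanged.
inline std::size_t unescape_in_place(char *buf, std::size_t len) {
  std::size_t out = 0;  // write cursor; never runs ahead of the read cursor
  for (std::size_t in = 0; in < len; ++in) {
    char c = buf[in];
    if (c == '\\' && in + 1 < len) {
      char decoded = 0;
      switch (buf[in + 1]) {
        case '"':  decoded = '"';  break;
        case '\\': decoded = '\\'; break;
        case '/':  decoded = '/';  break;
        case 'b':  decoded = '\b'; break;
        case 'f':  decoded = '\f'; break;
        case 'n':  decoded = '\n'; break;
        case 'r':  decoded = '\r'; break;
        case 't':  decoded = '\t'; break;
        default:   break;  // not a simple escape; fall through and copy
      }
      if (decoded != 0) {
        buf[out++] = decoded;  // two input bytes become one output byte
        ++in;                  // skip the escape letter
        continue;
      }
    }
    buf[out++] = c;
  }
  return out;
}
```

Because the decoded text is never longer than the escaped text, the write cursor cannot overtake the read cursor, so no second buffer is needed. The cost is that the caller's original bytes are overwritten.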

@@ -214,7 +214,7 @@ inline PRESERVE_NONE WeaselJsonStatus scan_string_impl(Parser3 *self,
 }
 auto v = V{(int8_t *)buf};
 int normal =
-(v != V::splat('"') & v != V::splat('\\') & v >= V::splat(0x20))
+((v != V::splat('"')) & (v != V::splat('\\')) & (v >= V::splat(0x20)))
 .count_leading_nonzero_lanes();
 buf += normal;
 if (normal < V::lanes) {
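This hunk only adds parentheses around each comparison. In C++ the relational and equality operators already bind tighter than `&`, also for overloaded operators, so both spellings group the same way. The change presumably silences a GCC 15 diagnostic, which is an assumption, since the commit message only says "Fixes for gcc 15". A scalar sketch of the same byte classification follows; `is_normal_byte` and `count_leading_normal` are hypothetical names, and the signed comparison assumes the lanes are 8-bit signed, as the `int8_t` cast suggests.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for the vector expression in the hunk above.
// A byte is "normal" when it is not a quote, not a backslash, and not a
// control character. The comparison is signed, mirroring the int8_t cast in
// the diff, so bytes >= 0x80 are also treated as not normal here.
inline bool is_normal_byte(std::int8_t c) {
  // Each comparison is parenthesized explicitly, as in the updated line.
  return (c != '"') & (c != '\\') & (c >= 0x20);
}

// Counts how many leading bytes of buf[0..n) are normal, which is the role
// count_leading_nonzero_lanes() appears to play for one vector of lanes.
inline std::size_t count_leading_normal(const char *buf, std::size_t n) {
  std::size_t i = 0;
  while (i < n && is_normal_byte(static_cast<std::int8_t>(buf[i]))) {
    ++i;
  }
  return i;
}
```

For the input `abc"def`, `count_leading_normal` stops at the quote, which corresponds to the `normal < V::lanes` branch in the diff being taken.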

@@ -558,7 +558,7 @@ template <std::integral T, int kLanes> struct simd<T, kLanes, Simd_x86_SSE> {
 for (; i + 16 / sizeof(T) <= kLanes; i += 16 / sizeof(T)) {
 __m128i v0;
 memcpy(&v0, &x[i], 16);
-v0 = _mm_xor_si128(v0, _mm_set1_epi8(0xff));
+v0 = _mm_xor_si128(v0, _mm_set1_epi8((char)0xff));
 memcpy(&result.x[i], &v0, 16);
 }
 for (; i < kLanes; ++i) {
@@ -1702,13 +1702,13 @@ template <std::integral T, int kLanes> struct simd<T, kLanes, Simd_x86_AVX2> {
 for (; i + 32 / sizeof(T) <= kLanes; i += 32 / sizeof(T)) {
 __m256i v0;
 memcpy(&v0, &x[i], 32);
-v0 = _mm256_xor_si256(v0, _mm256_set1_epi8(0xff));
+v0 = _mm256_xor_si256(v0, _mm256_set1_epi8((char)0xff));
 memcpy(&result.x[i], &v0, 32);
 }
 for (; i + 16 / sizeof(T) <= kLanes; i += 16 / sizeof(T)) {
 __m128i v0;
 memcpy(&v0, &x[i], 16);
-v0 = _mm_xor_si128(v0, _mm_set1_epi8(0xff));
+v0 = _mm_xor_si128(v0, _mm_set1_epi8((char)0xff));
 memcpy(&result.x[i], &v0, 16);
 }
 for (; i < kLanes; ++i) {
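The three changed lines in this file wrap `0xff` in a `(char)` cast before passing it to `_mm_set1_epi8` or `_mm256_set1_epi8`. Both intrinsics take a `char`, and the literal `0xff` is an `int` with value 255, which does not fit a signed `char`. The cast states the intended all-ones bit pattern explicitly; that this is what GCC 15 objected to is an assumption, since the commit message does not say. A portable sketch of the same operation, using a hypothetical `not16` helper instead of the intrinsics so it builds on any target:

```cpp
#include <cstddef>
#include <cstring>

// The all-ones byte, spelled with an explicit cast as in the updated lines.
// 0xff is an int with value 255, which does not fit a signed char, so the
// cast states the intended bit pattern instead of relying on an implicit
// value-changing conversion.
constexpr char kAllOnes = (char)0xff;

// Hypothetical portable stand-in for the 16-byte XOR in the diff: flips
// every bit of the 16 bytes at src and writes the result to dst.
inline void not16(const void *src, void *dst) {
  unsigned char tmp[16];
  std::memcpy(tmp, src, 16);
  for (std::size_t i = 0; i < 16; ++i) {
    tmp[i] = static_cast<unsigned char>(
        tmp[i] ^ static_cast<unsigned char>(kAllOnes));
  }
  std::memcpy(dst, tmp, 16);
}
```

XOR with an all-ones byte is a bitwise NOT, which matches the per-byte effect of the XOR against a broadcast `0xff` in the loops above.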

@@ -1,5 +1,7 @@
_GLOBAL_OFFSET_TABLE_
__cpu_indicator_init
__cpu_model
__stack_chk_fail@GLIBC_2.4
free@GLIBC_2.2.5
malloc@GLIBC_2.2.5
memmove@GLIBC_2.2.5