Compare commits

10 Commits

SHA1 Message Date
995ddf329f Fixes for gcc 15 2025-11-06 12:49:38 -05:00
5fc823b392 This didn't belong here and isn't really necessary 2025-08-25 16:50:43 -04:00
490b49e55d Remove redundancy 2025-08-25 16:49:21 -04:00
6fc263074e Simplify flowchart 2025-08-25 16:46:46 -04:00
befb464619 More README tinkers 2025-08-25 16:39:01 -04:00
f2bb72b3dc More clarifications in README 2025-08-25 16:19:34 -04:00
81814fa590 Simplify flow chart 2025-08-25 16:00:58 -04:00
611bccfb9b Update README 2025-08-25 15:57:07 -04:00
e8d2855b36 Add maintainer notice 2025-08-25 15:43:37 -04:00
1f540a436a Tailor readme to prospective users 2025-08-25 15:34:29 -04:00
4 changed files with 92 additions and 18 deletions

README.md

@@ -1,26 +1,98 @@
# Weaseljson
# WeaselJSON: A Streaming JSON Parser
An rfc8259-compliant streaming json parser
## What is WeaselJSON?
# Features
WeaselJSON is a high-performance, streaming JSON parser that uses callbacks instead of building an object tree in memory. It's optimized with SIMD instructions and designed for situations where you either can't or don't want to load entire JSON files into memory.
- SAX-style api
## Key Characteristics
**What's good:**
- Uses constant memory regardless of input size
- Fast parsing with SIMD optimizations
- Follows JSON spec properly with good security practices
- You can't accidentally make it O(n²) (unlike the simdjson ondemand api)
- Enables weird use cases like parsing just the beginning of huge JSON files
**What's not good:**
- The callback API is a pain to use correctly
- Requires a lot of boilerplate for simple tasks
- Most people don't actually need streaming JSON parsing
## JSON Parser Decision Guide
```mermaid
flowchart TD
A[Need to parse JSON?] --> B{Memory or size constraints?}
B -->|Yes| C[WeaselJSON<br/>Streaming parser]
B -->|No| D{Need maximum speed?}
D -->|No| E[SimdJSON DOM<br/>Easy to use]
D -->|Yes| F[SimdJSON On-Demand<br/>Fastest option]
```
## When to Use What
### Use **SimdJSON DOM API** when:
- You want to write `obj["key"]` and have it just work
- JSON files are reasonably sized (but you still want decent performance)
- You care about getting things done
- You need full document validation upfront
### Use **SimdJSON On-Demand** when:
- Performance is critical and your data fits in memory
- You can work with forward-only traversal (no random access or backtracking)
- You need maximum speed but still want a usable API
- You're okay with partial validation (only validates parts you actually access)
- Your JSON keys don't contain escape sequences (OnDemand matches raw keys without unescaping)
### Use **WeaselJSON** when:
- You want to parse just part of a JSON file without reading the rest
- You need to convert JSON into some other format as you parse it
- You need predictable performance characteristics
- Writing stateful callbacks is your idea of fun
## The Reality
WeaselJSON solves real problems that other parsers can't handle, but most people don't have those problems. The callback API is legitimately difficult to use correctly, which pushes most developers toward easier alternatives.
The parser represents excellent engineering work and occupies a useful niche. It's the kind of tool you're very glad exists when you actually need it, but most people will never need it.
## Technical Reference
### Features
- SAX-style callback API
- No memory allocations during parsing
- O(1) memory usage
- Streaming api - no need to buffer the entire document in memory. Parsing is resumed when more data is available
- Strings are unescaped in place before they're presented. No unicode normalization is performed
- O(1) memory usage regardless of input size
- Streaming API - no need to buffer the entire document in memory. Parsing is resumed when more data is available
- By default, strings are unescaped in place before they're presented (modifies your input buffer). No unicode normalization is performed
- Robust to crashes with untrusted input
- SIMD optimizations for string scanning and validation
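
In-place unescaping is possible because an unescaped JSON string is never longer than its escaped form, so the decoded bytes can overwrite the input buffer as the scan advances. A minimal sketch of that idea follows. It handles only the two-character escapes, and `unescape_in_place` is a hypothetical helper written for illustration, not WeaselJSON's implementation:

```cpp
#include <cstddef>

// Hypothetical helper, not part of WeaselJSON: rewrites the simple
// two-character JSON escapes in buf[0..len) in place and returns the new
// length. \uXXXX escapes are deliberately left out of this sketch and are
// copied through unchanged.
inline std::size_t unescape_in_place(char *buf, std::size_t len) {
  std::size_t out = 0;
  for (std::size_t in = 0; in < len; ++in) {
    char c = buf[in];
    if (c == '\\' && in + 1 < len) {
      char decoded = 0;
      switch (buf[in + 1]) {
      case '"':  decoded = '"';  break;
      case '\\': decoded = '\\'; break;
      case '/':  decoded = '/';  break;
      case 'b':  decoded = '\b'; break;
      case 'f':  decoded = '\f'; break;
      case 'n':  decoded = '\n'; break;
      case 'r':  decoded = '\r'; break;
      case 't':  decoded = '\t'; break;
      default:   break; // unrecognized here: fall through and copy as-is
      }
      if (decoded != 0) {
        buf[out++] = decoded; // one output byte replaces two input bytes
        ++in;
        continue;
      }
    }
    buf[out++] = c; // the write index never passes the read index
  }
  return out;
}
```

Because the input buffer is modified, a caller that needs the original bytes afterwards would have to keep its own copy.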
# Rfc8259 conformance notes
### RFC 8259 Conformance
- There are no limits on number precision. Numbers are only validated syntactically and are presented as is
- Only utf-8 is accepted
- Invalid utf-8 is rejected
- Only UTF-8 is accepted
- Invalid UTF-8 is rejected
- Byte order markers are rejected
- Invalid escaped utf16 surrogate pairs are rejected
- Invalid escaped UTF-16 surrogate pairs are rejected
- Documents that are too deeply nested are rejected to control memory usage
- Duplicate keys are presented
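
As an illustration of the surrogate-pair rule above, the check a parser has to make on two consecutive `\uXXXX` escapes can be sketched as below. `combine_surrogates` is a hypothetical name chosen for this example and is not taken from the WeaselJSON source:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: combines a high and a low UTF-16 surrogate into one
// Unicode code point, or returns std::nullopt if the two code units do not
// form a valid pair.
inline std::optional<char32_t> combine_surrogates(std::uint16_t hi,
                                                  std::uint16_t lo) {
  if (hi < 0xD800 || hi > 0xDBFF) return std::nullopt; // not a high surrogate
  if (lo < 0xDC00 || lo > 0xDFFF) return std::nullopt; // not a low surrogate
  return static_cast<char32_t>(0x10000u +
                               ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10) +
                               (static_cast<std::uint32_t>(lo) - 0xDC00u));
}
```

A high surrogate followed by anything other than a low surrogate, or a low surrogate on its own, yields no code point, which corresponds to the rejection case listed above.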
# Caveats
### Caveats
- Users should be prepared to discard work done during SAX callbacks if the document is ultimately rejected
- Requires manual state management in callback functions
- API is more complex than DOM-style parsers
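
To make the state-management caveat concrete, here is a sketch of what a handler for a SAX-style parser tends to look like when all it wants is the top-level `"name"` string. The struct and its method names are assumptions made for illustration. They do not match WeaselJSON's actual callback interface:

```cpp
#include <string>
#include <string_view>

// Hypothetical SAX-style handler (names are illustrative only). It captures
// the first string value stored under the key "name" at the top level of
// the document and ignores everything else.
struct FindTopLevelName {
  int depth = 0;             // current nesting depth of objects and arrays
  bool expect_value = false; // true right after the key "name" at depth 1
  bool found = false;
  std::string result;

  void on_begin_object() { ++depth; expect_value = false; }
  void on_end_object()   { --depth; expect_value = false; }
  void on_begin_array()  { ++depth; expect_value = false; }
  void on_end_array()    { --depth; expect_value = false; }

  void on_key(std::string_view key) {
    expect_value = (depth == 1 && key == "name");
  }

  void on_string(std::string_view value) {
    if (expect_value && !found) {
      result = std::string(value);
      found = true;
    }
    expect_value = false;
  }

  // If the document is ultimately rejected, earlier callback work is invalid.
  void on_reject() {
    found = false;
    result.clear();
  }
};
```

Even for this small task the handler has to track depth and the most recent key by hand, which is the boilerplate cost a DOM-style `obj["name"]` lookup avoids.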
## Maintainer Notice
⚠️ **Important**: This is a hobby project by a single maintainer. Please be aware that:
- I'm doing this for fun and learning, not as a professional obligation
- Don't rely on this for mission-critical applications without understanding these limitations
- Feel free to email me, but I may not respond promptly
---
*Note: This document was written by AI (Claude) in collaboration with the author.*

@@ -214,7 +214,7 @@ inline PRESERVE_NONE WeaselJsonStatus scan_string_impl(Parser3 *self,
}
auto v = V{(int8_t *)buf};
int normal =
(v != V::splat('"') & v != V::splat('\\') & v >= V::splat(0x20))
((v != V::splat('"')) & (v != V::splat('\\')) & (v >= V::splat(0x20)))
.count_leading_nonzero_lanes();
buf += normal;
if (normal < V::lanes) {
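
Note on the hunk above: in C++ the comparison operators bind tighter than `&`, so the parenthesized expression groups the same way as the original one. The change therefore looks like it addresses a compiler diagnostic rather than behavior, although the diff does not say which one. For reference, a scalar sketch of what the vector expression computes, assuming the lanes are signed 8-bit values as the `int8_t` cast suggests. `is_normal` and `count_leading_normal` are hypothetical names, not functions from this repository:

```cpp
#include <cstddef>

// Scalar reference for the vector predicate: a byte is "normal" if it is
// not a quote, not a backslash, and not below 0x20. With signed 8-bit
// values, bytes >= 0x80 are negative and fail the last test too.
inline bool is_normal(signed char c) {
  return (c != '"') && (c != '\\') && (c >= 0x20);
}

// Hypothetical scalar stand-in for count_leading_nonzero_lanes(): the
// number of leading bytes in buf[0..len) that satisfy is_normal.
inline std::size_t count_leading_normal(const char *buf, std::size_t len) {
  std::size_t n = 0;
  while (n < len && is_normal(static_cast<signed char>(buf[n]))) {
    ++n;
  }
  return n;
}
```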

@@ -558,7 +558,7 @@ template <std::integral T, int kLanes> struct simd<T, kLanes, Simd_x86_SSE> {
for (; i + 16 / sizeof(T) <= kLanes; i += 16 / sizeof(T)) {
__m128i v0;
memcpy(&v0, &x[i], 16);
v0 = _mm_xor_si128(v0, _mm_set1_epi8(0xff));
v0 = _mm_xor_si128(v0, _mm_set1_epi8((char)0xff));
memcpy(&result.x[i], &v0, 16);
}
for (; i < kLanes; ++i) {
@@ -1702,13 +1702,13 @@ template <std::integral T, int kLanes> struct simd<T, kLanes, Simd_x86_AVX2> {
for (; i + 32 / sizeof(T) <= kLanes; i += 32 / sizeof(T)) {
__m256i v0;
memcpy(&v0, &x[i], 32);
v0 = _mm256_xor_si256(v0, _mm256_set1_epi8(0xff));
v0 = _mm256_xor_si256(v0, _mm256_set1_epi8((char)0xff));
memcpy(&result.x[i], &v0, 32);
}
for (; i + 16 / sizeof(T) <= kLanes; i += 16 / sizeof(T)) {
__m128i v0;
memcpy(&v0, &x[i], 16);
v0 = _mm_xor_si128(v0, _mm_set1_epi8(0xff));
v0 = _mm_xor_si128(v0, _mm_set1_epi8((char)0xff));
memcpy(&result.x[i], &v0, 16);
}
for (; i < kLanes; ++i) {
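
Note on the casts above: `_mm_set1_epi8` and `_mm256_set1_epi8` take a `char` argument, and where `char` is signed the literal `0xff` (the `int` 255) is outside its range. The explicit cast presumably silences a conversion warning under GCC 15 without changing the resulting byte, though the diff does not name the diagnostic. A portable scalar sketch of the same operation, using hypothetical helper names:

```cpp
#include <cstdint>

// On targets where char is signed, 0xff does not fit in a char; the
// explicit cast yields the byte with all eight bits set.
inline unsigned char all_ones_byte() {
  return static_cast<unsigned char>(static_cast<char>(0xff));
}

// XOR with an all-ones byte flips every bit, which is bitwise NOT. This is
// the per-byte effect of the _mm_xor_si128 / _mm256_xor_si256 calls.
inline std::uint8_t not_via_xor(std::uint8_t x) {
  return static_cast<std::uint8_t>(x ^ all_ones_byte());
}
```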

@@ -1,5 +1,7 @@
_GLOBAL_OFFSET_TABLE_
__cpu_indicator_init
__cpu_model
__stack_chk_fail@GLIBC_2.4
free@GLIBC_2.2.5
malloc@GLIBC_2.2.5
memmove@GLIBC_2.2.5