weaseljson/README.md
2025-08-25 15:43:37 -04:00

WeaselJSON: A Streaming JSON Parser Review

What is WeaselJSON?

WeaselJSON is a high-performance, streaming JSON parser that uses callbacks instead of building an object tree in memory. It's optimized with SIMD instructions and designed for situations where you either can't or don't want to load entire JSON files into memory.

Key Characteristics

What's good:

  • Uses constant memory regardless of input size
  • Fast parsing with SIMD optimizations
  • Follows JSON spec properly with good security practices
  • You can't accidentally make it O(n²) (unlike the SimdJSON On-Demand API)
  • Enables weird use cases like parsing just the beginning of huge JSON files

What's not good:

  • The callback API is a pain to use correctly
  • Requires a lot of boilerplate for simple tasks
  • Most people don't actually need streaming JSON parsing
  • Some features only work if you control how the JSON is structured

JSON Parser Decision Guide

flowchart TD
    A[Need to parse JSON?] --> B{Do you have memory constraints<br/>or truly streaming data?}

    B -->|No| C{Is performance critical<br/>and data fits in memory?}
    B -->|Yes| D{Can you control<br/>how the JSON is laid out?}

    C -->|No| E[SimdJSON DOM API<br/>Fast and easy to use<br/>Wait why are you using C++?]
    C -->|Yes| F{Are you OK with potential<br/>performance traps?}

    F -->|Yes| G[SimdJSON On-Demand<br/>Very fast<br/>Nice API<br/>Easy to use wrong]
    F -->|No| H[Consider WeaselJSON<br/>if you can deal with callbacks]

    D -->|No| I{Can you preprocess data<br/>or make multiple requests?}
    D -->|Yes| J[WeaselJSON<br/>Streaming performance<br/>Constant memory usage<br/>Harder to use]

    I -->|No| K[Use SimdJSON DOM<br/>Deal with the tradeoffs]
    I -->|Yes| L[WeaselJSON with<br/>data preprocessing]

When to Use What

Use SimdJSON DOM API when:

  • You want to write obj["key"] and have it just work
  • JSON files are reasonably sized (but you still want decent performance)
  • You care about getting things done

Use SimdJSON On-Demand when:

  • Performance is critical and your data fits in memory
  • You understand that some access patterns can accidentally become very slow
  • You need maximum speed but still want a usable API
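
The "accidentally very slow" access pattern can be illustrated with a toy model. The sketch below is not SimdJSON code: `ForwardOnlyArray` and its `at` method are hypothetical names for a lazily scanned array where an indexed lookup has to walk from the first element every time. Python is used only for brevity.

```python
class ForwardOnlyArray:
    """Toy stand-in for a lazily parsed array: no random access, only scanning."""

    def __init__(self, items):
        self._items = items
        self.steps = 0  # total number of elements walked past

    def at(self, index):
        # Hypothetical indexed lookup: rescans from the first element on each call.
        for i, item in enumerate(self._items):
            self.steps += 1
            if i == index:
                return item
        raise IndexError(index)

    def __iter__(self):
        # A single forward pass touches each element exactly once.
        for item in self._items:
            self.steps += 1
            yield item


n = 1000

by_index = ForwardOnlyArray(list(range(n)))
total_indexed = sum(by_index.at(i) for i in range(n))

by_iteration = ForwardOnlyArray(list(range(n)))
total_iterated = sum(by_iteration)

print(by_index.steps)      # n * (n + 1) / 2 steps: quadratic
print(by_iteration.steps)  # n steps: linear
```

Both loops compute the same sum, but the indexed loop does roughly n²/2 work. A callback parser only ever makes the forward pass, which is why this particular trap can't occur there.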

Use WeaselJSON when:

  • You absolutely cannot load the whole JSON into memory
  • You're processing huge JSON files (multiple gigabytes)
  • You want to parse just part of a JSON file without reading the rest
  • You need to convert JSON into some other format as you parse it
  • You can't risk an access pattern that accidentally turns slow
  • Writing stateful callbacks is your idea of fun
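
To give a feel for what "stateful callbacks" means in practice, here is a minimal sketch in Python. It does not use WeaselJSON's actual API: the event names (`begin`, `end`, `key`, `string`), the `drive` function, and the return-`False`-to-stop convention are all assumptions made up for illustration. The handler pulls one top-level string out of an event stream and then asks the driver to stop.

```python
class FindTopLevelKey:
    """Keeps just enough state to extract one top-level string value."""

    def __init__(self, wanted):
        self.wanted = wanted
        self.depth = 0            # current container nesting level
        self.matched_key = False  # was the last key the one we want?
        self.result = None

    # Each callback returns True to continue or False to request a stop.
    def on_begin(self):           # start of an object or array
        self.depth += 1
        self.matched_key = False
        return True

    def on_end(self):             # end of an object or array
        self.depth -= 1
        self.matched_key = False
        return True

    def on_key(self, key):
        self.matched_key = self.depth == 1 and key == self.wanted
        return True

    def on_string(self, value):
        if self.matched_key:
            self.result = value
            return False          # found it: no need to see the rest
        return True


def drive(events, handler):
    """Stand-in for a parser: dispatches pre-made events, stops early if asked."""
    delivered = 0
    for name, *args in events:
        delivered += 1
        if not getattr(handler, "on_" + name)(*args):
            break
    return delivered


events = [
    ("begin",),
    ("key", "meta"), ("begin",), ("key", "name"), ("string", "nested"), ("end",),
    ("key", "name"), ("string", "weasel"),
    ("key", "payload"), ("string", "imagine gigabytes here"),
    ("end",),
]

handler = FindTopLevelKey("name")
delivered = drive(events, handler)
print(handler.result, delivered, len(events))
```

Even this small task needs a depth counter and a flag, and the early exit only pays off because the wanted key appears before the large payload, which is the "control how the JSON is structured" condition.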

The Reality

WeaselJSON solves real problems that other parsers can't handle, but most people don't have those problems. The callback API is legitimately difficult to use correctly, which pushes most developers toward easier alternatives.

The parser represents excellent engineering work and occupies a useful niche. It's the kind of tool you're very glad exists when you actually need it, but most people will never need it.

Note: WeaselJSON assumes modern CPU features (SIMD), so you probably can't use it for embedded development.

Technical Reference

Features

  • SAX-style callback API
  • No memory allocations during parsing
  • O(1) memory usage regardless of input size
  • Streaming API: there is no need to buffer the entire document in memory. Parsing resumes when more data is available
  • Strings are unescaped in place before they're presented. No unicode normalization is performed
  • Resistant to crashes on untrusted input
  • SIMD optimizations for string scanning and validation
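
The combination of resumable parsing and fixed-size state can be sketched with a toy scanner. This is not WeaselJSON's implementation or API; `DepthScanner`, `feed`, and `max_depth` are hypothetical names, and the scanner only tracks bracket nesting rather than validating JSON. The point is that all state lives in a handful of fields, so input can be split at any byte boundary, including in the middle of a string or escape sequence.

```python
class DepthScanner:
    """Tracks bracket nesting across arbitrary chunk boundaries with fixed-size state."""

    def __init__(self, max_depth=64):
        self.max_depth = max_depth
        self.depth = 0
        self.deepest = 0
        self.in_string = False
        self.escaped = False

    def feed(self, chunk):
        for ch in chunk:
            if self.in_string:
                if self.escaped:
                    self.escaped = False   # character after a backslash is literal
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch in "[{":
                self.depth += 1
                if self.depth > self.max_depth:
                    raise ValueError("nesting too deep")
                self.deepest = max(self.deepest, self.depth)
            elif ch in "]}":
                self.depth -= 1
                if self.depth < 0:
                    raise ValueError("unbalanced brackets")


def scan_in_chunks(text, size):
    """Feed the text in pieces of `size` characters and report what was seen."""
    scanner = DepthScanner()
    for start in range(0, len(text), size):
        scanner.feed(text[start:start + size])
    return scanner.deepest, scanner.depth


# The string value contains brackets and an escaped quote that must be ignored.
document = '{"a": ["]\\"[", {"b": [1, 2]}]}'

for size in (1, 2, 3, 5, len(document)):
    print(size, scan_in_chunks(document, size))
```

Every chunk size produces the same result because nothing is remembered except the scanner's five fields. Capping the depth is what keeps that state bounded even for hostile input.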

RFC 8259 Conformance

  • There are no limits on number precision. Numbers are only validated syntactically and are presented as is
  • Only UTF-8 is accepted
  • Invalid UTF-8 is rejected
  • Byte order marks (BOMs) are rejected
  • Invalid escaped UTF-16 surrogate pairs are rejected
  • Documents that are too deeply nested are rejected to control memory usage
  • Duplicate keys are not rejected; every occurrence is presented
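
Two of these points push decisions onto the caller: numbers arrive as text, and duplicate keys arrive more than once. The Python sketch below shows one way a consumer might handle both. `convert_number`, `LastWins`, and the combined `on_member(key, number_text)` callback are hypothetical helpers for illustration, not part of WeaselJSON.

```python
from decimal import Decimal


def convert_number(text):
    """Pick a representation for a number token that arrives as raw text."""
    if any(c in text for c in ".eE"):
        return Decimal(text)  # exact decimal value, no binary rounding
    return int(text)          # arbitrary-precision integer


class LastWins:
    """One possible duplicate-key policy: keep the last value, remember the clash."""

    def __init__(self):
        self.values = {}
        self.duplicates = []

    def on_member(self, key, number_text):
        if key in self.values:
            self.duplicates.append(key)
        self.values[key] = convert_number(number_text)


# Pretend these (key, raw number text) pairs came out of parser callbacks.
members = [
    ("id", "12345678901234567890123"),
    ("price", "0.1"),
    ("id", "7"),
]

big = convert_number(members[0][1])
lossy = int(float(members[0][1]))  # what a double-only parser would keep

handler = LastWins()
for key, text in members:
    handler.on_member(key, text)

print(big == lossy)
print(handler.values, handler.duplicates)
```

Because the parser hands over the original digits, the caller can keep full precision where it matters and choose a cheaper conversion where it doesn't. The same goes for duplicates: last-wins, first-wins, or outright rejection are all policies the callback can implement.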

Caveats

  • Users should be prepared to discard work done during SAX callbacks if the document is ultimately rejected
  • Requires manual state management in callback functions
  • API is more complex than DOM-style parsers
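
The first caveat is worth a concrete picture: callbacks fire before the parser knows whether the whole document is valid. A minimal sketch of one way to cope, staging results and committing only on success, is shown below. `StagedRows`, `toy_parse`, and `load` are hypothetical names, and `toy_parse` merely imitates a parser that rejects its input partway through.

```python
class StagedRows:
    """Collects rows produced during callbacks; they become visible only on commit."""

    def __init__(self, sink):
        self.sink = sink
        self.pending = []

    def on_row(self, row):
        self.pending.append(row)

    def commit(self):
        self.sink.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def toy_parse(tokens, handler):
    """Stand-in for a streaming parser: emits rows until it hits a bad token."""
    for tok in tokens:
        if tok is None:
            raise ValueError("document rejected")
        handler.on_row(tok)


def load(tokens, sink):
    staged = StagedRows(sink)
    try:
        toy_parse(tokens, staged)
    except ValueError:
        staged.rollback()  # discard everything the callbacks produced
        return False
    staged.commit()
    return True


database = []
ok_first = load(["a", "b"], database)
ok_second = load(["c", None, "d"], database)  # rejected after "c" was already seen
print(ok_first, ok_second, database)
```

Staging in a list trades away constant memory for the output, so for large inputs the same idea is better expressed with whatever the destination already offers, such as a database transaction or a temporary file that is renamed on success.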

Maintainer Notice

⚠️ Important: This is a hobby project by a single maintainer. Please keep the following in mind:

  • I'm doing this for fun and learning, not as a professional obligation
  • Don't rely on this for mission-critical applications without understanding these limitations
  • Feel free to email me, but I may not respond promptly

Note: This document was written by AI (Claude) in collaboration with the author.