`JSON.parse` and `JSON.stringify` are eager (and blazingly fast), but they don't scale beyond a few megabytes on servers and a few hundred megabytes on clients. They're also not especially useful for resource-constrained IoT devices, for similar reasons.
I would like to see a streaming variant similar in spirit to Jackson for JVM apps. Here's a rough sketch as a starting point:
- `enumerator = JSON.tokenize(iterable)`
    - Tokenizes an iterable/async iterable of strings and/or array buffers into a token enumerator.*
    - Newline-delimited JSON is accepted by default, allowing it to restart and be usable in such an environment. If you only want a single value, just read that one value and immediately call `enumerator.close()`, and it'll just work.
    - No reviver is supported, as it's... kinda useless in this context. (That data path is already exposed, so it doesn't matter.)
    - `enumerator.nextToken(typeMask = JSON.ALLOW_ALL, maxScale = BigInt(Number.MAX_SAFE_INTEGER))` gets the next token matching a type bit mask*, returning `JSON.INVALID` on mismatch.
        - `JSON.ALLOW_NULL` - Accept `null`
        - `JSON.ALLOW_TRUE` - Accept `true`
        - `JSON.ALLOW_FALSE` - Accept `false`
        - `JSON.ALLOW_BOOLEAN` - Sugar for `ALLOW_TRUE | ALLOW_FALSE`
        - `JSON.ALLOW_STRING` - Allow strings
        - `JSON.ALLOW_NUMBER` - Allow numbers
        - `JSON.ALLOW_INTEGER` - Allow integers, distinguishing them from floats by returning them as bigints
        - `JSON.ALLOW_OBJECT_START` - Allow the object start token
        - `JSON.ALLOW_OBJECT_END` - Allow the object end token
        - `JSON.ALLOW_OBJECT_SEEK` - Allow object end and ignore entries leading up to it
        - `JSON.ALLOW_ARRAY_START` - Allow the array start token
        - `JSON.ALLOW_ARRAY_END` - Allow the array end token
        - `JSON.ALLOW_ARRAY_SEEK` - Allow array end and ignore entries leading up to it
        - `JSON.ALLOW_ALL` - Sugar for all of the above OR'd together, except `ALLOW_INTEGER`
        - A bit mask is used to reduce GC overhead, as this is a very perf-sensitive operation.
        - The max scale (applies to bigints and strings) can be passed to optimize for things like dates, Ethereum hashes, and compiled schema enum variants, rejecting obviously invalid data early.
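To make the bit-mask idea concrete, here's a runnable sketch of how the flags could compose. The proposal doesn't specify any concrete values, so the powers of two below are purely illustrative:

```javascript
// Hypothetical flag values -- the sketch above doesn't pin these down,
// so the powers of two here are illustrative only.
const ALLOW_NULL = 1 << 0;
const ALLOW_TRUE = 1 << 1;
const ALLOW_FALSE = 1 << 2;
const ALLOW_STRING = 1 << 3;
const ALLOW_NUMBER = 1 << 4;
const ALLOW_INTEGER = 1 << 5;
const ALLOW_OBJECT_START = 1 << 6;
const ALLOW_OBJECT_END = 1 << 7;
const ALLOW_OBJECT_SEEK = 1 << 8;
const ALLOW_ARRAY_START = 1 << 9;
const ALLOW_ARRAY_END = 1 << 10;
const ALLOW_ARRAY_SEEK = 1 << 11;

// The sugar masks described above.
const ALLOW_BOOLEAN = ALLOW_TRUE | ALLOW_FALSE;
const ALLOW_ALL =
    ALLOW_NULL | ALLOW_BOOLEAN | ALLOW_STRING | ALLOW_NUMBER |
    ALLOW_OBJECT_START | ALLOW_OBJECT_END | ALLOW_OBJECT_SEEK |
    ALLOW_ARRAY_START | ALLOW_ARRAY_END | ALLOW_ARRAY_SEEK;

// Checking a token type against the mask is a single bitwise AND --
// no allocation, which is the whole point of using masks here.
const accepts = (mask, type) => (mask & type) !== 0;

console.log(accepts(ALLOW_BOOLEAN, ALLOW_TRUE)); // true
console.log(accepts(ALLOW_ALL, ALLOW_INTEGER));  // false (excluded by design)
```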
    - `enumerator.ready()` returns a promise that resolves once it either has data to parse or completes.
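Putting the enumerator pieces together, consuming a stream of records might look something like the following. This is a non-runnable sketch of the proposed API, not anything that exists today; the stream source, the `{"id": ...}` record shape, and the assumption that `nextToken` returns `JSON.CLOSED` at end of stream are all made up for illustration:

```js
// Non-runnable sketch: JSON.tokenize is only proposed above.
const enumerator = JSON.tokenize(sourceOfStringsOrBuffers)

while (true) {
    await enumerator.ready()
    const start = enumerator.nextToken(JSON.ALLOW_OBJECT_START)
    if (start === JSON.CLOSED) break
    // Keys arrive as plain string tokens, alternating with their values.
    const key = enumerator.nextToken(JSON.ALLOW_STRING)
    if (key === "id") {
        // Under ALLOW_INTEGER, integers come back as bigints; the max
        // scale rejects absurdly long digit runs early.
        const id = enumerator.nextToken(JSON.ALLOW_INTEGER, 2n ** 53n)
        console.log(id)
    }
    // Skip the rest of the object's entries.
    enumerator.nextToken(JSON.ALLOW_OBJECT_SEEK)
}
```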
- `{writer, output: iterable} = JSON.generate(opts?)`
    - Generates an async iterator of strings or array buffers (by option).
    - This generates newline-delimited JSON. `writer.close()` can be used to terminate such a stream, and a single value can be written followed by `writer.close()` to ignore subsequent values.
    - `writer.ready()` returns a promise that resolves once it is ready to output more.
    - `writer.closed` returns `true` if either a full value has been written or if `iterable.return()` has been called.
    - `writer.write(token, treatNumberAsNonInteger = false)` - Write a token.
        - `treatNumberAsNonInteger`, if `true`, ensures that numbers are suffixed with a `.0` if they're integers.
        - Returns `false` if the iterable isn't ready for more.
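On the output side, emitting a single NDJSON record with the proposed generator might look like this. Again a non-runnable sketch: the `opts` shape and the `socket` sink are placeholders, not part of the proposal:

```js
// Non-runnable sketch: JSON.generate is only proposed above.
const { writer, output } = JSON.generate(/* opts are unspecified here */)

// Pipe the generated NDJSON chunks somewhere else.
const consumer = (async () => {
    for await (const chunk of output) socket.write(chunk)
})()

// Emit {"ok":true} as one record: object start, then alternating
// key/value tokens, then object end.
writer.write(JSON.OBJECT_START)
writer.write("ok")
writer.write(true)
writer.write(JSON.OBJECT_END)
writer.close()
```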
- Async iterators here can have their `next` method invoked with an optional chunk size in bytes, to help resolve backpressure.
- Replacers and revivers are not supported, as I don't want to require state beyond the bare minimum necessary to sustain this - it's supposed to be lightweight.**
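The `next(chunkSize)` convention is just the async iteration protocol with an argument, so it can be illustrated with a plain hand-rolled iterator (not the proposed tokenizer itself; all names are illustrative, and characters stand in for bytes):

```javascript
// A minimal async iterator over an in-memory string whose next() accepts
// an optional size hint, mirroring the chunk-size convention above.
function chunked(text, defaultSize = 4) {
    let pos = 0;
    return {
        [Symbol.asyncIterator]() { return this; },
        async next(size = defaultSize) {
            if (pos >= text.length) return { done: true, value: undefined };
            const value = text.slice(pos, pos + size);
            pos += size;
            return { done: false, value };
        },
    };
}

const it = chunked('{"a":1}\n');
it.next(3).then(r => console.log(r.value)); // '{"a'
```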
- Tokens are immediate values (if neither objects nor arrays) or one of the following symbols:
    - `JSON.INVALID` - Can only be returned from `enumerator.nextToken`; it is not valid as an argument for `writer.write` or `enumerator.maybeConsume`
    - `JSON.CLOSED`
    - `JSON.UNAVAILABLE`
    - `JSON.OBJECT_START`
    - `JSON.OBJECT_END`
    - `JSON.ARRAY_START`
    - `JSON.ARRAY_END`
- Note: objects are simply alternating string + value pairs, to simplify the API. Users are expected to handle these appropriately, even though there's defined behavior if they don't.
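The alternating string + value convention can be shown with a token stream materialized as a plain array. The tokenizer is hypothetical, so local symbols stand in for the proposed `JSON.OBJECT_START`/`JSON.OBJECT_END`:

```javascript
// Stand-ins for the proposed symbols -- purely illustrative.
const OBJECT_START = Symbol("OBJECT_START");
const OBJECT_END = Symbol("OBJECT_END");

// What the enumerator might emit for {"a": 1, "b": true}: no key/value
// wrapper objects, just alternating strings and values between the markers.
const tokens = [OBJECT_START, "a", 1, "b", true, OBJECT_END];

function toObject(tokens) {
    if (tokens[0] !== OBJECT_START) throw new TypeError("expected an object");
    const result = {};
    for (let i = 1; tokens[i] !== OBJECT_END; i += 2) {
        result[tokens[i]] = tokens[i + 1]; // key, then its value
    }
    return result;
}

console.log(toObject(tokens)); // { a: 1, b: true }
```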
* This explicitly eschews the existing reviver/replacer idioms in favor of manual handling of values to avoid the overhead, and it also uses bit masks for the type specification to avoid garbage collection overhead. This is because the code lies on a critical performance path and necessarily needs to be decently fast to compete with `JSON.stringify` at all. (It's a fairly data-driven API.)
** The stack could be implemented as a bit stack, where 0 = array and 1 = object value - an engine could optimize for the common case by simply using a 31-bit integer initially and only upgrading to a heap bit array later. Everything else could be tracked via simple static states, resulting in near zero memory overhead aside from buffered values. Replacers require tracking cycles, and I want to avoid the overhead of that in this API.
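The bit-stack idea sketches out easily in userland. This is a toy illustration, not an engine implementation: it packs 0 = array, 1 = object into a single integer, and simply throws where a real engine would spill to a heap bit array:

```javascript
// Toy bit stack: nesting context packed into one 31-bit integer.
class BitStack {
    #bits = 0;
    #depth = 0;
    push(bit) {
        if (this.#depth === 31) throw new RangeError("would spill to a heap bit array");
        this.#bits = (this.#bits << 1) | bit;
        this.#depth++;
    }
    pop() {
        const bit = this.#bits & 1;
        this.#bits >>>= 1;
        this.#depth--;
        return bit;
    }
    get depth() { return this.#depth; }
}

// Entering `[{"a": [` pushes 0 (array), 1 (object), 0 (array).
const stack = new BitStack();
stack.push(0); stack.push(1); stack.push(0);
console.log(stack.pop(), stack.pop(), stack.pop()); // 0 1 0
```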
*** This obviously pairs well with the `yield.sent` proposal, but neither depends on the other.