Hello. All the data anyone sends us via HTTP / WebSocket requests is untrusted. On the frontend or the backend we receive that data as ArrayBuffer(s), decode it into a JS string, typically parse it with JSON.parse and then handle it. However:
JSON.parse can throw a SyntaxError (and with network data that is very likely). In my view, any Error instance that captures a call stack should exist for debugging. How do we debug something that isn’t even our code’s fault?
We have to decode the ArrayBuffer into a JS string on every request. For Latin-1 characters the result is a one-byte-per-character copy of the data, but the moment the decoder hits a single emoji (routine on the web), everything decoded so far gets copied again as a UTF-16 JS string. One character forces regeneration of the whole payload, which can be large.
Only JSON.parse can validate the payload, so if it turns out to be invalid, that JS string was a complete waste of CPU time and it still lingers in memory until GC reclaims it.
Even in the success case, this intermediate string lives for microseconds and then goes to GC “eventually”, not immediately. Remember “optimising for use-cases”? We should remove this string completely; the sketch below shows where it comes from.
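To make the waste concrete, here is a minimal sketch of today's typical flow, assuming a WebSocket with binaryType = "arraybuffer" (ws and handle are placeholders for application code):

ws.addEventListener('message', (event) => {
  const buffer = event.data;                      // ArrayBuffer straight off the wire
  const text = new TextDecoder().decode(buffer);  // intermediate JS string (a full copy)
  let payload;
  try {
    payload = JSON.parse(text);                   // only now do we learn whether it was valid
  } catch (err) {
    return;                                       // the string was built for nothing and now waits for GC
  }
  handle(payload);                                // the string is already garbage at this point
});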
To address these issues I have created two TC39 proposals: JSON.parseBinary and ArrayBuffer.prototype.detach. Note that “.detach” is intended to behave like “ArrayBuffer.prototype.transfer”, but do less work: it does not produce another ArrayBuffer to carry the contents.
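A rough sketch of the intended difference, where incoming and another are placeholder ArrayBuffers (the exact semantics are whatever the proposal text ends up saying; this only illustrates the intent):

// Existing behaviour: transfer() detaches the source but hands its bytes to a new ArrayBuffer.
const moved = incoming.transfer();   // `incoming` is detached; `moved` now owns the bytes

// Proposed behaviour: detach() would only release the buffer, creating no new ArrayBuffer.
another.detach();                    // nothing new to allocate, nothing extra for GC to track
console.log(another.byteLength);     // 0 — detached, just as after transfer()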
Hey, I like the idea. It tries to solve a crucial problem, but if you're doing live parsing on the buffer, I think some TOCTOU (time-of-check to time-of-use) issues might happen. Since the syntax shows it's a sync API, that might not matter much for regular buffers.
If you are just copying and parsing, okay, great, but that's not the best for performance (because you are making a copy). Or maybe you're doing an implicit transfer? I guessed that since you have a co-proposal based on detach/transfer. But that would be a dev inconvenience: you don't want an API suddenly 'eating' your buffer without warning. Maybe you intended manual detaching, but by then RAM would already have peaked at double the memory usage. Moreover, like you said, it would be making developers do memory management.
So, I have a suggestion: add an explicit transfer list, like postMessage. What I'm thinking is:
JSON.syncParseBinary(buff, [buff]) (Sync version)
The API just receives the source (or a copy), parses it, and then deletes the source/copy. The async version could be great for browsers, because you don't want an API blocking the UI just to parse some large binary; see the sketch below.
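Roughly what I have in mind (hypothetical signatures, mirroring the postMessage transfer-list convention; buff is a placeholder ArrayBuffer, and JSON.parseBinary is assumed to be the async form):

// Sync form: parse, and because `buff` is on the transfer list, detach it afterwards.
const result = JSON.syncParseBinary(buff, [buff]);

// Async form: same contract, but the heavy parsing work stays off the main thread.
const asyncResult = await JSON.parseBinary(buff, [buff]);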
Anyway, you don’t specify what happens with a SharedArrayBuffer (SAB). We can't detach it, and we can't parse it atomically without huge overhead, so a copy is better. I suggest a SAB shouldn't be allowed on the transfer list; it should be an error if you try (a real thrown error, because it's a programming mistake rather than a network-produced error), just like postMessage.
Anything not in the transfer list (including a SAB) gets copied. I think that solves the issues while keeping the speed.
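In other words (a sketch of the behaviour I'm suggesting, not spec text, and assuming the transfer list is optional):

const sab = new SharedArrayBuffer(1024);

JSON.syncParseBinary(sab);         // fine: not on the transfer list, so its bytes are copied
JSON.syncParseBinary(sab, [sab]);  // should throw, like putting a SAB in a postMessage transfer list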
Also, devs may need to know what caused an error. Since you return an object saying it was not okay, why not return the buffer itself on error, but only if it was on the transfer list (transfer list + error)? I hope that could be a nice add-on. Or maybe they don't care about a malformed buffer. Either way, please consider specifying what happens with SABs.
Thanks for reviewing my idea! However, the changes you’d like to add are better avoided. Let me explain why (and I will rephrase everything below and put it into the proposals as “why not this” sections):
The initial idea for JSON.parseBinary is that it does not detach the buffer, is synchronous, and does not return the input as a separate property of the result object.
In my view, JSON.parseBinary is a specialized utility with a sole purpose: parse binary data and return the result. If it detached the buffer internally, it would become “framework-alike”, because it would handle many things at once and TIE THE ARCHITECTURE to specific rules. I want developers to define their own workflow, do only what they need and, since they operate at this raw level, avoid unnecessary overhead. If someone wants to detach the buffer alongside this call, they can write their own wrapper that does both (see the sketch below). Proposing such a wrapper as a global can easily be rejected, like JSON.safeParse.
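For example, a user-level wrapper built on top of the proposed API could look like this (a hypothetical helper, not part of the proposal):

// Parse, then unconditionally release the buffer, for callers who never need it again.
function parseBinaryAndDetach(buffer) {
  const result = JSON.parseBinary(buffer);
  buffer.detach();
  return result;
}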
A transfer list is an interesting addition, given the postMessage precedent, but here it adds even more headache than manual .detach(). Firstly, such a list implies transferring ownership immediately, so after the call the buffer is detached. But if there is a syntax error, we have two options: don’t detach the buffer and confuse developers even more (a transfer list implies detaching), or detach the user’s buffer and allocate one more buffer as an “input” property, i.e. {ok: false, message, input}.
Secondly, it is simply more compelling to keep referencing the one buffer and detach it only WHEN SURE it is useless afterwards. This “input” property, by contrast, creates another view over a buffer, which is yet more GC overhead.
Lastly, if I do return “input” as a property of the result object, who is going to detach it? We are back to square one. We could add a “reviver”-style callback at the end, inside which the buffer could be detached depending on the outcome, but why? The code starts to look more like a workaround than a clean solution to the problem.
// This example shows how the transfer-list variant limits the developer's
// freedom to use one "buffer" however they want.

// New idea: the user has "buffer"
var result = JSON.parseBinary(buffer, [buffer]);
if (!result.ok) {
  console.log("bad", result.input.buffer.byteLength);
  result.input.buffer.detach(); // back to square one.
}

// OR the previous idea
var result = JSON.parseBinary(buffer);
if (!result.ok) {
  // reference the existing buffer
  console.log("bad", buffer.byteLength);
  buffer.detach(); // we are done with the data, can move forward
  return;
}
// clear the initial buffer
buffer.detach();
// proceed to handle result.value
On await JSON.parseBinary(buf) plus a synchronous JSON.syncParseBinary(buf): first of all, parsing even a large JSON payload on the frontend doesn’t take more than half a second, so why split the API?
Secondly, this requires either copying the buffer or explicitly “await”-ing its execution.
Thirdly, there are two ways to achieve async execution:
3.1 Parse on the main JS thread, but chunked. This needs to save some “parser state” (I explain below why that is cumbersome) and to work in some chunk size. Who decides that size? A function argument? Why not just parse the buffer at once, if it is already available, and not keep it in memory for too long? Parsing half of a buffer doesn’t mean we can already detach it. Memory will spike to 2x+ anyway, and parsing one whole buffer in chunks makes those 1.7x+ spikes pile up, severely impacting memory usage. And that is not solvable via GC or detach, because, again, the buffers are still referenced and in use.
After analyzing await JSON.parseBinary() and realizing it needs “external state” to save its progress, I naturally considered another shape of the proposal:
// Returns an object: { ok (false if error), done (false if error), message (if error), value (if done) }
var parseBinaryChunk = JSON.binaryParser();

// somewhere in the handler where chunks arrive
var result = parseBinaryChunk(buffer);
// at first it seems we can detach the buffer here, but actually no
buffer.detach();
if (!result.ok) {
  console.log(result.message);
  return;
} else if (result.done) {
  var object = result.value;
}
However, this parser has problems MAINLY because of that external state. Let’s look at an example where chunks arrive in an awkward but perfectly possible way:
chunk 1: { "some string": "its contents, that are not full
chunk 2: ; this is still that text , not full
chunk 3: ; finally end" }
These chunks demonstrate that if we want memory to avoid 2x+ and stay at 1.3x+ at most, then properties have to end within their chunks. Otherwise, we either have to prohibit detaching, or copy those partial strings into our state. Chunk 1 has to save the key and the partial value into the state; chunk 2 has to copy all of its contents and concatenate its value with chunk 1’s (or save them as an array), re-decoding those strings from UTF-8 to UTF-16 because of emoji. In chunk 3 the journey ends, but what came before is enough to drop the idea. Parsing JSON in JavaScript in a streaming manner without sacrificing memory is impossible.
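To make that concrete, here is roughly what the parser state would be forced to hold after consuming chunk 2 (a hypothetical shape, only to show the copying):

// Both partial string pieces have already been decoded and copied out of their
// buffers; without these copies the chunks could never be detached.
var state = {
  stack: ['object'],
  pendingKey: 'some string',
  pendingValuePieces: [
    'its contents, that are not full',        // copied out of chunk 1
    '; this is still that text , not full',   // copied out of chunk 2
  ],
};
// Only now can chunk 1 and chunk 2 be detached, i.e. after paying for the copies anyway.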
— JSON.parseBinary as a synchronous function that does not detach the buffer internally solves the problem without becoming “framework-alike”.
Hello! I would really appreciate it if you sketched out some usage example, so that I could get acquainted with this technology of yours in a minute. GitHub - bablr-lang/language-en-json: A BABLR language for JSON is the closest I could get to "bablr" with JSON, but it doesn't provide any direct examples. TypeScript declaration files or source files in TypeScript would also come in handy, but there are none.
So please write an example of parsing JSON in a streamed manner that can replace JSON.parse in some way. And if you ever think of moving to TypeScript with a whole bunch of testing, you might want to take a look at this repo as a baseline (just self-advertisement, haha): GitHub - Guthib-of-Dan/lib_arch: Automated architecture for building typescript libraries
import { streamParse } from 'bablr';
import language from '@bablr/language-en-json';
let input = '{ "key": 3 }';
let result = streamParse(language, language.defaultMatcher, input);
BABLR is a unique, tree-sitter-inspired technology, but it doesn't provide a solution to the problems described above in this conversation.
First of all, streamParse iterates over the data and emits a lot of information about each token. The streaming I meant is the ability to accept incomplete data chunk by chunk, without excessive copying, and with the ability to clear those chunks immediately after they are consumed.
Secondly, streamParse doesn't seem to accept raw buffers, only something symbolic, likely addressed with array notation (these are just guesses, but it did read an 'open bracket'). If it does accept raw buffers, please provide more information about that, because it really bears on the points I made before.
import { streamParse } from 'bablr';
import language from '@bablr/language-en-json';
var encoder = new TextEncoder();
let input = encoder.encode('{ "key": 3 }');
let result = streamParse(language, language.defaultMatcher, input);
for (var el of result) {
  console.log(el);
}
// output
{
  type: Symbol(OpenNodeTag),
  value: {
    flags: { token: false, hasGap: false },
    name: null,
    type: Symbol(_),
    literalValue: null,
    attributes: {},
    selfClosing: false
  }
}
file:///C:/Users/abc/projects/dir/node_modules/@bablr/regex-vm/lib/internal/literals.js:4
export const code = (str) => str.codePointAt(0);
^
TypeError: str.codePointAt is not a function
Lastly, all this parsing comes at the cost of CPU time because of all that per-token information. This instrument has other use-cases, which currently don't overlap with JSON.parseBinary.
import { streamParse } from 'bablr';
import language from '@bablr/language-en-json';
import { readFile, decodeUTF8 } from '@bablr/fs';
let input = decodeUTF8(readFile('./fixture'));
// now you're parsing a file read in chunks
streamParse(language, language.defaultMatcher, input);
There is no excessive copying, and the chunks are cleared as soon as they are consumed. The complete fixture file is never held in memory.
Everything I looked at was based on your examples, @conartist6. And this one doesn't work:
import { streamParse } from 'bablr';
import language from '@bablr/language-en-json';
import { printTag } from '@bablr/agast-helpers/print';
import { readFile, decodeUTF8 } from '@bablr/fs';
let input = decodeUTF8(readFile('./package.json'));
var result = streamParse(language, language.defaultMatcher, input);
for (var tag of result) {
  console.log(printTag(tag));
}
// output
<_>
file:///C:/Users/abc/projects/abc/node_modules/@bablr/fs/lib/index.js:120
if (result.value !== undefined) yield* result.value;
^
TypeError: Cannot read properties of undefined (reading 'value')
at __readFile (file:///C:/Users/abc/projects/abc/node_modules/@bablr/fs/lib/index.js:120:16)
Performance
In the end, this conversation might not be the best place to describe BABLR. If you want to prove something, give a working example and benchmark it for garbage collection and for time per X iterations. Provide ready results for us to be convinced; otherwise this is going to get us nowhere.
import { streamParse } from 'bablr';
import language from '@bablr/language-en-json';

let input = '{"a":"123"}';

console.time("streamParse");
for (var i = 0; i < 1_000_000; i++)
  streamParse(language, language.defaultMatcher, input);
console.timeEnd("streamParse");

console.time("native parse");
for (var i = 0; i < 1_000_000; i++)
  JSON.parse(input);
console.timeEnd("native parse");
Sorry for the broken stuff. I had to publish a new version of @bablr/fs to fix it, and the example code needed tweaks too. But here at least is runnable example code:
import { streamParse } from "bablr";
import language from "@bablr/language-en-json";
import { printTag } from "@bablr/agast-helpers/print";
import { readFile, decodeUTF8 } from "@bablr/fs";
import { getStreamIterator } from "@bablr/agast-helpers/stream";
let input = decodeUTF8(readFile("./fixture"));
var result = streamParse(language, language.defaultMatcher, input);
let iter = getStreamIterator(result);
let step = iter.next();
await (async function () {
for (;;) {
if (step instanceof Promise) {
step = await step;
}
if (step.done) break;
let tag = step.value;
console.log(printTag(tag));
step = iter.next();
}
})();
At the moment, yes, it's pretty slow. That's partly because of all the polyfilling we have to do though. With support from the language core this could be a lot faster...
Even if you don't decide to use BABLR as your parser, you could still use @bablr/fs and hand-write a parser to process an input expressed as a stream iterable. Then you'd have everything you want I believe...?
Further, if engines supported a string-decode of an Immutable ArrayBuffer as a view onto the Immutable ArrayBuffer, the performance gain you seek could even happen without any new APIs or any API changes. OTOH, engines are unlikely to provide that string-as-view optimization.