JSONRegistry - Allows branded instances in regular JSON

I've written so many serializers/encoders/decoders, yet I feel like going back to basics every single time. This time, though, I think I've nailed a prototype that works well with both the native JSON API and flatted, which is downloaded 500M+ times per month these days (a signal that JSON is still the most used serializer out there, despite its age and caveats).

JSONRegistry is identical to JSON except it allows users to define a registry payload:

const { parse, stringify } = new JSONRegistry([
  ['Uint8Array', {
    is: value => value instanceof Uint8Array,
    to: value => Array.from(value),
    from: value => Uint8Array.from(value),
  }],
]);

parse(stringify(new Uint8Array([1, 2, 3])));
// it's a Uint8Array reference

How does it work?

It basically uses a well-known type such as an array to carry the transformed value together with its type, where types must be unique strings (it throws on duplicated types, like any registry). The reason it does that is both to KISS and to be able to revive types at the other end, something toJSON is not capable of and something every other serializer does one way or another, with either a binary format or a string to unserialize.

The bonus types (bigint or symbol as values) can be configured if desired. These are the only primitives that would not work out of the box and, if not registered, they throw like they would within regular JSON.

Why do we need it?

There are various similar topics where developers want "this type or that one" embedded natively (which does not scale well), or a special Symbol.toJSON, which cannot have a Symbol.fromJSON counterpart because that is incompatible with the JSON standard itself. Other ideas try to work around structuredClone or JSON limitations, but there's literally nothing concrete to consider or move forward with, imho.

This idea solves:

  • it puts exotic types on developers' shoulders but it uses the regular JSON API transparently
  • it's compatible out of the box with JSON, nothing changes at its standard level
  • it helps projects share a well known/defined payload
  • it throws as soon as things are not registered or not sound, just like symbol or bigint would already
  • it allows engines to optimize at their core for a performance boost, with a logic that is very straightforward

I hope the prototype and this thread could at least start some sort of conversation, so thanks in advance to whoever is interested in having that conversation 👋

more in depth details:

stringify(new Uint8Array([1, 2]))

// '[[1,2,0],"Uint8Array"]'

The stringify replacer verifies, via each registered is(ref), whether the registry knows that kind of value and, if that's the case, returns [to(ref), "Uint8Array"], where Uint8Array is the key-type.

The to(ref) operation returns the array [1, 2], which is then processed, as per JSON specs, by the replacer. A raw array that is not an instance of anything else gets a 0 appended at its end. Right now it does that by concatenating its values + that 0, but that could be either enforced by contract (as in: must return an ephemeral representation of the result, so we can push a single 0 instead) or be optimized at the runtime level.

If to(ref) returned anything that is not an array, the outcome would still be [result, "Type"] and result would be processed by the replacer in the same way.

In short, no modification of anything ever happens, allowing user-land caching and whatnot.

The parse does the opposite dance: any array is visited once so that its last entry can be popped (fast/efficient). If that last entry is a 0 then the array itself is returned, otherwise a registry from(ref) is invoked, where the from is dictated by that last key (it can only be 0 or a string/key of the registry). It throws happily ever after if that key is not known, to avoid surprises. JSON is already an easy-to-throw API, yet that mitigates any possible "evil attack" around this idea.
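The stringify/parse dance described above can be sketched with nothing but the native replacer and reviver hooks. This is a minimal illustration, not the prototype's actual code: the `registry` Map, the `bigint` entry and the error message are assumptions made for the example.

```javascript
// Hypothetical registry: unique string types mapped to { is, to, from }.
const registry = new Map([
  ['Uint8Array', {
    is: value => value instanceof Uint8Array,
    to: value => Array.from(value),
    from: value => Uint8Array.from(value),
  }],
  ['bigint', {
    is: value => typeof value === 'bigint',
    to: value => value.toString(),
    from: value => BigInt(value),
  }],
]);

const stringify = value => JSON.stringify(value, (key, current) => {
  // Registered values become [payload, type]; the payload is then visited
  // by this same replacer, as per JSON.stringify semantics.
  for (const [type, { is, to }] of registry) {
    if (is(current)) return [to(current), type];
  }
  // Raw arrays get a trailing 0 so they can be told apart when parsing.
  return Array.isArray(current) ? [...current, 0] : current;
});

const parse = text => JSON.parse(text, (key, current) => {
  if (!Array.isArray(current)) return current;
  const tag = current.pop();
  if (tag === 0) return current; // it was a raw array
  const handler = registry.get(tag);
  if (!handler) throw new TypeError(`unknown type: ${String(tag)}`);
  return handler.from(current[0]);
});

console.log(stringify(new Uint8Array([1, 2])));
// [[1,2,0],"Uint8Array"]
console.log(parse(stringify({ bytes: new Uint8Array([1, 2]), big: 10n })));
```

Note that a payload produced by a plain JSON.stringify, such as `[1,2]`, throws in this `parse`, because its last entry is neither 0 nor a registered type.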

In short:

  • only registered properties survive parsing, everything unexpected throws (like in JSON)
  • there is a thin cache that transforms cyclic references once and never again. Both to and from should never happen twice for the exact same instance (the from works like that if you use a circular-references capable library such as flatted)
  • all edge cases are covered: the only type that always results as itself plus a last item to pop is the array, everything else is untouched and plain JSON

From my side, the TBD is around the registry itself: it feels like it'd be more natural to have it as an instanceof Map (a Map extend) that throws on duplicated set and has no delete possible (?). Beside that, there are tests, the logic is literally a few LOCs, nothing breaks at the native JSON.parse level, and anyone can hook their favorite PL at the parsing or stringifying end of affairs while surviving pure JSON roundtrips. All of that hints to me there's value in this implementation that I haven't seen explored before.
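A Map extend with those constraints could look like the sketch below. The `Registry` name and the error messages are hypothetical. Because the Map constructor adds its initial entries through `set`, duplicates passed at construction time throw as well.

```javascript
// Hypothetical registry: a Map that only grows and never overwrites.
class Registry extends Map {
  set(type, handler) {
    if (typeof type !== 'string')
      throw new TypeError('types must be strings');
    if (this.has(type))
      throw new TypeError(`duplicated type: ${type}`);
    return super.set(type, handler);
  }
  delete() {
    throw new TypeError('registered types cannot be removed');
  }
  clear() {
    throw new TypeError('registered types cannot be removed');
  }
}

const registry = new Registry([
  ['Uint8Array', {
    is: value => value instanceof Uint8Array,
    to: value => Array.from(value),
    from: value => Uint8Array.from(value),
  }],
]);

console.log(registry instanceof Map); // true
console.log(registry.has('Uint8Array')); // true
```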

Happy to answer any other question, if any arises from here.

P.S. the last big gotcha around this proposal is the inability for the replacer to retrieve instances before their toJSON() kicks in ... the most notable scenario here is the Date object. If there is interest in this proposal I think Date instances should also internally be replaced as ['iso-string','Date'] behind the scenes, because otherwise it's up to user-land code to use date proxies that return undefined at toJSON access. But hey, AFAIK that's also the only eventual caveat around JS built-ins I could think about, as I don't think there are other built-ins with such a toJSON() assumption out there, at least not embedded in the language. Those who used weird toJSON() tricks without any native possibility to be revived will, hopefully, adopt and use this new API standard/proposal.

Ideally though, and it could be polyfilled too, references that have a toJSON() should also pass through the replacer via stringify so that this new API could be a great migration helper for outdated to modern APIs.
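To make the toJSON() gotcha concrete, this small sketch records what a replacer actually receives for a Date. It relies only on standard JSON.stringify behavior: toJSON() runs before the replacer, while the holder object (the replacer's `this`, which requires a non-arrow function) still references the original instance. The `calls` array is just an illustration aid.

```javascript
const calls = [];

const json = JSON.stringify({ when: new Date(0) }, function (key, value) {
  calls.push({
    key,
    received: typeof value,                   // what the replacer is given
    holderHasDate: this[key] instanceof Date, // what the holder still has
  });
  return value;
});

console.log(json);
// {"when":"1970-01-01T00:00:00.000Z"}
console.log(calls);
// the "when" entry is received as a string, yet this[key] is still a Date
```

Checking the holder for every visited key is the kind of per-entry work that costs performance, which is presumably why a native hook that runs before toJSON() would be preferable.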

edit

P.S.2 I've already tried to use a more "natural" {type: value} to brand check but:

  • objects are way more common than arrays, think about any database returned row
  • transforming a shape into a {type:value} pair requires a lot more logic to both stringify and parse (the this context of the owner should be checked per each entry of any regular object, performance hostile, imho)
  • it's less human/AI readable because type: "AnyString" easily confuses "eyes"
  • it was way slower and more convoluted for no real benefit except "elegance" and "purity" of the original specs, where arrays are not meant to be polluted as identity with anything else ...

Hear me out: I wouldn't oppose a {type:value} solution, heck, I had that code already running initially. The performance hit and the way higher verbosity, though, concerned me to the point that I just thought: "ok, I think arrays are a better candidate" ... a pop on an array is surely faster than an object iteration to check if type is the only key and is present in the registry, plus other shenanigans that came up while developing that proposal ... Yet, if there is interest in making this a native proposal where any regular object is wrapped as {"":ref}, I would be OK with that. Just please give us something, because the amount of JSON-variant based libraries, in 2026, is becoming quite ridiculous to me, including the fact that I am still trying to find a solution that can work even across programming languages.

My two cents is that you should just extend the JSON language and write a parser for the extended language. That's basically what you're talking about doing, except that you've avoided any change in syntax.

I don't think it matters much that the syntax looks superficially unchanged though, because what you're proposing is a different language than JSON: it takes documents which already have a defined meaning and gives them a different meaning.

So yeah, I would suggest that language extension is the way to go, because either way you're creating a language incompatible with JSON. At least if it has different syntax, someone consuming it can get a parse error if they don't support the format. The primary thing you would change by not extending the syntax is ensuring that no parse error could occur, so that real miscommunication and data corruption would ensue instead.

You're proposing to add a class syntax to JSON, and your proposal for the syntax is

{ "inst": ["type", { "property": "value" }] } 

That syntax for class instances conflicts with existing syntax that already has well-defined meaning though, so why not just introduce a syntax that doesn't?

{ "inst": Type { "property": "value" } }

currently it's [ref, type], as already explained, and [...ref, 0] for arrays, but you made a valid point: that is hostile to parsers that are not aware of that final array type (although the whole point is that both creator and consumer should use the new API).

At that point there are two better options:

  • I wrap only registered types as `{[type]: valueFromRegistry}`
  • I add some special key to the ref but that's hostile for Uint8Array case so ... probably a bad idea

it does in the sense that the whole process survives roundtrips, either databases or whatnot, without changing the program logic.

not quite, just a different API that knows how to deal with the specialized outcome.

I haven't mentioned flatted by accident: it's one of the most used projects on npm and, sure enough, I had a lot of people asking "does it support Map?", because it's an absurdity that we can't serialize anything in JS while PHP, Python and others all have way better and more feature-rich capabilities.

The structuredClone doesn't want to provide an intermediate state, I've proposed it: failed.

There are hundreds of JSON variants out there (CBOR, MessagePack, my own binary compatible formats) that can do more, extensible things, but nothing is baked into the standard / core as an API, which is what I am after.

People don't care much about a specialized stringified result if they know they need that API's parser to retrieve the right thing: flatted (once again) is one case, my own structured-clone/json variant is another (that's also widely used/adopted). If I need to create a language extension nobody wins, so I'd rather discuss what is really problematic here, beside the fact that if I encode with gzip then brotli will fail at decoding. In here that's not even the case, because stringified data will work just fine with any JSON compatible language, and a reviver to "skip" the uninteresting parts can be provided with ease. That holds as long as we can move forward with anything that allows developers to define custom type transformations beyond the poor toJSON() ability, which would require a specialized reviver anyway to have a meaning. To date nobody cared that this escape hatch does not provide any counter-reviver hook to get the value back, and Date is a perfect example of that.

So, hoping the intent and reasons for this discussion are clearer, is there really nobody interested in refining this possibility? Again, I'm working on making the indirection more explicit and less problematic (no array last-value shenanigans attached), but if there's no interest I might just keep proposing user-land solutions, although it'd be a bit of a bummer for everyone desiring better ergonomics around such a simple format as JSON ... it could be way more capable, it throws already with many modern things and it required weird amends for bigints. At least let's discuss this new parser syntax and ship it within the core? That'd be something to me, thanks!

edit

or let's put it this way ... the JSON API asks for replacer and reviver standardization to allow more complex types, so let's discuss what could be the best approach to provide such ergonomics in a way that simply hooks natively into what's possible already but can work across projects and other programming languages ... shall we? I don't want to "change the world again" around language extensions, I think TS and others over time demonstrated that's not the best one can offer for all cases ... JS always survived though, so let's make it better around something as old as JSON?

If I need to create a language extension nobody wins

OK, but to be really really really really really really really really clear, what you proposed IS a language extension, it's just one that IS IN DIRECT CONFLICT with the existing language. It will not ever be merged so long as this is the case. We all utterly depend on TC39 not merging proposals that would simply completely change the meaning of spec-compliant code that already exists. So the simple answer to what you've written is, in a word, no.

You can't do it for the same reason you can't declare that the new syntax for records is any string that starts with the letter "r" like "r{ foo: true }". I would hope it would be obvious that you cannot possibly have "r{ foo: true }" be a new syntax because it already has an unambiguous meaning and thus shit would break.

To be clear, I'm absolutely with you on developing a proposal for the language to support a serialization mechanism that supports all the stuff JS needs to round trip state like Python can with pickle.

I'm not saying "no proposal like this should happen". I'm only saying "take 1 is not going to be plausible from the perspective of the committee"

I don't know if it changes anything but v0.2.0 has just been published and it's a whole rewrite: Full refactoring out of `object` variant instead of array pollution by WebReflection · Pull Request #1 · WebReflection/json-registry · GitHub

Basically the contract is that now objects wrap themselves once and are revived accordingly. bigint and symbol types can still be optionally added to the mix, the Date concern can be mitigated via:

registry.register('Date', { is: d => d instanceof RegistryDate, to: d => d.toISOString(), from: s => new RegistryDate(s)  })

The class is eventually class RegistryDate extends Date { toJSON = undefined; } to call it a day.
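Here is a small sketch of why the own `toJSON = undefined` field matters: it shadows Date.prototype.toJSON, so JSON.stringify finds nothing callable and hands the instance itself to the replacer. The `{"Date": ...}` wrapper and the inline replacer/reviver are simplifying assumptions for this example, not the library's internals. In particular, this reviver would also revive any plain object that happens to have `Date` as its only key.

```javascript
class RegistryDate extends Date { toJSON = undefined; }

// Same handler shape as the register(...) call above.
const handler = {
  is: d => d instanceof RegistryDate,
  to: d => d.toISOString(),
  from: s => new RegistryDate(s),
};

const replacer = (key, value) =>
  handler.is(value) ? { Date: handler.to(value) } : value;

const isDateWrapper = value =>
  typeof value === 'object' && value !== null && !Array.isArray(value) &&
  Object.keys(value).length === 1 && Object.keys(value)[0] === 'Date';

const reviver = (key, value) =>
  isDateWrapper(value) ? handler.from(value.Date) : value;

const json = JSON.stringify({ when: new RegistryDate(0) }, replacer);
console.log(json);
// {"when":{"Date":"1970-01-01T00:00:00.000Z"}}

const revived = JSON.parse(json, reviver);
console.log(revived.when instanceof RegistryDate); // true

// A regular Date is already a string by the time the replacer sees it.
console.log(JSON.stringify({ when: new Date(0) }, replacer));
// {"when":"1970-01-01T00:00:00.000Z"}
```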

Forgot to mention: I've used pickle to automatically provide Pyodide based on DOM functionality out of Python used to render SSR, it's pretty amazing but it's a bit too much.

What I am after, like everyone who implemented one variant of the same concept (CBOR, MessagePack, Protobuf, flatted-view, you name it), is a way to have a structuredClone intermediate "buffer" plus a mechanism to signal custom serialization and deserialization procedures, just like I am proposing with this JSONRegistry. The ergonomics are pretty solid and the implementation detail can be fully opaque to human eyes, as long as we define a standard that works with SharedArrayBuffer, can travel (transferred) faster across Workers, and represents both native types and custom types. It doesn't need to reveal internals, like it is for Files that, despite their async API behind the scenes, can still travel, and so on.

Nobody needs to bring closures/classes directly from one realm to another, imho: that'd be confusing, it would take forever, and it's an easy footgun as soon as stuff travels from Web to Server and/or vice-versa. We just need a modern transport that does what JSON has been doing forever, but I don't want to "end my programming days" with JSON being the only native serializer JS offers ... (saying that because I'm 48 and I know a few topics in this field took decades!)

edit

on a second thought ...

completely change the meaning of spec-compliant code that already exists.

that's still not true to me: it's JSON compatible with JSON, it breaks nothing ... if some JSON string changes the receiver needs to know it did, and that's true with everything JSON based. But ... what if we change the proposal to be a StructuredCloneRegistry instead, with similar semantics, where everything is hidden behind the implementation details because nobody can read or interfere directly with structuredClone? All one needs is to define a registry: what matches works out of the box for custom types, and what doesn't translate simply travels as payload "as is" ???

Would this proposal have a chance? I can expand how it would work in practice, nothing would break, incremental opt-in update for that specification only ...

Hmm. Ok. I see a bit more of the picture. I'm thinking about it.

as much as I love the fact you eventually understood my poorly explained reasons for such a suggestion (for which I apologise, and thanks for reading through), this kind of answer is the one that worries me the most, because it breaks momentum and it lands in the "5 years passed, others had the same needs so something happened" story. I've been there so many times already that I wish this time would be different, thanks.

Well then convince me!

You're introducing a semantically incompatible, non-feature-detectable definition of the JSON language.

To convince me either you could show me there's a massive gain that justifies so much possible breakage, or you could figure out how to get the same outcome without risking so much breakage.

But also as I've said before, I'm nobody at all. Of the people you really need to convince to get something done, I'm not any of them.

I don't need to, my latest approach is nearly identical to what JSON.rawJSON(value) does:

JSON.rawJSON(123n)
// { rawJSON: '123' }

That hints that passing new JSON compatible objects around has history and a precedent, and I haven't heard a single person complaining that bigints were now a branded object or could travel (that was the whole point).

If anything, my proposal should also consider rawJSON as invalid key to register, like it is already for empty string, but that would be it.

Actually, my proposal would work similarly for payload objects, but it offers (without the context) an automatic possibility to parse the received data. For bigint we still need to provide a reviver that takes care of that, which is unfortunate, imho: these could be automatically encoded/decoded with my proposal if desired, without being in libraries' way (and repeatedly per each parse).

I'm not sure I really understand what this is proposing or its advantages.
Registering properties as something other than a classic JSON value should be optional: it's an extension of normal JSON. If you want to store something else (a Uint8Array, a Date, a BigInt, or anything else), it should either not be JSON or be clear in its content. If we use the example you gave with

stringify(new Uint8Array([1, 2]))
// '[[1,2,0],"Uint8Array"]'

Then it is quite easy to make a custom reviver/stringifier.

const isCustom = (val) => Array.isArray(val) &&
    val.length === 3 &&
    val[0] === "__CUSTOM_TYPE__" &&
    Array.isArray(val[1]) &&
    typeof val[2] === "string";

const customParse = (_, val) => {
  if (!isCustom(val)) return val;

  if (val[2] === "BigInt") return BigInt(...val[1]);
  if (val[2] === "Symbol") return Symbol(...val[1]);
  const constructor = getConstructor(val[2]);
  // implement the 'getConstructor' function safely or just use 'eval'
  return new constructor(...val[1]);
}

const customStringify = (_, val) => {
  if (["string", "number", "boolean"].includes(typeof val)) return val;
  // your custom logic here
  if (typeof val === "symbol") return ["__CUSTOM_TYPE__", [val.description], "Symbol"];
  if (val instanceof Uint8Array) return ["__CUSTOM_TYPE__", [Array.from(val)], "Uint8Array"];
  // MyClass stands for any class of your own
  if (val instanceof MyClass) return ["__CUSTOM_TYPE__", [val.getArg1(), val.getArg2()], "MyClass"];
  // ...
  return val; // anything else is serialized as usual
}

JSON.parse('{"a":["__CUSTOM_TYPE__",[[1,2,0]],"Uint8Array"]}', customParse);
// { a: Uint8Array([1, 2, 0]) }
JSON.stringify({ a: new Uint8Array([1, 2, 0]) }, customStringify);
// '{"a":["__CUSTOM_TYPE__",[[1,2,0]],"Uint8Array"]}'

It's quite simple to implement and this way the "encoding" is adaptable to anyone's need.
I like the syntax you're proposing for the JSONRegistry, but other than that I don't really see it.
If you had a way to tell more precisely what the "encoding" is (both for parsing and stringifying), then I think I could see its advantages, but it would still require some boilerplate to implement on the user's side.
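For reference, the `getConstructor` helper left open in the snippet above could be backed by an explicit allow-list rather than `eval`. This is only a sketch: the `allowed` Map and its contents are assumptions, and it does not address payloads that merely look like the `__CUSTOM_TYPE__` marker.

```javascript
// Hypothetical allow-list: only these names can ever be constructed.
const allowed = new Map([
  ['Uint8Array', Uint8Array],
  ['Map', Map],
]);

const getConstructor = name => {
  const Class = allowed.get(name);
  if (!Class) throw new TypeError(`constructor not allowed: ${name}`);
  return Class;
};

console.log(new (getConstructor('Uint8Array'))([1, 2, 0]));
```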

That implementation definitely isn't secure. I've been thinking pretty hard about the other implementation offered by @WebReflection, and it seems like it would be secure at least: it transforms every object.

on top of that, it's an ad-hoc reviver, which is the whole point I am trying to avoid ... you have a registry at both ends so you can serialize on one end and de-serialize on the other without thinking. This would also work wonders when it comes to crossing programming language barriers, because you can brand your foreign PL through the registry and brand back on the other side with ease.

A registry is a contract and in here it's meant to do exactly what JSON.rawJSON does, but it's not confined to just primitives: it works with primitives and literal objects or arrays, basically it works with JSON compatible values, never breaking its syntax/outcome/output.

also worth mentioning I've entirely changed my code/proposal, which would now result in {"Uint8Array":[1,2]}, but please don't stop at the Uint8Array example: this is about anything you want to transfer, an amend over the ugly toJSON() standard, which has no way to be reliably resumed at the other end.
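A rough sketch of how such an object-based variant could behave, end to end, follows. It is not the json-registry source: it assumes, based on hints elsewhere in this thread, that registered values become `{[type]: payload}` and that plain objects are wrapped once as `{"": object}`, so every object in the output is a single-key wrapper. The `registry`, `isObject` and `revive` names are made up for the example.

```javascript
// Hypothetical registry: unique string types mapped to { is, to, from }.
const registry = new Map([
  ['Uint8Array', {
    is: value => value instanceof Uint8Array,
    to: value => Array.from(value),
    from: value => Uint8Array.from(value),
  }],
]);

const isObject = value =>
  typeof value === 'object' && value !== null && !Array.isArray(value);

function stringify(value) {
  // Wrappers created for plain objects, so their content is wrapped only once.
  const plainWrappers = new WeakSet();
  return JSON.stringify(value, function (key, current) {
    // `this` is the holder: leave the object inside its own {"": ...} alone.
    if (plainWrappers.has(this)) return current;
    for (const [type, { is, to }] of registry) {
      if (is(current)) return { [type]: to(current) };
    }
    if (isObject(current)) {
      const wrapper = { '': current };
      plainWrappers.add(wrapper);
      return wrapper;
    }
    return current;
  });
}

// Top-down revival: arrays are walked, every object must be a wrapper.
const revive = value => {
  if (Array.isArray(value)) return value.map(revive);
  if (!isObject(value)) return value;
  const keys = Object.keys(value);
  if (keys.length !== 1) throw new TypeError('unexpected unwrapped object');
  const [type] = keys;
  const payload = value[type];
  if (type === '') {
    if (!isObject(payload)) throw new TypeError('invalid plain object wrapper');
    return Object.fromEntries(
      Object.entries(payload).map(([k, v]) => [k, revive(v)])
    );
  }
  const handler = registry.get(type);
  if (!handler) throw new TypeError(`unknown type: ${type}`);
  return handler.from(revive(payload));
};

const parse = text => revive(JSON.parse(text));

console.log(stringify(new Uint8Array([1, 2])));
// {"Uint8Array":[1,2]}
console.log(stringify({ bytes: new Uint8Array([1, 2]) }));
// {"":{"bytes":{"Uint8Array":[1,2]}}}
```

With this shape a regular JSON document such as `{"a":1}` throws at parse time instead of being silently misread, because `a` is not a registered type.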

@WebReflection Could you provide some more complete/complex example code here, annotated end to end? That example is so small that I think I know what you're implying, but also only because I have extra context that came from somewhere else.

I don't need to, my latest approach is nearly identical to what JSON.rawJSON(value) does:

Note that JSON.rawJSON() doesn't just produce a {"rawJSON": ...} object; it also brands that object so that JSON.stringify() knows to stringify it specially (as the contents of the value string). If you make that object yourself and try to use it, it's stringified as the object, not the value. (The only reason it's an object at all is because primitives can't hold brands like that, so we need to add a wrapper.)
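The difference between a branded object and a hand-made look-alike can be checked directly. JSON.rawJSON is a recent addition that is not available in every runtime, so this sketch feature-detects it before use.

```javascript
// A hand-made look-alike is just a regular object for JSON.stringify.
const handmade = JSON.stringify({ value: { rawJSON: '123' } });

// The branded object is emitted as the raw text it carries.
const branded = typeof JSON.rawJSON === 'function'
  ? JSON.stringify({ value: JSON.rawJSON('123') })
  : null; // JSON.rawJSON is missing in this runtime

console.log(handmade); // {"value":{"rawJSON":"123"}}
console.log(branded);  // {"value":123} where JSON.rawJSON exists, null otherwise
```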

As far as I can tell, your proposal is meant to address serialization & revival, however, which means you lose any branding opportunity and have to rely on actual text, so the bytes that get output by stringify do show the effects of the registry. Right? That is indeed a pretty different proposal to JSON.rawJSON(), then.