Add a GC hint option to FinalizationRegistry for WASM/foreign use cases

I am trying to help as I can to ship the next Python / JS FFI proposal, and it’s clear that not only for Python but for all the PLs running in WASM there is a need to make some observed/registered references greedier than others. Yet the FinalizationRegistry API does not allow anything like a priority hint, whose default would be “whatever” (metaphorically speaking) but that could instead be foreign or high-priority or greedy or … you name it (I wouldn’t mind the name resolution, it’s the concept I am after).

Background

WASM-targeting runtimes and PLs are based on SharedArrayBuffer, which can only grow over time and never shrink … when interoperability, or any FFI attempt, has to deal with the outer JS world, there is a topic not many are discussing or are even aware of: concurrent GC pressure.

While on the JS side of affairs the engine can decide when it’s about time to run any GC pass, because maybe the used RAM is too high, the counterpart PL on the WASM side of affairs is screaming for collection of those references passed along, preserved as identity, and incapable of freeing themselves unless risky proxy.destroy() operations happen, due to memory-limit constraints, leaving dangling and dead proxy pointers on the JS side.

This situation has forced projects like Pyodide to be paranoid about it, but pretty much every other PL targeting WASM has the same issue: Lua, Ruby, PHP, even QuickJS running as a JS alter-ego has issues with the fact that JS can’t be forced to GC, yet it can inevitably make these runtimes fail at claiming more RAM, slow them down when re-allocating memory is needed, and leave them incapable of keeping the amount of needed SharedArrayBuffer reasonable.

Proposal

Unfortunately the FinalizationRegistry register method (see the ECMAScript® 2026 Language Specification) was laid out with an unfortunate extra 3rd argument, the eventual token used to unregister a reference. So here I am asking for an optional 4th argument that optionally dictates the desire to track a reference more greedily than others, such as the string `short` to indicate the hoped-for lifetime of that registered reference, or a new method altogether, because the third argument cannot be an object or a reference: that’s already spec’d as the token to unregister the first argument.
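To make the ask concrete, here is a minimal sketch of what the hypothetical 4th argument could look like. The `"short"` hint does not exist today; current engines simply ignore extra arguments to `register`, so this snippet runs but the hint has no effect:

```javascript
// Hypothetical: a 4th "lifetime hint" argument to register().
// Everything else below is the FinalizationRegistry API as spec'd today.
const registry = new FinalizationRegistry((heldValue) => {
  // e.g. tell the WASM side it can free the foreign counterpart
  console.log(`free foreign reference #${heldValue}`);
});

const target = { foreign: true };
const token = { token: true };

// 1st: observed target, 2nd: held value, 3rd: unregister token,
// 4th (proposed, not real): hint that this target should be collected greedily
registry.register(target, 42, token, "short");

// unregistering still works exactly as spec'd today
registry.unregister(token);
```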

Honestly, any idea that could help pursue this goal would be more than welcome. It’s super clear that while PLs on their side can hook into what should be collected or not (in this case CPython), we have nothing like “should be freed ASAP” on the JS side: still not observable, yet great as a hint for interoperability purposes … the currently proposed *FFI* solution is otherwise cumbersome, awkward, not user-friendly at all, and so on … thanks!

I remember a proposal about providing hints to the JS GC about how much "linked" systems are under pressure to attempt to trigger a GC.

Any approach would have to be based on hints because nothing around memory management can be prescribed (engines do not guarantee they will collect garbage).

You also might be interested in recent v8 changes to avoid pinning FR targets, which from what I understand caused massive pressure by excluding all these objects from young collection.

I suspect you'll still need to be careful with WR since that causes objects to be pinned for the duration of the current sync execution. At least I would test.
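For context on that pinning concern, a tiny sketch: `deref()` hands the target back to user code, so the engine must keep the target alive at least until the end of the current synchronous job, even if the result is discarded (retention details beyond that are engine-specific, so this only illustrates the mechanism):

```javascript
const target = { alive: true };
const wr = new WeakRef(target);

// deref() returns the target (or undefined once it has been collected).
// Merely calling it forces the engine to keep the target alive until
// the end of the current synchronous job, which can delay collection.
const maybe = wr.deref();
console.log(maybe !== undefined); // true: target is still strongly held here
```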

not sure I am following, but hints are all I am after … are you saying using WeakRef helps improve the situation?

I am asking for an explicit hint of intent provided to the ad-hoc FinalizationRegistry instance, like a GreedyFinalizationRegistry primitive with an identical API but less lazy collections, or something that could tell the current register method that the observed object has higher priority in being collected than any other … either approach would help the interoperability WASM / JS use case for more than just runtimes, so … can we think about it or have it in any shape or form?

It's unlikely any hints provided to the GC could take the shape of an explicit priority per object. GCs in general do not work that way: the only prioritization they do is based on the reachability level of the object, which includes things like how young it is, or whether ephemerons (WeakMaps) need to be traversed first.

I remember reading some suggestion for informing the GC system of the "related" amount of memory associated with some object, but that mostly falls in the same category of information the GC system isn't really capable of directly acting upon, but could possibly use as input in aggregate.

The only type of hint I expect engines might want to consider would be related to how much GC to perform (and how soon to perform it). Related size hints might be a decent input for that kind of trigger.

I was actually saying the opposite: often WeakRef (in particular derefing one without using the target) makes it more difficult to collect targets, because it requires the engine to reveal the target to user code, possibly unnecessarily, and thus delays its collection.

I mentioned this specifically because often when using JS objects backed by wasm linear memory, if the JS object doesn't stay reachable long it might be expected to be quickly collected like other JS objects, but because of this implementation deficiency with FinalizationRegistry in v8, these objects would actually not be collected as part of the frequent young compaction collection cycle, and only be collected in the old mark-and-sweep cycle. I recommend you test GC behavior in a version of v8 with this fix implemented.

One big problem here is that the mapping necessary for linear memory backed objects isn't capable of collecting cycles[1]. The only solution there today is for the wasm code to use the new wasm gc types instead of linear memory. That's a much more complex change, but one most wasm compiled languages should adopt if possible. It would also resolve all the above memory pressure problems since the JS engine's gc would be solely responsible for doing collection.

[1]: It's simply impossible to collect cycles between wasm linear memory and JS memory, or between any 2 independent GCs, with just WeakRef/FinalizationRegistry, because any object imported on one side needs to be pinned by the other side. I have a design for an API allowing collaborative distributed gc to be implemented in user land, but I've been told there is little appetite to implement such complexity in existing garbage collectors.

fine … can we have that?

size as in … how big is the object on the other side of affairs? :thinking: … it might be irrelevant for cases where thousands of similar objects are created, each relatively small and all short-living … not sure I understand this size constraint entirely, but if it could be faked, I’m OK with it … because while we’re focusing on WASM, as that’s the immediate proposal for FFI needs, having concurrent GCs from different worlds is also a topic: I can drive NodeJS/Bun/Deno from a Web worker the same way a WASM-targeting GC can drive JS on a thread, and in there, there are no magic WASM things to consider or that travel at all between “worlds”, yet knowing that my reference is not needed anymore, and that the server-side counterpart should free that memory, would be essential for long-running programs based on such an approach. 1 user is bearable, it’s the same RAM after all and the limit is the same machine, but having many clients drive different “sandboxes” on the server, which cannot be kept as minimal as possible because none of those clients will ever run GC while they feel like they have plenty of RAM, is an issue.

In Bun we have the --smol flag to tell the GC it should be greedier, targeting more constrained machines; in JS I’d love to have a similar way at runtime (an API or whatever) to obtain results similar to what JavaScriptCore achieves when that flag is passed to the Bun runtime … so there is also prior work, and that work would fit the bill (imho) here … is there literally anything at all for JS developers that could work on the Web too?

I am not personally willing to champion that, see below for why.

Edit: I also think this kind of GC pressure hint API does not belong in the JS language itself, but more on the host side of things.

There are actually 2 use cases described here with slightly different implications:

  • An agent in the same agent cluster (e.g. WebWorker, or iframe), where postMessage is available
  • An agent with which you communicate over a byte connection

In the former case, we could actually imagine a magic object/value that maintains its identity when round tripped over postMessage, and which could be used as a WeakMap key on both sides. In that case, the host would have the ability to perform gc between the agents sharing these values. This is something that actually doesn't require any new GC API surface, and is more similar to wasm gc types.

In the latter case, you do indeed need some more control over gc to effectively implement distributed gc over the binary connection. That distributed gc is inherently cooperative.

I think this is the crux of the concern I have with any approaches relying on gc pressure. FinalizationRegistry callbacks are best effort. Any gc pressure hint would similarly be best effort. As such a server shouldn't rely on a client performing and notifying of its gc promptly or at all. The server needs the ability to cut off clients that abuse its resources, or have a way to charge such clients for used resources.

Also as I mentioned, without cycle detection, the client is not always capable of releasing resources even if it is willing.

For cooperative distributed gc, I would only be interested in championing an API that handles the collection of distributed cycles. I have a draft of such a thing stashed away (a sort of extension of the WeakRef/FinalizationRegistry concepts), but I haven't brought it forward because I'm not optimistic any proposal to enable distributed gc use cases would be welcome by engine implementers.

You might want to take a look at the approach taken by Cap'n Web. It relies on explicit resource management to automatically dispose of stubs. It has the advantage of being fully deterministic, but does require user code to help with memory management (however the default behavior is pretty sensible)
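This is not Cap'n Web's actual API, but the general pattern it relies on can be sketched with JS explicit resource management (`Symbol.dispose`, usable via the `using` declaration where supported). `RemoteStub` and `releaseRemote` below are made-up names for illustration:

```javascript
// Hypothetical RPC stub following the explicit-resource-management
// pattern: disposal deterministically releases the remote reference.
Symbol.dispose ??= Symbol("Symbol.dispose"); // shim for older runtimes

class RemoteStub {
  #released = false;
  constructor(id, releaseRemote) {
    this.id = id;
    this.releaseRemote = releaseRemote;
  }
  [Symbol.dispose]() {
    if (this.#released) return; // idempotent: dispose at most once
    this.#released = true;
    this.releaseRemote(this.id); // tell the other side to free its memory
  }
  get released() { return this.#released; }
}

const freed = [];
const stub = new RemoteStub(7, (id) => freed.push(id));

// With `using stub = …` the call below would happen automatically at the
// end of the enclosing block; here it is invoked explicitly instead.
stub[Symbol.dispose]();
console.log(freed); // freed now contains [7]
```

The advantage mentioned above is visible here: release happens at a known point in the program, with no dependence on GC timing.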

I have a third one: a Worker that uses postMessage + Atomics to communicate synchronously with the main thread and in a bidirectional way, based on byte-encoded responses, but that also uses the main thread to forward via WebSockets and the same binary encoding/decoding to the server.

document.body would be a proxy to the real document.body on the main thread, like anything else that lives on the main thread is directly accessible, and that’s also the case for the server, where you could server.import(‘os’), as an example, and synchronously call os.cpus(), or actually do everything you want with real references living on the server, not just the main thread.

In this case the Worker GC cycle is crucial to free references on either main or server (or both) when these are not needed anymore, and while so far this technique works fine on the same machine (kiosk mode for the server-side story + PyScript for the window one), the limit I have is that it won’t scale with multiple clients or multiple servers, assuming I could replicate the exact same dance via Python on the backend with direct worker access to that Python environment.

None of this is hypothetical, it’s what coincident has been doing for years at this point, and it’s the reason I care about the possibility to “stress” the GC more at the Worker level, so that both the main-thread related references and the Pyodide/MicroPython ones can be freed sooner rather than later.

Wouldn’t that help a lot also the WASM GC case? If there’s anything at all that could help collecting sooner I’d be very interested in that solution!

that’s all await based; coincident is a piece of cake compared to it, but I’ll try to have a look at the orchestration, maybe that’s something I can borrow for coincident. Still, the default Python FFI probably won’t use that technique, so anything native, instead of user-land implemented, would be way more desirable.

As I said, the solution for wasm is to use wasm GC types and not linear memory. There will be no interest from engine implementers for anything else.

I believe these use cases would be covered by shared structs, which is moving forward in the wasm space from what I understand. Again, that likely means forgoing linear memory.

My proposal does not help make GC more prompt, it only enables collecting things like distributed cycles that cannot be collected today when dealing with fully independent GC.

Right, bindings for GC languages cannot rely on explicit resource management as it's not the semantics. The languages should switch to wasm GC types, that's what it was designed for.

I use SharedArrayBuffer already, but that’s just the communication protocol/buffer: if I hold a main-thread reference in the worker, that shared struct / buffer cannot be freed on the main thread as long as that reference is retained in the Worker … but maybe I need to play around with Shared Structs once these are usable, and I’ve no idea when that would be the case. Still, those are not possible for the WebSocket based communication, as I believe I cannot share anything with the server (it’d be a dream if directly possible, though I don’t think it’s ever going to happen).

Anyway, it’s clear we have no way to improve the Python / JS FFI proposal with the current state of affairs, but I also think Pyodide is already using WASM GC types (which is the issue we had in Chrome/ium with exotic objects throwing in a for/in loop, IIRC); still, on the JS side it doesn’t know when it can collect / free those types.

Anyway, this conversation is thorough and public, so whoever has interest can chime in too; I’ll point at this conversation from the right venue, let’s see if there’s any hope we can improve the current status quo.

Thanks.

The reality is that distributed gc is hard (there are a few academic papers on this), and while technically possible, the complexity is rarely justified to add support in local garbage collectors just for it.

GC is also a massive source of bugs, and in general the JS ecosystem has taken the approach of providing high level APIs like WeakMap and now FinalizationRegistry vs lower level primitives like ephemerons and destructors. For the same reason there is no standard explicit way to control the schedule or extent of GC in JS.

While being sympathetic to the various use cases requiring interaction with language gc, the reality is that the local use cases have different constraints than remote use cases: namely the former relies on sharing the same local resources, and the local engine is in a position to keep hiding garbage collection from the program altogether.

In the pure wasm language bindings case, wasm gc types were designed to solve this. It allows a language which has automatic gc to leverage the JS engine gc instead of implementing its own gc over linear memory (SharedArrayBuffer). This is all available in the major JS engines.

In the workers case (anything with a postMessage), I would love to convince Web Standards to introduce a value that round trips across agents preserving identity, and usable as WeakMap key. Some Shared Structs discussions were around whether to allow shared struct objects to be used as WeakMap keys, but the complexity this requires likely meant it wouldn't be allowed at first. However in the fullness of time, with either approach you would once again not need to observe the engine's gc and let it just work. Like SharedArrayBuffer, shared structs would also allow multiple workers to synchronize with Atomics, allowing the simulation of sync access. Unfortunately this is all in the future for now, which means we're stuck with FinalizationRegistry and suboptimal gc schedules for now (I know v8 started an experimental implementation, not sure of the progress lately)

In the remote / serialized connection use case, after having implemented remoting systems that rely on observing gc, I have actually switched my opinion and now believe that integrating with a gc you don't control is not sustainable (which is why I'm not pushing my proposal forward). In these cases, if possible, explicit resource management is a much safer approach.
However in our system we do want to keep some automatic gc. The approach I'm taking these days is to fully segregate the lifetime of my representative objects: the JS heap objects (called presences) I'm creating to represent remote objects are fully ephemeral and get revoked when the interaction has completed.

To enable persistent interactions we actually have a local "virtual object" system, which you can think of as a serialization of the object state, including references to other objects. These references can include remote objects. Like remote presences, these virtual objects get JS heap representatives allowing the program to interact with them, but these representatives get revoked as soon as the synchronous interaction completes.

This creates a memory separation around "distributed objects" (which comprise virtual and remote objects) such that the local gc is responsible for collecting ephemeral representatives, and I can implement my own tracing gc over the distributed object system, since I fully control the memory used to implement it (only virtual objects can be exported to be used remotely).
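A toy illustration of that "ephemeral representative" idea (my reading of the pattern described, not the actual system), using Proxy.revocable so a representative becomes inert once the synchronous interaction completes:

```javascript
// Toy version of the ephemeral-representative pattern: heap
// representatives for distributed objects are revoked as soon as the
// synchronous interaction ends, so no outside reference can keep the
// underlying distributed object pinned.
function interact(virtualObjectState, fn) {
  const { proxy, revoke } = Proxy.revocable(virtualObjectState, {});
  try {
    return fn(proxy); // user code sees a live representative...
  } finally {
    revoke(); // ...which becomes unusable after the interaction
  }
}

const state = { counter: 1 };
let leaked;
const result = interact(state, (rep) => {
  leaked = rep;       // even if user code stashes the representative...
  return rep.counter; // ...it only works during the interaction
});
console.log(result); // 1
let threw = false;
try { leaked.counter; } catch { threw = true; }
console.log(threw); // true: the revoked representative is inert
```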

and that’s what Pyodide does already: it revokes references ASAP, with some notable exceptions like promises or setTimeout, but this will never work for listeners, which are way too common on the JS side of affairs, so Pyodide offers an explicit create_proxy utility that theoretically requires a destroy() call afterwards. And that’s the issue: we’re going in circles there, because people can’t observe object lifecycles in JS, so they would never know when such a destroy() operation should be invoked.

A classic click listener that refers to a Python entry for a node that might get collected at any point in time is a great example, but the DOM doesn’t offer a way to know when such a node is GC’d, and “use custom elements” is not a solution because clicks gotta work for any regular DOM node out there.

In these cases, Pyodide falls back on FinalizationRegistry as a “last resort” to automatically destroy those proxies and relieve pressure/memory on the Python GC.
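That last-resort pattern looks roughly like this (a generic sketch, not Pyodide’s actual implementation; `destroyForeign` and `wrapForeign` are stand-in names for whatever frees and wraps the Python-side reference):

```javascript
// Generic "last resort" cleanup: if user code never calls destroy(),
// the registry eventually frees the foreign reference once the JS
// wrapper becomes unreachable. Collection timing is best effort only.
const destroyed = [];
const destroyForeign = (id) => destroyed.push(id); // stand-in free routine

const cleanup = new FinalizationRegistry(destroyForeign);

function wrapForeign(id) {
  const wrapper = {
    id,
    destroy() {          // explicit, deterministic path
      cleanup.unregister(wrapper);
      destroyForeign(id);
    },
  };
  // GC fallback path: the wrapper doubles as its own unregister token
  cleanup.register(wrapper, id, wrapper);
  return wrapper;
}

const proxy = wrapForeign(123);
proxy.destroy();        // deterministic cleanup, registry entry removed
console.log(destroyed); // logs [ 123 ]
```

The friction discussed above is exactly that, for DOM listeners, nobody knows when to take the explicit `destroy()` path, so everything rides on the best-effort fallback.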

that’s coincident in a nutshell: locally it stores a remote pointer and registers it, and when the GC kicks in it calls the remote side and tells it to drop that reference, but identity is preserved until that moment, which is essential to be able to use libraries that landed on the main thread or on the server; otherwise it’d be a big mess of cross references and duplication.

For all these proxies we’d love to be able to ask the GC to consider them before others, but what I understood is that this ain’t gonna happen, so there is no solution to improve the current state and avoid needing create_proxy which, to date, has only caused issues, friction, and memory leaks (in practice nobody destroys those proxies, or nobody knows when to destroy them when it comes to DOM listeners), so what is supposed to be a solution is causing more trouble and making code and projects less robust :smiling_face_with_tear:

My description was probably not clear enough, but by using ephemeral representatives that are promptly and deterministically revoked, we functionally prevent any external references onto our distributed objects. That allows us to completely forego the need for any gc observations.

I understand however this is not an approach applicable to all systems. We have the luxury of only triggering execution of behavior attached to our distributed objects through interactions from other distributed objects. As you mentioned, integrating with the DOM or any code from the outside would require going back to pinning exports and gc observations to release these pins. However by disallowing edges from distributed objects onto heap objects in the outside world, you'd still prevent cycles and guarantee the ability to collect, albeit only as promptly as the host gc finds the exported garbage.

maybe a concrete minimal example would help me understand better what you mean here … AFAIK I can’t tell a proxy to be “heap” in JS, and shapes don’t exist in our implementation; everything is a one-off proxy that, if already known, is gonna return the same thing but, if not retained, is gonna return a new thing if the GC kicked in meanwhile … we could also force-GC somehow by pausing via long/memory-heavy loops (we are in workers, after all, it won’t block) but that feels overly hacky and it still has zero guarantees.

What I don’t understand, though, is why devtools has access to the GC and can offer a “collect garbage” button, which is basically all we would need on our side, to be able to trigger it on occasion in either the main thread or the worker one … we won’t get to decide which garbage gets collected, but the fact that the pragmatism of that button bypasses all the theory behind “GC should not be observable” is pretty obvious to me.

to clarify, we don’t want to end up with something like this … but it’s unclear why we cannot tell the GC to run sooner rather than later:

const collectGarbage = async () => {
  // sentinel exists only to keep allocating and to observe the result
  const sentinel = [Math.random()];
  // performance.memory is non-standard (Chromium only)
  const { usedJSHeapSize } = performance.memory;
  do {
    // add GC pressure by churning allocations
    sentinel.splice(0, 1, Math.random());
    // yield so the engine has a chance to schedule a GC pass
    await new Promise(resolve => setTimeout(resolve, 0));
  }
  // loop until the heap shrinks below its initial size,
  // i.e. until a collection has apparently happened
  while (usedJSHeapSize <= performance.memory.usedJSHeapSize);
  return -1 < sentinel[0];
};

edit: it could be better orchestrated out of requestIdleCallback, but the whole point remains … we have clear use cases and scenarios where telling the GC to be a bit more aggressive/greedy is desirable, because there are 3rd-party dependent GCs targeting WASM, or living elsewhere outside the JS-driven space, that really would like to be freed from their held references whenever these are not needed anymore.

edit2: “heck”, a requestGCCallback would seal the deal for me, or any other similar variant of the same concept. We could trigger those on known proxies / foreign references and be sure we’re doing the same thing … at that point performance.memory could really go away from the platform (I’ve tested the alternative, it’s the slowest thing I’ve ever seen and it requires SharedArrayBuffer-like constraints which are not aligned across browsers; see WebKit)

but it’s unclear why we cannot tell the GC to run sooner rather than later:

I think it was in this talk: https://www.youtube.com/watch?v=Scxz6jVS4Ls that I heard one reason engines don't want to encourage manual runs of the GC: it fights against the heuristics that they track. GCs can self-tune how often they run by looking at how long ago it was since they last ran and how many items were live/cleaned during the last sweep.

The GC can’t do anything if it’s not aware of the nature of a reference … if it’s something held elsewhere, it wouldn’t care or treat that reference any differently … this is the whole issue in a nutshell, and while engines “prefer” not to encourage it, everyone on the server exposes GC for this or that reason, and every devtool allows GC passes, so this looks more like patronizing than a reason not to let developers hint GCs about what should be tracked more greedily, or to ask for the GC to be invoked, because there are extremely valid reasons to do so that the current status quo cannot explicitly or implicitly address.

Because clicking “Collect garbage” in devtools never broke a website either, I am a bit skeptical of these reasons … we can tell the story “GC blocks”, “GC invoked is bad for performance” or “don’t GC because” … but there are tons of hara-kiri-like APIs in the JS / Web ecosystem, so ruling this one out when it’s actually needed, because projects beyond “regular constraints” need it, feels like just an unnecessary limitation to me, to our project, to everyone using WASM to do anything, to others doing crazy reflected FFI stuff, to workers using foreign PLs, and so on (a bit like eval … it’s bad, really bad, but there are inevitable use cases where it’s the only answer; see WASM evaluating things internally, as an example).

I hope this will be re-considered, discussed, and finally resolved, because other PLs have GC abilities; it just happens that the most used PL in the world can’t help less obvious use cases at all :cry:

btw, that video (and thanks for that) is from 2018 … I don’t recall WASM-targeting PLs back then, or Workers or SharedArrayBuffer being used much, or at all, so it’s all great, but it’s not aging well to me.

edit … saw the whole thing; the point in there is “GC now”, which is not what I am necessarily asking for, like the gc() call when --expose-gc is passed to NodeJS, as an example … what I have in mind is rather:

  • requestGCCallback, which follows requestIdleCallback constraints and guarantees (none, besides a timeout to enforce that the call happens)
  • alternatively, a Leaky primitive to indicate to the GC not that the reference should be trashed ASAP, but rather that it has dependencies, and that if it’s not collected soon enough it will result in memory leaks
  • alternatively, a performance.memory.track(ref) able to put that ref in a state between the new generation and the old generation, so that passes will scan that reference before ending up scanning the whole old generation of possible garbage … that’s like a priority queue, so it could even just be placed on top of the old-generation stack instead; don’t know if this makes sense, but it does to me

Any of these ideas would help us reduce the amount of issues around memory leaks caused by JS, but that talk also ends up stating that “GC is not your enemy, it’s there to help”, and in here it’s not helping our case, plus “most developers don’t need to care about GC”, so I guess I am, or our project is, part of the exclusion list considered at that time … I wish those sentences would actually adapt to “but if you really know what you are doing, here’s the thing that would make you happy”, which is not the case so far in here.