FinalizationRegistry hint for GC pressure?

I just had a discussion with maintainers of WASM-targeting PLs about the fact that the unpredictability of the FinalizationRegistry is a concrete issue for them.

  • the WASM engine has a limited (and known) amount of buffer to deal with
  • the V8 (or other) engine has an "infinite" (hardware-dependent) amount of RAM to deal with
  • the V8 GC potentially triggers too late to help proxied references be cleared in the WASM engine
  • there is no way to hint the FinalizationRegistry that some references should be freed more promptly

The only similar discussion I've previously found is Exposing existing GC functionality with GC.criticalHint and GC.collectHint: pause hints for realtime JavaScript - #18 by claudiameadows, but it's from 2021, a time when the FinalizationRegistry API was (IIRC) non-existent.

Prior Art

As absurd or unrelated as it may sound, the AbortController mechanism on the Web, and hence signals on a fetch operation, has already landed as a specification.

That API grants the following:

  • only the AbortController owner can dictate the fate of a fetch operation and drop it
  • the signal primitive to do so works privately behind the scenes; it's a very well-known reference that, once its .abort() is called, breaks the current fetch operation (see the sketch below)
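
For reference, a minimal sketch of that existing mechanism (nothing new here, just plain fetch + AbortController for comparison):

// existing Web API, shown only for comparison with the GC hint idea below
const controller = new AbortController();

// the signal travels with the request; only the controller owner can abort it
fetch('/data.json', { signal: controller.signal })
  .then(response => response.json())
  .catch(err => {
    if (err.name === 'AbortError') console.log('fetch dropped');
  });

// later on, the owner decides the fate of the operation
controller.abort();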

Because I understand that observability is an issue for GC, but also because I believe that's actually desired in more convoluted cross-worker, cross-realm, or cross-programming-language/interpreter use cases, I find the AbortController-like mechanism a great fit for the idea I am going to propose.

Hinted GC via FinalizationRegistry

The idea is that the current constructor accepts a callback as its unique parameter and throws when that parameter is not callable ... there's room to play here without breaking the world, as that's usually how new proposals need to land ... so even if I think new FinalizationRegistry(callback, gcHintSignal) would be ideal, or even new FinalizationRegistry(callback, {signal}) to mimic an already understood API out there, we could have, without issues, a new FinalizationRegistry({ signal, invoke }) (invoke or callback or handleEvent or whatever) that would never cause unexpected issues.

The new signature would look like this:

const signal = new GCHint(); // or GCController
const fr = new FinalizationRegistry({
  callback() { console.log('triggered'); },
  signal // or hint
});

After that, whenever it is needed, desired, or whatnot:

signal.hint()

The result of that operation is that any reference registered via that fr registry would have an easier path to being freed from the queue of stuff that might eventually be collected in the future, and, once such a reference is gone, the registry can invoke the related callback.
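
To make the flow concrete, here is a minimal usage sketch; GCHint, the object form of the constructor, and names like wasmExports, jsProxy, and wasmPtr are all hypothetical:

// hypothetical API from this proposal: GCHint and the object-literal
// constructor form do not exist today; wasmExports/jsProxy/wasmPtr are placeholders
const signal = new GCHint();
const fr = new FinalizationRegistry({
  callback(held) { wasmExports.free(held.ptr); },
  signal
});

// register a JS wrapper that owns memory inside the WASM linear buffer
fr.register(jsProxy, { ptr: wasmPtr });

// later, e.g. when the WASM side reports memory pressure
signal.hint(); // ask the GC to prioritize references registered via `fr`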

In Summary

  • this idea won't expose easy observability to anyone, as nobody can know whether a signal has ever been passed to a new FinalizationRegistry (and not everyone necessarily holds a reference to observe)
  • the moment any reference passes through more than one FinalizationRegistry, the first signal that invokes .hint() simply triggers the other callbacks around that object too ... after all, all devs need to know is that such a reference is gone for good
  • every hint() call is meant to be greedy and blocking, but that's what the GC does anyway when it collects and kicks off callbacks
  • WASM and other heavily GC-related use cases can at least hint() periodically while running, hoping to have more memory freed sooner rather than later
  • nobody else that has used FinalizationRegistry to date would care about or be affected by this new possibility; actually, they might welcome more incremental releases (as a side effect of this proposal)
  • last but not least, engines are free to ignore the signal.hint() call without affecting those engines that would instead care about it ... so it can still be specified that hint() doesn't guarantee anything at all; it's just, literally, a hint for the GC
  • the hint() would result in the GC running sooner rather than later, with no major architectural changes needed around the complexity of the GC either (speculative, I am just guessing here)

I think that's it from my side, and I am looking forward to hearing from you, TC39 members.

Related: GitHub - tc39/proposal-cleanup-some: Proposal to migrate cleanup some to its own proposal repository

Though I think that repo needs updating, as I think the proposal is now withdrawn rather than stage 2.

Interesting, thanks, although here the proposal is pretty different: it's not that people randomly call that prototype method hoping for something to happen; the registry has a signal, the signal owner is the only one who can eventually do something via hint() invocations, and you need to have either held weak refs or keys in scope to then deal with releases when that happens.

But I believe most of this has been discussed already; even MDN suggests just using timers to get earlier GC calls, but then again, neither timers nor requestIdleCallback really helps with the finalization registry: it's extremely engine- and browser-dependent, or even environment-dependent (e.g. on an SBC with low RAM the GC is way greedier and yet still unpredictable).

Does anyone think it's worth considering this topic at all, presented in a way that was never discussed before?

I'm confused, do you expect hint() to trigger GC, or to trigger pending callbacks for things that were already collected but not yet notified?

If the latter, I believe that's what cleanupSome was doing.

If the former, I don't understand why it would be a feature of FinalizationRegistry instances. GC cannot determine whether something registered with the registry is no longer referenced until it has scanned the whole heap (traced all the roots). As such, GC is an agent-wide operation. If you want the semantics to be blocking, that means performing a full forced GC (which some hosts expose as a global API).

Regarding the use case, from what I understand this feels like something where wasm GC should help since the wasm environment wouldn't be forced to use linear memory to represent its own types, and could rely on the GC provided by the host. At least for languages with automatic memory management.

In general, it's always a problem mixing 2 memory management systems. You almost always end up with staged collections, which can be very problematic if you have a lot of cross references between the 2 systems, and cause memory leaks if you have cycles between them.

The conversation was around this ... they can't fine-tune which references need to be collected, and the GC triggers way too unpredictably to help their cause.

If I understand you correctly though, the GC is a "scan the world or forget about it" mechanism ... at least that's the hint I got from what you wrote.

In this case/proposal, the FinalizationRegistry could tell the GC not to necessarily scan everything, but just scan, or update reference counting for, the references that have been registered through the registry that received the GCHint signal.

If this is not possible at all, I agree there's no usable feature to land in my proposal, but I wonder whether having a separate "priority queue" for those references observed via a GCHint in the mix could ever work, or be somehow useful in general.

If NO is the answer, I am OK with it, but as I've heard about incremental GC solutions, I still wonder if that NO would last forever. Thanks.

Incremental GC just means you don't have to scan the whole world at once, but you still have to scan the whole world to be sure no reference exists. There are ways to compartmentalize things, which is what generational GC does, but that requires keeping track of where references across the compartments live (effectively creating roots for the compartment).

In the wasmgc approach, you have to implement your foreign PL using "wasm objects" which can be referenced by JS objects and hold references to JS objects. In that world, the PL does not do its own allocations using linear memory. As such it shouldn't care about when the allocated objects are collected by the existing GC. Of course you cannot run destructors, and similarly have to rely on finalizers.

That's, I believe, the big caveat ... they need to eventually chain the destructors internally, or trust the FinalizationRegistry callback to do so ... if I am misunderstanding you and the latter case works already, I will point them to this, but if that's not the case, they will be in trouble ... if a local ref is held in the outer JS world (or vice versa, really) and other refs point to or depend on that ref, not having a finalization-registry invocation means other objects can't be collected ... am I reading this right? I hope not!

P.S. as much as this idea was for WASM-related PLs, I have my own counter-example that lives across client/server boundaries ... if I ask the client to hold a reference and notify when a reference from the server is not needed anymore, I want that to trigger ASAP, or the foreign server could saturate its memory by holding references from each client forever. So this idea was not only WASM-related; it's also about cross-realm and/or cross-environment cases.

For other PLs running in wasm in the same agent, the goal of wasmgc is for the other PLs to rely on the VM's memory management, and not have the PL do any of its own allocations.

If the PL exposes a destructor concept, then that cannot be implemented. If the PL has a finalizer concept, this would run with the same semantics as JS.

Yes optimizing separate GC is a very complex problem. There are actually missing GC capabilities in all managed languages to do that more efficiently. I have an idea of a proposal to help user land synchronize such distributed GC, especially enable collection of distributed cycles, but it's stuck at the rocket science exploration phase.


Is there anything anyone could do to help you move forward with such a proposal?

something where wasm GC should help

Wasm GC would certainly fix this problem if we could use it. Unfortunately we have a lot of existing C code which cannot easily be used with wasm gc. The trouble is that gc'd types and linear memory pointers don't mix that well -- a linear memory C struct cannot contain a gc'd type directly; it has to contain a table index instead. The table is a gc root, and I believe that reintroduces the same set of problems.

I am very much looking forward to being able to touch gc types from llvm. For comparison, we got externref support in C in this commit:

You almost always end up with staged collections, which can be very problematic if you have a lot of cross references between the 2 systems, and cause memory leaks if you have cycles between them.

I agree that it is impossible to fix this -- the only good way out is to do something like wasm-gc where there is only one GC that understands the whole world. It would be nice to have a few more bandaids for cases where wasm gc is not usable (since unfortunately I think this includes most existing C code).

Basically, the question here is: suppose I have something like the following:

const registry = new FinalizationRegistry(({ ptr }) => Module._free(ptr));
const ptr = Module._malloc(size);
const obj = { ptr };
registry.register(obj, { ptr });

In this case, obj owns size additional bytes of wasm linear memory on top of the size that v8 sees. It would be nice to be able to advise the v8 garbage collector that finalizing obj frees up size more bytes than it would otherwise appear.
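
Purely to illustrate that wish (this option does not exist today), the registration could carry an external-size hint along these lines:

// hypothetical fourth argument: tell the GC how many extra bytes `obj`
// retains outside the JS heap, so finalizing it looks more worthwhile
registry.register(obj, { ptr }, undefined, { externalBytes: size });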


Incidentally, if anyone is looking for an entertaining read, here's a discussion of some of the difficulties that were involved in adding externref support to clang:

It's not the only solution, but the alternative is cooperative distributed GC, which has prior art, but would require some more insight from the independent GCs, namely the ability to provide summarized exclusive retention information from export roots to import leaves in each system.

It does still require every participating system to be able to trace their retention graph, so in languages like C I'm not sure how that would work.

More precisely, if you have some linear memory with an exported JS object pointing to an address in that linear memory, internal references inside the linear memory, and then an exit from that linear memory (an index into your table of imported extern refs), you need to be able to trace, through the linear memory's internal references, which exported JS objects can ultimately reach which index exits.

In this specific case, since you only have 2 parties in your system, you don't strictly need the JS side to be able to provide the tracing information. You can instead create direct references from the exported JS object to the imported extern refs, simulating what retention paths exist within the linear memory. Then your table of exits would not directly hold the extern refs, but WeakRefs to them. At that point you can rely on the finalization of the exported JS objects to clean up your linear memory. Note that this requires all references inside the linear memory to be rooted in a JS object. You can always use a "linear root" JS object which, while not actually exported to user code, represents when a root in the linear memory can reach an imported extern ref.
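
A minimal sketch of that shape, with placeholder names (externrefTable, freeLinearMemory, makeExportedObject) standing in for whatever the embedding actually uses:

// sketch only: the table of "exits" holds WeakRefs, and every exported wrapper
// keeps strong refs to the extern refs its linear-memory graph can reach
const externrefTable = [];
const cleanup = new FinalizationRegistry(({ ptr }) => freeLinearMemory(ptr));

function importExternref(value) {
  const index = externrefTable.length;
  externrefTable.push(new WeakRef(value)); // weak: JS-side liveness decides
  return index;                            // this index is what linear memory stores
}

function makeExportedObject(ptr, reachableExternrefs) {
  // simulate the retention path that exists inside linear memory
  const exported = { ptr, _retained: reachableExternrefs };
  cleanup.register(exported, { ptr });
  return exported;
}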

I remember seeing a suggestion along those lines before. I am actually not sure if any JS GC actually cares about the retained size of objects when collecting garbage.

"funny" enough, that's basically what I've suggested and implemented and it works indeed ... the WASM interpreter just needs to wait for the FinalizationRegistry to .destroy() explicitly retained refs behind WeakRefs and it seems to work pretty well too.

However, it's still true that it's impossible to predict whether the WASM side will fill up its RAM/buffer, because there's no priority or rush on the JS side to free those references, which brings me back to the original idea of having prioritized refs to clean up sooner than others, or finding a way to actually hint to the GC that it's about time to run.

We have requestIdleCallback with a timeout that doesn't indicate when it will run but helps set a deadline for triggering that function ... if we had anything similar for the GC, I think most of our problems would be solved.

Correct, it doesn't influence when the JS GC runs. It only helps with clearing more things at once (including cycles), instead of staged collections and leaks.

Maybe the problem is that the JS GC likely considers linear memory and the JS heap as distinct. I'm wondering if we could use the new resizable array buffer and a way to indicate/hint that some array buffers should be considered like heap. That way, when the linear memory usage grows, it would count towards whatever metric the JS GC uses to decide some collection is needed.
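
Resizable ArrayBuffers already exist; the heap-accounting hint below is the hypothetical part:

// the resizable ArrayBuffer is a real API; GC.accountAsHeap is made up here
// to illustrate counting linear memory towards the GC's heap-pressure metric
const linearMemory = new ArrayBuffer(1_048_576, { maxByteLength: 256 * 1_048_576 });

// hypothetical: treat growth of this buffer as JS heap pressure
// GC.accountAsHeap(linearMemory);

linearMemory.resize(8 * 1_048_576); // growth would then feed the collection heuristic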

I guess it's similar to @hoodmane's suggestion to hint how much each JS object actually holds in memory, but that seems like an accounting nightmare when considering transitive references.

Wouldn't that be too late though? I am not sure how linear memory works in Pyodide specifically, but I think there's no "defrag" happening there ... I am thinking about the fact that when the buffer needs to grow, it's because it's already full, so having memory freed after it needed to grow would basically just make it "more holey" ... unless the engine is smart enough to sum up and sum down the needed bytes and also shrink with ease ... still, the order of events would be:

  • filled buffer, gotta grow it
  • the GC thinks "oh, look! some linear memory is growing, let's run and maybe free stuff for it"
  • the grown buffer might now be too large for no reason

This, unless I fully misunderstood everything, which is very well possible too.

How about a way to hint at least how greedy the GC should be and/or how frequently it should kick in?

// completely made up idea and names
// a static / shared global utility to hint greedier GC
FinalizationRegistry.hintGCFrequency = "greedy";
FinalizationRegistry.hintGCFrequency = "ondemand";
FinalizationRegistry.hintGCFrequency = "performance";
  • greedy will make GC run more frequently than usual
  • ondemand will basically be the default state (as it is now)
  • performance will basically make it the least greedy possible so that if there is enough RAM and no leaks it could actually never run at all

Has this, or anything similar, ever been considered or is it even possible? :thinking:

I suspect any direct configuration knob is going to run into pushback.

I agree that a reactive system relying on pressure information from after you've already reached the ceiling is not optimal. I still believe we need to find a solution to make the garbage collector understand that linear memory is reaching a pressure point.

However there would be no guarantee that GC will trigger before that linear memory reaches its limit, nor that it will notify about garbage found immediately after running. And there will be pushback from implementers about any API adding promptness guarantees (this was a contentious point during the original proposal).

I'm really not sure what approach can be taken here.