Exposing existing GC functionality with GC.criticalHint and GC.collectHint: pause hints for realtime JavaScript

CrazyPython · March 18, 2021, 6:35pm

Prior Art

V8 implements "idle tasks" which can be accessed through its C++ API. Idle tasks are optional (non-urgent) GC work that will run eventually. Blink uses idle tasks to schedule work between frames, after JavaScript execution is finished.
https://v8.dev/blog/trash-talk#idle-time-gc

JavaScriptCore uses "synthetic pauses" for its concurrent collector. They do not need to be run immediately and are based on engine heuristics.

SpiderMonkey has incremental marking which is done in small arbitrary pauses and incremental sweeping which is done in pauses proportional to the zone size.
https://searchfox.org/mozilla-central/source/js/src/gc/GC.cpp#104

Summary: Extant ECMAScript implementations have internal mechanisms for controlling scheduling and duration of short GC pauses.

Proposal

JavaScript could benefit from fine-grained control of garbage-collection timing. Realtime applications often need to deliver a frame on time; otherwise the user experience will suffer.

I propose a global, Realtime, with two static methods:

Realtime.avoidPausingFor(period) - Hints the runtime to avoid long-pausing the current JS worker/thread between now and now plus period milliseconds.

Realtime.canPauseFor(period), alternate name idlePeriodFor(period) - Hints the runtime that it can pause the current JS thread between now and now plus period milliseconds. The call signals that the user experience will not be degraded if the ECMAScript engine pauses for part or all of the period.

Use cases for `canPauseFor`

Preferring to run garbage collection pauses during low-action segments of a game, such as on an interstitial screen, or when there are no nearby enemies, instead of running it while the player is battling enemies
Running GC when a user is not interacting with an article
Running GC while the application is transitioning to a new state (and thus lots of memory is becoming garbage) and cannot respond to user input
Running GC work of a Node.js game server that runs at 30FPS in between frames

Use cases for `avoidPausingFor`

Signaling that a web worker, to which the main thread offloads animation or graphics computation, should run GC only after the current frame is done, to allow normal web workers to have independent collections
Avoiding collecting garbage while serving an HTTP request from Node.js to prevent end-user latency from increasing

Things to bikeshed on

canPauseFor does not mandate triggering a GC cycle the same way System.gc in Java might. It signals an opportunity to do pauses. The spec text would not be normative upon the behavior of WeakRef and FinalizationRegistry.
For avoidPausingFor, implementations should request new memory from the operating system if that is required to fulfill the request, up to the security restrictions of the host environment. However, GC may still run if necessary; runtimes are not asked to support a "never-fail allocator" or "emergency allocator."
A long-pause is an implementation-defined duration that varies between garbage collector implementations. However, it is understood to be less than 1 cumulative millisecond.
canPauseFor overrides any previous avoidPausingFor directive. This allows the developer to ask the implementation to avoid pausing for the rest of the frame and then, when the frame is complete, to allow the implementation to pause for the remaining frame time. An avoidPausingFor call does not override an earlier canPauseFor call. This restriction may be lifted in a future spec if necessary.
The host environment may restrict, ignore, set minimums, and/or set maximums on the Realtime calls. In other words, the host has ultimate control over GC scheduling and Realtime calls are an implementation hint.
This spec may not need avoidPausingFor. The canPauseFor hint to run some GC work may be enough. On the other hand, avoidPausingFor may have meaning on single-threaded processors (common in cloud servers) or with concurrent GCs: avoid running a GC operating-system thread in parallel, and wait until the period has expired to do so. It may be dropped from the proposal as it moves towards Stage 3 and implementation experience is gained. It is suggested that engines implement canPauseFor and gather real-world performance data in origin trials and synthetic benchmarks before deciding whether avoidPausingFor should be removed from the spec.
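The override rules listed above can be sketched as a small state model. This is only an illustration under assumptions: createRealtimeModel, currentHint, and the "can-pause" / "avoid-pause" / "default" labels are hypothetical names invented here, the clock is injectable so the behavior can be checked without waiting, and no engine exposes anything like this today.

```javascript
// Hypothetical model of the override rules; not a real engine API.
function createRealtimeModel(now = () => Date.now()) {
  let avoidUntil = -Infinity; // end of the latest avoidPausingFor window
  let canPauseUntil = -Infinity; // end of the latest canPauseFor window

  return {
    avoidPausingFor(period) {
      // Does not touch canPauseUntil: an earlier canPauseFor stays in effect.
      avoidUntil = now() + period;
    },
    canPauseFor(period) {
      canPauseUntil = now() + period;
      // canPauseFor overrides any previous avoidPausingFor directive.
      avoidUntil = -Infinity;
    },
    // What a collector consulting the hints would see right now.
    currentHint() {
      const t = now();
      if (t < canPauseUntil) return "can-pause";
      if (t < avoidUntil) return "avoid-pause";
      return "default";
    },
  };
}
```

In this model an avoidPausingFor call made during an active canPauseFor window only takes effect once that window has expired, which is one possible reading of "does not override".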

Future work

A future version of the spec may specify a way to define a Realm's maximum pause length. For instance, it may be possible to spawn a Realm with a maximum pause length of 0 to activate the engine's lowest-latency collector, such as a reference-counting cycle collector.

CrazyPython · March 18, 2021, 6:40pm

I am looking for a champion and criticism of my proposal. Let me know if I posted this in the wrong forum category.

theScottyJam · March 18, 2021, 8:12pm

This seems like an interesting idea. I guess those who know the internals of javascript garbage collecting better can give better feedback to the real value of this. But I can help with bikeshedding :).

It seems difficult in most use cases to predict how long you're ok having the garbage collector be busy (with canPauseFor(period)). I'm wondering if it would make more sense to have this be a toggleable state.

e.g.

RealTime.encorageGarbageCleanup = true
// ...
RealTime.encorageGarbageCleanup = false

Also, what happens when the timespans of canPauseFor(period) and avoidPausingFor(period) overlap? Does the most recent call take precedence?

CrazyPython · March 20, 2021, 11:28pm

It seems difficult in most use cases to predict how long you're ok having the garbage collector be busy

I am running my Node.js server at 30 FPS. 1000 milliseconds (ms) / 30 ≈ 33 ms per frame. I finished calculating the current frame in 22 ms. 33 - 22 = 11. Therefore, Realtime.canPauseFor(11), as I have about 11 ms remaining.
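The frame-budget arithmetic can be written down as a small helper. This is a sketch only: remainingFrameBudgetMs and hintIdleRemainder are hypothetical names, and Realtime is the global proposed in this thread, which no engine ships, so the hint is skipped when it is absent.

```javascript
// Remaining time in the current frame, in whole milliseconds (never negative).
function remainingFrameBudgetMs(fps, elapsedMs) {
  const frameMs = 1000 / fps;
  return Math.max(0, Math.floor(frameMs - elapsedMs));
}

// Passes the leftover frame time to the proposed Realtime.canPauseFor hint.
// `realtime` defaults to the proposed global and may be undefined.
function hintIdleRemainder(fps, elapsedMs, realtime = globalThis.Realtime) {
  const budget = remainingFrameBudgetMs(fps, elapsedMs);
  if (budget > 0 && realtime && typeof realtime.canPauseFor === "function") {
    realtime.canPauseFor(budget);
  }
  return budget;
}
```

Rounding down keeps the hint from promising more idle time than the frame actually has left.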

Another usage scenario: during an application transition (imagine your IDE booting up), the collector can pause for an arbitrarily long amount of time, because the user interface is not doing useful work anyway. Many IDEs have accurate loading progress bars; they can predict the loading time and allow the GC to pause based on that.

Also, what happens when the timespans of canPauseFor(period) and avoidPausingFor(period) overlap? Does the most recent call take precedence?

canPauseFor overrides any previous avoidPausingFor directive. This is to allow the developer to ask the implementation to avoid pausing for the rest of the frame, then when the frame is complete, allowing the implementation to pause for the remaining frame time. An avoidPausingFor call does not override an earlier canPauseFor call.

CrazyPython · March 20, 2021, 11:32pm

I would like to be more specific about the definitions:

avoidPausingFor asks the GC to delay even urgent GC work, even if it means allocating additional memory from the operating system

canPauseFor asks the GC to do necessary GC work if it can be done within the allotted period

It should not trigger GC work that wouldn't've been otherwise done. It merely signals a GC opportunity.

CrazyPython · March 20, 2021, 11:34pm

This is what I had in mind when I wrote the proposal:

e.g.

However, for JavaScript frameworks like SPAs, what you proposed might make sense too:

RealTime.encorageGarbageCleanup = true
// ...
RealTime.encorageGarbageCleanup = false

However, I see a problem with this. If an exception is thrown within the asynchronous loop or asynchronous function that sets RealTime.encorageGarbageCleanup = false, the application could potentially be stuck forever in RealTime.encorageGarbageCleanup = true, leading to a janky user experience. canPauseFor always expires.
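For comparison, here is a sketch of how user code would have to guard such a toggle itself. Everything in it is an assumption for illustration: hints.cleanupEncouraged stands in for the proposed toggle, and withCleanupEncouraged is a hypothetical helper name.

```javascript
// Stand-in for the toggle discussed above; a real one would be an engine global.
const hints = { cleanupEncouraged: false };

// Hypothetical helper: keeps the flag on only while `work` is running.
async function withCleanupEncouraged(work) {
  hints.cleanupEncouraged = true;
  try {
    return await work();
  } finally {
    // Runs whether `work` resolves or rejects, so the flag cannot stay stuck on.
    hints.cleanupEncouraged = false;
  }
}
```

This works only if every call site remembers the try/finally, whereas a hint that expires on its own needs no such discipline.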

theScottyJam · March 21, 2021, 2:15am

That's a fair point.

Maybe it would have been better if I defined that as a function. Here's another rough idea:

const token = RealTime.encorageGarbageCleanup({ timeout: 1000 }) // timeout is optional
// ...
token.cancel() // Stop the timeout early

This could cover both scenarios. A timeout can be provided when desired, but this can also be an on-off switch if that works better. What's more, these cancel tokens can stack (making this API not a single global switch anymore). Different places in the codebase can call RealTime.encorageGarbageCleanup(), and the "encouraging" won't stop until all tokens cancel or time out.
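The stacking behavior can be modeled in a few lines. This is a sketch under assumptions, not a proposed implementation: createCleanupHints, encourage, and isEncouraged are hypothetical names, and the clock is injectable so expiry can be checked without real timers.

```javascript
// Hypothetical model of stackable cleanup tokens; not a real engine API.
function createCleanupHints(now = () => Date.now()) {
  const live = new Set(); // tokens that are neither cancelled nor expired

  return {
    // timeout is optional; without it the token lives until cancel() is called.
    encourage({ timeout = Infinity } = {}) {
      const token = {
        expiresAt: now() + timeout,
        cancel() {
          live.delete(token);
        },
      };
      live.add(token);
      return token;
    },
    // True while at least one token is still live.
    isEncouraged() {
      const t = now();
      for (const token of live) {
        if (token.expiresAt <= t) live.delete(token); // drop expired tokens lazily
      }
      return live.size > 0;
    },
  };
}
```

A token created without a timeout brings back the "stuck on" risk raised earlier in the thread, so a host might want to cap the timeout.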

CrazyPython · March 21, 2021, 8:26pm

I just signed the TC39 ECMAScript RFTG Contributor Form.

rdking · March 26, 2021, 8:44pm

Might it not be better for the function to be RealTime.discourageGarbageCleanup(timeout)? I'm thinking that the garbage collector in any given engine is about as optimal as the developers can manage in any given release. Unless I'm missing something, the goal here is to keep the garbage collector from kicking in and stalling frame generation. If that's the case, then discouraging the presumed already optimal GC from doing any work virtually ensures that when the timeout is either cancelled or expires, the GC will already have work to do, and will fit in as much of it as possible before the next discouraging call.

claudiameadows · March 28, 2021, 7:02am

Have you considered an API like this?

// Execute code with hint to avoid GC.
const result = GC.criticalHint(level = "none" | "major" | "minor", () => {
  // ...
  return value // the callback's return value becomes result
})

// Hint to collect GC
await GC.collectHint(level = "major" | "minor")

The hint would be one of three variants:

"major" = avoid/perform major GC runs
"minor" = avoid/perform minor GC runs
"none" = non-critical (doesn't apply to collectHint)

For objects allocated in critical sections, collection should generally be deferred until after the critical section completes. If they're allocated in non-critical sections within critical sections, they should be collected as normal.

This would align better with how GCs are actually constructed, and would also allow you to better specify what level of performance tradeoffs you're willing to accept.

"minor" for the critical hint is the most invasive and most potentially destructive of performance. "major" with occasional "none"s for lifecycle hooks/methods would be handy for some DOM framework rendering, though, to delay the scavenge pass for all the internal virtual DOM nodes as late as pragmatically possible. (That's also why I have my suggestion this way. The idea is similar to bump allocation, but without sacrificing the perf boost you could get with using only younger generations.)
"major" for the collect hint is the most invasive and most potentially destructive of performance, but it's useful in its own right in cases where you're doing a lot of big data manipulation and don't want sudden GC spikes in the middle of processing it - I've a few times used setTimeout(func, 4) as a similar hint, but I'd strongly prefer to rely on something a little more deterministic. "minor" might be useful for some frameworks to make performance more predictable.
collectHint is asynchronous because GCs run concurrently, and blocking the main thread for GC is almost always a bad thing. This also avoids questions on how this would interact with weak refs - it's already spec'd that those don't die until after the end of the current promise job, and so it'd just glide right in with that without issue.
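Because both calls are hints, a do-nothing fallback is a valid implementation. The sketch below assumes the GC.criticalHint and GC.collectHint shapes proposed in this post; GCShim, GCHints, and the TypeError on unknown levels are choices made here for illustration, and no engine currently provides a GC global like this.

```javascript
// Levels accepted by the proposed criticalHint.
const LEVELS = new Set(["none", "major", "minor"]);

// No-op fallback for hosts without the proposed GC global.
const GCShim = {
  criticalHint(level, fn) {
    if (!LEVELS.has(level)) throw new TypeError(`unknown level: ${level}`);
    return fn(); // a hint only: run the callback and pass its result through
  },
  collectHint(level) {
    if (level !== "major" && level !== "minor") {
      throw new TypeError(`unknown level: ${level}`);
    }
    return Promise.resolve(); // nothing to wait for without engine support
  },
};

// Prefer a host-provided GC object if one ever exists.
const GCHints = globalThis.GC ?? GCShim;
```

Code written against GCHints behaves the same with or without engine support, which matches the idea that the hints carry no guarantees.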

CrazyPython · March 28, 2021, 10:14pm

I think your API is better and should replace mine. It has many good points and also fits modern GCs better.

However, it underspecifies how long the promise returned by GC.collectHint pauses for.

Does it pause for the minimum? If so, what is a minimum? 1ms? 0.5ms?
Does it pause to collect until all work is done? That could lead to unpredictable pauses.
Does it trigger a global GC? That could be inefficient.

I propose the following:

await GC.collectHint(level = "major" | "minor", maxMainThreadPause)

JavaScriptCore and V8 both can pause for an arbitrary amount of time to run partial GC collection work. Without additional hints, there is no way to say "pause for a specified duration before the next frame/network request/etc." I use Node on the server, and I think this would be useful.

After a maximum of maxMainThreadPause milliseconds, the promise returned by collectHint should resolve.
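The "resolve after at most maxMainThreadPause milliseconds" behavior can be approximated in user code by racing the hint's promise against a timer. This is a sketch: withMaxWait is a hypothetical helper, it only bounds how long the caller waits, and it cannot stop GC work the engine has already started.

```javascript
// Resolves with the promise's value, or with undefined once maxMs has elapsed,
// whichever happens first.
function withMaxWait(promise, maxMs) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(resolve, maxMs);
  });
  // Clear the timer afterwards so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage, assuming the proposed API existed:
//   await withMaxWait(GC.collectHint("minor"), 5);
```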

Two things to bikeshed on:
You wrote "is the most invasive and most potentially destructive of performance," twice. What did you mean to say instead?

If they're allocated in non-critical sections within critical sections, they should be collected as normal.

Can you clarify what you mean? What is a non-critical section? (An external function call?)

setTimeout(func, 4)

Could you link me to documentation on this implementation-specific hack? Could be listed as prior art.

claudiameadows · March 30, 2021, 9:55pm

For your first two questions, it's up to the implementation what to do. It's a hint, not a guarantee - keep this in mind. Additionally, it's asynchronous, so the idea is the engine would resolve once it considers the request fulfilled - this is likewise extremely implementation-dependent. One could specify a max timeout for the collect hint, but even then, it should be honored as a preference, not an absolute requirement. And BTW, this doesn't "pause" the main thread - additional work can still run concurrently, including new promises, timers, and such, in theory, if the host chooses to split up its idle time.

To go into a little more detail, V8 has this hook where embedders can set a flag going "we're idle now, do whatever you need". This function could be implemented by exposing to embedders that "hey, we've got a GC request we're about to execute, and we'd like to ask you to budget idle time for us based on this request and notify us if/when you consider this request fulfilled" and letting them decide how to handle it, along with another hook going "hey, we've completed this request ourselves, you can resolve this request whenever you're ready".

For the third, it's necessarily global. Engines don't have the mechanisms for only collecting objects from a specific scope, and even my criticalHint might prove difficult to implement as desired.

This was relative to each hint. Sorry if that wasn't clear. (This is a very highly technical thing to spec out, after all.)

Sorry if it wasn't clear. I tried to explain it here (emphasis added).

The idea is this:

// A
GC.criticalHint("major", () => {
  // B
  GC.criticalHint("none", () => {
    // C
  })
  // D
})
// E

Non-critical means it can just use its default mechanisms to determine GC behavior, and this is how it starts out - A and E are in non-critical sections.
When you enter a critical section that restricts major collections, major GCs should be avoided - this applies to B and D.
When you enter a critical section that restricts minor collections, minor and major GCs should be avoided - doesn't apply here. (The nursery is unaffected as collection is virtually zero cost.)
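The nesting rules above can be traced with a small stack. This is a sketch of the bookkeeping only, under the assumption that the innermost hint wins; levelStack, currentLevel, and avoidedCollections are hypothetical names, and criticalHint here is a local stand-in for the proposed GC.criticalHint.

```javascript
// Hypothetical bookkeeping for nested critical sections; not a real engine API.
const levelStack = [];

function currentLevel() {
  // Outside every criticalHint call the section is non-critical.
  return levelStack.length > 0 ? levelStack[levelStack.length - 1] : "none";
}

function criticalHint(level, fn) {
  levelStack.push(level);
  try {
    return fn();
  } finally {
    levelStack.pop(); // restore the enclosing level even if fn throws
  }
}

// Which collections the current section asks the engine to avoid.
function avoidedCollections() {
  switch (currentLevel()) {
    case "major": return ["major"];
    case "minor": return ["minor", "major"];
    default: return [];
  }
}

// Trace the A..E example from the post.
const trace = [];
trace.push(`A:${currentLevel()}`);
criticalHint("major", () => {
  trace.push(`B:${currentLevel()}`);
  criticalHint("none", () => {
    trace.push(`C:${currentLevel()}`);
  });
  trace.push(`D:${currentLevel()}`);
});
trace.push(`E:${currentLevel()}`);
console.log(trace.join(" ")); // A:none B:major C:none D:major E:none
```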

It's not so much an implementation-specific hack as much as a general timings hack that budgets 4ms for collection, and the engine of course goes idle for that. This isn't specific to any particular engine.

There is of course prior art, though:

Can I trigger JavaScript's garbage collection? - Stack Overflow shows both interest and a couple older browsers' hooks in one of the answers.
V8's --expose-gc: Forcing Garbage Collection in node.js and JavaScript • Computer Science and Machine Learning

CrazyPython · March 30, 2021, 10:59pm

For your first two questions, it's up to the implementation what to do.

V8's algorithm requires pausing the main thread for finalization work. It is capable of pausing for a specific amount of time:
"As soon as an incremental major garbage collection is started, V8 posts an idle task to Chrome's task scheduler, which will perform incremental marking steps. These steps can be linearly scaled by the number of bytes that should be marked. Based on the average measured marking speed, the idle task tries to fit as much marking work as possible into the given idle time." https://queue.acm.org/detail.cfm?id=2977741

It's reasonable to add a pause time parameter as a hint for GC algorithms that support it. Likewise, engines that do not support GC.criticalHint("none") will fall back to the level they do support, GC.criticalHint("minor"), whereas an engine like SpiderMonkey that is capable of suppression will suppress it.
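The scaling described in the quoted passage (marking steps sized from the average marking speed and the available idle time) reduces to simple arithmetic. The sketch below is a simplification written for this discussion, not V8's code; markingStepBytes and its parameters are hypothetical names.

```javascript
// Bytes of marking work that fit into the idle time at the measured speed,
// capped by the amount of work that is actually left.
function markingStepBytes(avgBytesPerMs, idleMs, remainingBytes) {
  if (idleMs <= 0 || avgBytesPerMs <= 0) return 0;
  return Math.min(remainingBytes, Math.floor(avgBytesPerMs * idleMs));
}
```

A pause time parameter on collectHint would play the role of idleMs here for engines whose collectors can size their steps this way.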

One could specify a max timeout for the collect hint, but even then, it should be honored as a preference, not an absolute requirement.

I agree.

And BTW, this doesn't "pause" the main thread

Yes, this API doesn't pause anything. Many concurrent GC algorithms have a phase where the main thread is paused to finalize a GC cycle: JavaScriptCore calls these synthetic pauses, V8 calls them idle tasks. These are the "main thread pauses" I'm referring to. As a web game developer, I'm extremely wary of these. I would like to keep them running only after the game is done executing for a cycle, and only for a limited time, which modern GCs are capable of. And unless the runtime intercepts setInterval, it has no way of knowing when it can run its GC.

claudiameadows · April 1, 2021, 9:09pm

Where do you see that in V8's API? IIUC V8 does that not by actually watching a timer (that's done on Chrome's end), but by monitoring a boolean that's set and later unset.

CrazyPython · April 2, 2021, 10:46pm

Where do you see that in V8's API?

V8 8.6, Node.js 15:

bool IdleNotificationDeadline ( double deadline_in_seconds )

Optional notification that the embedder is idle. V8 uses the notification to perform garbage collection. This call can be used repeatedly if the embedder remains idle. Returns true if the embedder should stop calling IdleNotificationDeadline until real work has been done. This indicates that V8 has done as much cleanup as it will be able to do.

The deadline_in_seconds argument specifies the deadline V8 has to finish garbage collection work. deadline_in_seconds is compared with MonotonicallyIncreasingTime() and should be based on the same timebase as that function. There is no guarantee that the actual work will be done within the time limit.

Source: v8: Isolate Class Reference

I want this proposal to integrate nicely with existing engines in a well-specified way without any handwavium. That's why I thought about existing engines' algorithms before proposing this.

claudiameadows · April 6, 2021, 12:31am

Oh, okay. Doesn't ultimately change my suggestion, though - the actual idle time available might be less if there's timers to be invoked during that time, and so Chrome or Node in this case would have to break up the scheduled GC time into sections to accommodate that.
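Breaking a requested GC window into sections around pending timers could look roughly like this. It is a sketch under assumptions: idleSlices is a hypothetical helper, each timer is described by a due time `at` and an estimated run time `cost` in milliseconds, and a real host would have its own scheduling data.

```javascript
// Split [start, end) into idle slices that stop where each timer is due.
// timers: array of { at, cost }, both in ms on the same clock as start/end.
function idleSlices(start, end, timers) {
  const slices = [];
  let cursor = start;
  const due = timers
    .filter((t) => t.at < end)
    .sort((a, b) => a.at - b.at);
  for (const t of due) {
    if (t.at > cursor) slices.push([cursor, t.at]); // idle until the timer fires
    cursor = Math.max(cursor, t.at) + t.cost; // the timer callback occupies this span
  }
  if (cursor < end) slices.push([cursor, end]);
  return slices;
}
```

For a 10 ms window with one timer due at 4 ms that runs for 2 ms, the idle time splits into a slice from 0 to 4 and a slice from 6 to 10.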

CrazyPython · April 6, 2021, 10:20pm

That makes sense, giving the host/engine control over how the event loop interacts with GC.

By the way, how can I get a champion? Is there a list of potential champions? (Is there one interested in realtime applications or garbage collection?) Or do I just wait here?

claudiameadows · April 6, 2021, 11:58pm

You literally just ask around to see if anyone's interested.
