Native WeakValueMap

I think adding a native WeakValueMap to the standard would be useful for caching a large number of objects.

In modern runtimes, instantiating a WeakRef has about the same cost as instantiating a Map, WeakMap, Set, or WeakSet: on the order of 1M to 10M ops per second. While this is reasonable in most cases, these operations are orders of magnitude slower than map.get, map.set, & creating an empty object, so the overhead can become an issue when creating many WeakRefs.

Being able to instantiate a single WeakValueMap instead of n WeakRef objects can help make client-side libraries smaller & faster, & help server-side apps handle more traffic.

Having a native WeakValueMap also reduces the need for a dependency when this behavior is needed.

To check: by `WeakValueMap`, is this referring to something like this: GitHub - tc39/proposal-weakrefs: WeakRefs?

I expect it'd be something more like this: WeakValueMap.mjs · GitHub

While it's not straightforward to get right, I would be opposed to getting something like that into the language because:

  • it's not obvious what the expected semantics are, and oftentimes a bare WeakValueMap like this is not the right tool for the job.
  • I don't want new ways to observe GC (unless it's a fundamental new building block that can't be built from the existing features we already have)

I think a library is exactly where this kind of WeakValueMap belongs. And I doubt a native implementation would have much of a performance advantage over a library implementation.

In my experience, weak value maps are almost always the wrong thing to use. As values get dropped for non-use, you risk having entries recreated over and over, creating performance issues. Look at Cache replacement policies - Wikipedia instead for much better ways to do automatic cache eviction.

And for most caching, you should be using TTLs for size-independent automatic expiry, not weak values. If you genuinely have millions of keys and extremely frequent updates, give each entry an eviction deadline field (the performance.now() of the entry's creation plus the TTL) and keep a (monotone) priority queue sorted by that field. Maintain a single timer that repeatedly waits for Math.ceil(minDeadline - performance.now()) (rescheduling as necessary to target the nearest deadline), so you only walk one link each time. This keeps memory overhead low (~3 properties per entry) without sacrificing performance, and it'll likely outperform even a natively implemented WeakValueMap. Plus, you'll get way more cache hits.

In my case, a TTL is not going to work. I work on a signal library called rmemo (Reactive Memo). It uses a WeakRef in the parent Signal to track whether a child Signal is still live, which also enables the child Signal to be garbage collected.

In the case of rmemo, a single WeakValueMap would be a performance win vs n WeakRef objects. Note that the WeakValueMap.mjs example you presented also instantiates n WeakRef objects.

Instantiating a WeakRef is > 100x slower than instantiating a plain object (which is what a WeakValueMap key could be). This would impact larger reactive graphs or high-traffic servers that use reactive graphs. I don't have any production examples where this is a concern yet, since reactive graphs are not widely used on production servers. But with the upcoming Signals proposal & the recent popularity of Signals, it could become a pain point. It's something on my radar at least.

I filed an issue with the Signals Proposal if you want to chime in there.

I also want to note that rmemo is the reactive base of relementjs, a reactive DOM library. SolidJS uses Signals as its base. Most Signal libraries use explicit subscribe/unsubscribe bookkeeping semantics, but using WeakRef or a WeakValueMap reduces the amount of logic & bundle size needed to prune the reactive graph.

This is one technique I used to make relementjs the smallest reactive component library...that I know of. relementjs was originally forked from VanJS, though I extracted rmemo from the fork & made it reactive on the server side (VanJS requires a DOM for reactivity). Using WeakRef instead of imperative cleanup logic was a big reason for the size reduction.

Another thing I just thought of: I believe a WeakValueMap could allow primitives as keys. The reason a WeakMap only allows object keys is that an object is allocated memory that can be garbage collected; its values can be primitives or objects.

A WeakValueMap is the opposite. It would allow primitive or object keys & require the values to be objects.

So in the case of rmemo, I would assign an incrementing numeric id to each memo & use that id as the key in the WeakValueMap.

I benchmarked WeakRef & the results are not good.

  • BunJS: WeakRef seems to work
  • NodeJS: WeakRef causes runtime performance degradation. It seems to write to disk?
  • DenoJS: WeakRef causes an OOM panic

I created another benchmark comparing WeakMap, FinalizationRegistry, WeakRef, & the WeakValueMap js implementation.

bun ./WeakValueMap.js
WeakMap x 33,849 ops/sec ±0.82% (95 runs sampled)
FinalizationRegistry x 21,919 ops/sec ±1.63% (93 runs sampled)
WeakRef x 68,214 ops/sec ±1.98% (91 runs sampled)
WeakValueMap x 7,224 ops/sec ±1.75% (86 runs sampled)
Fastest is WeakRef
node ./WeakValueMap.js
WeakMap x 20,373 ops/sec ±0.44% (96 runs sampled)
FinalizationRegistry x 13,921 ops/sec ±1.63% (88 runs sampled)
WeakRef x 1,517 ops/sec ±17.78% (19 runs sampled)
WeakValueMap x 722 ops/sec ±15.36% (81 runs sampled)
Fastest is WeakMap
deno run ./WeakValueMap.js
✅ Granted all read access.
WeakMap x 18,898 ops/sec ±0.52% (69 runs sampled)
FinalizationRegistry x 12,827 ops/sec ±2.11% (62 runs sampled)
WeakRef x 1,454 ops/sec ±21.63% (11 runs sampled)
WeakValueMap x 651 ops/sec ±14.85% (47 runs sampled)
Fastest is WeakMap

What's interesting about the BunJS results: WeakRef performed the best, even faster than WeakMap. Presuming the BunJS implementation is the least buggy, this seems to confirm that WeakRef is indeed the most primitive element. I don't know how each is implemented, but am I correct in thinking that WeakMap uses WeakRef in its implementation?

I recently had to create my own implementation of a WeakValueMap sort of thing; mine is more of an IterableWeakSet, and the use case is basically for keeping track of event listeners without holding strong references to them. In this case the listeners are custom DOM elements, and they only need to know about a given event if they are still in the DOM tree - the event source shouldn't keep the DOM elements alive if they aren't on the page anymore.

My implementation doesn't use a FinalizationRegistry because the whole collection gets iterated through with some frequency, so I just drop the dead members whenever they are accessed.

I'm of two minds about the WeakValueMap idea. First, from a developer perspective, it feels weird that there are no primitives for holding an iterable collection of values without holding references to them. It feels like a feature gap, in the same way that the lack of a String.prototype.entries() function does. And because there is no native functionality that can accomplish this task, I have no guidelines for how I should implement my custom WeakCollection class in order to achieve best performance and avoid pitfalls.

From an implementation perspective, however, I have to wonder if there are any performance gains to be had. GC is obviously a difficult problem, and while @btakita's results are certainly suggestive that improvements could be made, that's not the same thing as actually modifying an engine to support the functionality. It also doesn't answer an important question, one which I think is central to most if not all discussions about potential GC-related features: "If an implementation were modified to support this feature, what performance impact would it have on code that doesn't use it?"

There's also a third consideration, one which @theScottyJam recently reminded me about: the features available in a language function as guideposts to the users of a language. If a WeakValueMap is added to the language, then people will use WeakValueMaps, and they will do so regardless of whether they are the best or most efficient tool for the job. For example, let's say the WeakValueMap described here were added to the language but it didn't end up offering any performance benefits over the polyfill. If I were writing my WeakCollection class in this hypothetical future, I would use the WeakValueMap as my primitive of choice rather than WeakRef, because I would assume that, as the language-provided feature closest to what I'm trying to do, it's the best tool for the job. I'd be wrong, though, because the map functionality is entirely unnecessary for my use case, and my assumption that a native feature would provide performance benefits over a hand-written one would, in this case, be incorrect.

What if we split the difference? Instead of offering a full-blown WeakValueMap in the language (which, as @mhofman notes, doesn't have obvious and self-evident semantics), how about we provide a primitive that can be used as a building block for libraries to use in providing their own WeakValueMaps or WeakCollections or whatever? All it needs is to provide the one thing that the current primitives don't: the ability to iterate over a collection of objects without holding strong references to them. Ideally it should do so without making assumptions about what kind of data structure is trying to use it as backing, so its API should be as small as possible. Forcing developers to use this as a building block rather than a proper container in its own right would send a strong message that this is not a recommended or necessarily performant pattern; I don't know off the top of my head whether the spec defines any abstract global classes, but it might be warranted in this case even if it has a strong enough API to be used on its own.

One possible implementation of such a structure-agnostic primitive might be the following:

class WeakObjectRegistry {
  #map = new Map();
  register(object) {
    const ref = new WeakRef(object);
    const sym = Symbol(); // unique, unforgeable handle for this registration
    this.#map.set(sym, ref);
    return sym;
  }
  deregister(symbol) {
    return this.#map.delete(symbol);
  }
  lookup(symbol) {
    const result = this.#map.get(symbol)?.deref();
    if (!result) this.#map.delete(symbol); // prune dead entries on access
    return result;
  }
  registrationSymbols() {
    return this.#map.keys();
  }
  *[Symbol.iterator]() {
    for (const [sym, ref] of this.#map.entries()) {
      const obj = ref.deref();
      if (obj) yield obj;
      else this.#map.delete(sym); // prune dead entries during iteration
    }
  }
}

Please forgive syntax and logic errors as I'm writing this on a phone, but I think the point comes across; this would work equally well as a base for a weak-valued map, collection, or even array. To ease the burden on implementations, the contract could be loosened as much as necessary:

  • registering an object twice could succeed and return the same symbol (WeakSet semantics), succeed and return a different symbol (WeakArray semantics), or fail (WeakMap-ish semantics).
  • The registrationSymbols() method could skip symbols attached to dead objects, or it could return them.
  • Iterating the registry could return a double-registered object once or twice.
  • Iteration order could be registration order, stable order, or unstable order.
  • Iteration could be restricted to registration symbols only, or even omitted altogether to make the registry only usable as a lookup (in which case, the wrapping class would store the symbols it cared about); my intuition is that there would be enough of a benefit to a native iteration method that discards dead refs that it ought to be included if possible, but I could see the argument that including it might edge across to the "attractive nuisance" side of things, if we really want to discourage developers from using this class without cause.

There are only two inviolable guarantees that would need to be respected for this to function as a building block:

  1. If a registered object is still alive, a strong reference to it can be obtained by calling lookup() on the symbol returned when it was registered.
  2. Deregistering a symbol causes future lookups on that symbol to fail.

If iteration is provided, I'd add the following guarantees:

  1. Iteration over registration symbols will yield at least one symbol for each live, registered object. It does not have to match the symbol originally returned at registration (but it probably will).
  2. Iteration over registered objects will yield every registered object at least once, unless (a) the given object has already been garbage collected, or (b) all that object's registration symbols have been deregistered.

Obviously stronger guarantees are better; for my WeakCollection, for example, since I do care about insertion order, I'd keep a private array of registration symbols and iterate over that rather than using the registry's iterator (if available), but if iteration order were guaranteed to match registration order I wouldn't bother.

What do folks think? For my part, I'd consider this a win even if only the first two guarantees could be met; what's more, if such a low-capability WeakObjectRegistry were presented in the spec, I as a developer would consider that a stronger recommendation against using weak-valued maps than the current lack of functionality.

I'm not sure this style of benchmark is suitable for measuring APIs that deal with GC semantics.

A WeakMap for example does extra work if the map has been moved to the old generation and the inserted key is still in the young generation. Your benchmark creates a fresh WeakMap every time so it's only measuring weak refs among the young generation.

These APIs also influence how much work the GC needs to do.

Stress tests are more likely to capture this bigger picture.

Some videos you may find interesting:

@aclaymore Given that your objections are the same as the ones I mentioned here:

do you have any thoughts on the reduction-in-scope I suggested?

I have some new info to share re: WeakRef performance. It turns out V8 & SpiderMonkey do not GC WeakRefs in the young generation when running synchronously. This confused me & the discussions I had on related issues, as I originally used synchronous benchmarks.

This code will eventually result in an OOM error in V8 & SpiderMonkey. JSC does GC WeakRefs in the young generation, so its memory usage remains in a steady state.

while (true) {
  // Both the WeakRef & its target become unreachable from the program
  // immediately, but V8 & SpiderMonkey never collect them while this
  // synchronous loop runs.
  new WeakRef({})
}

There are open issues for V8 & SpiderMonkey to GC WeakRefs in the young generation.

Are you sure it's the WeakRef itself that's not GCed, and not just the target of the WeakRef? By spec, the target of a WeakRef cannot be synchronously collected after the creation of the WeakRef, or after deref is called. If any engine allows collection of the target, even when the WeakRef is itself collected, that's technically a spec non-compliance. Arguably the spirit of the KeptObjects list is to allow stability of deref in synchronous code, so if the WeakRef instance is no longer reachable there should be no harm in collecting the target itself, but that's not how the spec is currently written.
