Allow weak maps to contain primitives

I know this is a bit bold, but I have legitimate use cases for weakly storing primitive strings. Things like this cache (it'll become a Map once we drop IE support) are wasteful for dynamically built strings that aren't frequently used, but at the same time, static strings have to be there. Additionally, Lodash's _.memoize has over 10M weekly downloads, and its underlying MapCache could be made considerably more memory-efficient by such a generalized weak map.

What would the spec say to do in the case of holding a weak ref to a primitive?

const wm = new WeakMap();
wm.set("abc", [1, 2, 3]);

// 1
wm.get("ab" + "c");

await Promise.resolve();
// 2
wm.get("ab" + "c");

// … many moons pass
// 3
wm.get("ab" + "c");

At what point is the entry for “abc” collected? Or will it be there forever?

It would be there forever, and that's OK.

I'd have those keys carry the same minimum liveness requirements that weak refs have with their inner object values. And as usual, engines could choose to simply leave them there forever, as that's a valid value collection strategy.

It wouldn’t be a choice though, right? They must leave the entry there forever?

Right, with the benefit being that people can quickly reach for a built-in class instead of adding a dependency or writing this themselves?

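class WeakishMap {
  #m = new Map();      // strong storage for primitive keys
  #w = new WeakMap();  // weak storage for object keys

  set(k, v) {
    try {
      this.#w.set(k, v);
    } catch {
      // WeakMap rejects keys it cannot hold weakly; fall back to the Map
      this.#m.set(k, v);
    }
    return this;
  }

  get(k) {
    // Check `has` first so a stored null/undefined value is returned as-is
    return this.#w.has(k) ? this.#w.get(k) : this.#m.get(k);
  }

  delete(k) {
    return this.#w.delete(k) || this.#m.delete(k);
  }
}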
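
For anyone wondering why only set needs the try/catch in a wrapper like that: as far as I know, a native WeakMap only throws on set when given a primitive key, while get, has and delete just report "not found". A minimal sketch to check that (the trySet helper and the results object are names I made up for illustration):

```javascript
const wm = new WeakMap();

// Hypothetical helper: reports whether WeakMap.prototype.set accepted the key.
function trySet(key) {
  try {
    wm.set(key, 1);
    return "ok";
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : "other";
  }
}

const results = {
  objectKey: trySet({}),          // objects can be held weakly
  stringKey: trySet("abc"),       // primitive strings are rejected
  getString: wm.get("abc"),       // lookups with a primitive do not throw
  hasString: wm.has("abc"),
  deleteString: wm.delete("abc"),
};

console.log(results);
```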

No, it'd be implementation-defined for my particular suggestion. (Part of why I said the suggestion was a bit bold. :wink:)

I'm open to it being there forever, but I wouldn't consider that ideal by any means.

I think that would be a tough ask, as it makes it harder to write portable code if the spec leaves the semantics as undefined behavior.

That's also why I clarified the minimum liveness in a follow-up by linking it to weak refs' objects. That's the portability guarantee - they should be approximately as portable as those. And I stopped short of including other non-symbol primitives like numbers and booleans (which, were they to be included, I would agree with @ljharb - V8 stores all of that on the heap, while SpiderMonkey NaN-boxes those, unlike with strings where virtually everyone uses the heap out of necessity).

Even if I create two identical strings, the engines won’t necessarily immediately intern those two values into the same heap object.
So when I drop a reference to the first one, that object may no longer be live, but the identical one is.

WeakRefs storing objects has simpler semantics to define because objects are not forgeable. Two different js object values are never equal.

To clarify: if a WeakRef has dropped its ref then that is a guarantee that the object it was pointing to can no longer suddenly re-appear.
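
To illustrate the "not forgeable" point, here is a rough sketch (variable names are just for illustration): an object that looks identical to a key is still a different key, so once the original is unreachable nothing can reconstruct it to reach the entry.

```javascript
const wm = new WeakMap();

const key = { id: 1 };
wm.set(key, "value");

// Structurally identical, but a distinct object with its own identity.
const lookalike = { id: 1 };

const identical = key === lookalike;   // false
const viaOriginal = wm.get(key);       // "value"
const viaLookalike = wm.get(lookalike); // undefined

console.log({ identical, viaOriginal, viaLookalike });
```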

Beyond the obvious problems for reasoning about programs this would cause, there's a somewhat more subtle information leak. Consider:

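// Hypothetical: assumes WeakMap accepted primitive string keys (today `set` throws a TypeError here)
let map = new WeakMap();
map.set("some string", {});
let registry = new FinalizationRegistry(() => console.log('collected'));
registry.register(map.get("some string"));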

If "some string" is in use in some other part of the program at the time this runs, and the engine happens to intern both instances, this will print collected when the other part of the program finishes using the string.

Several members of the committee are strongly opposed to any sort of spooky action at a distance like this. There is basically no chance anything which allows this will get through committee.

(Note that this is a significantly worse problem than with the current state of affairs, because currently using this communications channel requires both parts of the program to share a non-primitive object, which means they already have a perfectly good communications channel. The above snippet allows one part of the program to observe another without them ever having communicated.)

Wouldn’t that only be the case if the string was collected, which presumably it never would be?

It could get collected when you're dealing with dynamically created strings as opposed to string literals.

No it couldn't - primitives never get collected. How they're created doesn't matter - 1 + 1 and 2 are the same uncollectible primitive.

Strings do get collected internally to prevent memory leaks - stuff like this only ever works because strings can be collected.

As I noted, this doesn't hold true for all types, though - bigints can be optimized into 64-bit integers in V8, doubles are allocated in V8 but never in 64-bit SpiderMonkey (which uses NaN boxing for pointers instead), and so on.

I have since come up with a potential compatibility concern, though: cons strings. "ab" + "c" and "a" + "bc" are ===, but when not constant-folded they will not share the same internal reference in any major engine. (I'm unaware of any engine, other than those targeting low-resource embedded devices, that doesn't employ this easy optimization.) So I'll fall back to simply allowing primitives to be used, with the expectation that they will never be collected, as they lack any reference to make their construction private.
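
A small sketch of the language-level side of that concern (the concat helper is only there for illustration, and this says nothing about how any engine lays the strings out internally): however the string was assembled, the language treats the results as the same value, including as a Map key.

```javascript
// Hypothetical helper: builds strings at run time from two pieces.
function concat(a, b) {
  return a + b;
}

const left = concat("ab", "c");
const right = concat("a", "bc");

const strictlyEqual = left === right;     // true
const sameValue = Object.is(left, right); // true

// A Map keyed by one is found via the other.
const m = new Map([[left, 1]]);
const found = m.get(right);               // 1

console.log({ strictlyEqual, sameValue, found });
```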

This is what I was trying, and evidently failing, to articulate in this comment. :slightly_smiling_face:

Even though internally JS engines may collect strings, they do this transparently. So from a specification perspective all the different implementations of js-strings across the many JS engines behave the same.

If a program did want object retention semantics of strings for a particular part of their code then one option, depending on the exact use-case, would be to use the object form of strings:

class StringStore {
  #map = new Map();
  #fr = new FinalizationRegistry(held => {
    if (this.#map.get(held)?.deref() === void 0) { // ensure no race with `get`
      this.#map.delete(held);
    }
  });

  /**
   * get a String object whose string value is {s} 
   * if a matching object has already been created
   * then will return that.
   */
  get(s) {
    s = String(s);
    let obj;
    if (this.#map.has(s)) {
      obj = this.#map.get(s).deref();
    }
    if (!obj) {
      obj = new String(s);
      Object.freeze(obj);
      this.#fr.register(obj, s);
      this.#map.set(s, new WeakRef(obj));
    }
    return obj;
  }

  has(s) {
    s = String(s);
    return this.#map.has(s);
  }
}
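
As a quick sanity check on why String objects work for this, here is a minimal sketch (names are illustrative only): each new String has its own identity even when the contents match, and a frozen one can still be used as a WeakMap key.

```javascript
const a = new String("abc");
const b = new String("abc");
Object.freeze(a); // freezing does not prevent use as a WeakMap key

const wm = new WeakMap();
wm.set(a, "metadata for a");

const checks = {
  kind: typeof a,                           // "object"
  sameIdentity: a === b,                    // false
  sameContent: a.valueOf() === b.valueOf(), // true
  viaA: wm.get(a),                          // "metadata for a"
  viaB: wm.get(b),                          // undefined
};

console.log(checks);
```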

Though, the only reason to use the object form of a string here is to give a particular string a unique identity. You could just as well wrap the string in a single property object and use that instead, if you're able to do so.

const ref = { content: 'abc' }
const weakMap = new WeakMap()
weakMap.set(ref, 'someValue')
...

Semantically, only objects in JS are collectible, which is why those are the only things you can weakly hold. Weakly holding something doesn’t imbue collectibility, and allowing primitives (symbols or otherwise) wouldn’t make them collectible.

Internal representations are irrelevant here.

You do get toString for free when using String objects, though.

let a = new String("world");
let b = "hello " + a; // "hello world"