Built-in function memoization with weak references

Function memoization is a useful feature in writing performant applications, but there are multiple common implementations that each differ in their usefulness and shortcomings.

Lodash uses only the first argument by default, which can cause collisions in retrieving the correct result. I've seen one implementation that used JSON.stringify on the arguments object which can fail if any argument isn't stringifiable, and stringifying can be as slow as whatever the function was that was being memoized. Recently, I used an in-house memoization function that relied on WeakMap to allow the arguments to be garbage collected, but nesting these calls accidentally created a strong reference, nullifying the benefit. The WeakMap feature also meant that it couldn't memoize primitive arguments since these can't be keys in a WeakMap.

weakMemo(thing1 => 
  weakMemo(thing2 => 
    // thing1 is strongly referenced here
    doStuff(thing1, thing2)
  )
)

I'd like to propose a language-level memoization feature that correctly weakly references non-primitives, and compares the entire argument list. I think this comparison could be a plain strict equality (===) check or it could be the result of a resolver function provided to the memoization call like what Lodash allows. (https://lodash.com/docs/#memoize).

Since this would be implemented at the JS engine level, implementations could differ on how/when/if the function gets memoized or its lookup mechanism gets cleared. In space-constrained environments like IOT devices, the call to memoize a function could be ignored. In environments where space is at first plentiful, but gets filled up, the call to memoize a function could work as expected while space is available, then stop using the memoization mechanism as space is constrained. This would effectively be a JS engine hint from the programmer that some function is time-intensive, and the engine can/should use a memoization strategy (but not at the expense of keeping strong references).

Proposed syntax:

Function.memoize(myFunc);
2 Likes

What about primitives? Those necessarily must use strong references. Also, keep in mind, any sane implementation would only weakly reference objects and symbols where possible, regardless of what the spec says, as it saves a ton on memory overhead.

Primitives wouldn't be weakly referenced, but they also do not have as much potential for memory leaks as objects do. Here's what I envision:

// declare function
function myFunc(object, array, string, boolean, symbol) {
  /* expensive computations */
}

// memoize
Function.memoize(myFunc);

const context = [
  {}, [], 'hello world', Symbol()
];

// initial invocation
myFunc(...context);

// memoized invocation
myFunc(...context);

// remove the object parameter
delete context[0];

// when GC (garbage collection) runs next, 
// it will remove the object from memory,
// ignoring the reference in the memoized myFunc's cache

In the above example, there is no way to call myFunc again with the arguments that it has memoized. What this proposal is mostly concerned about is ensuring that the memoization mechanism doesn't have a memory leak. The only implementations I've seen that try to address that fail, and I'm not sure if there is a way to write a memoization method in JS that doesn't leak memory. Because of this, I'm convinced that the only way this will happen is with a built-in mechanism that works at a lower level than JS.

Strings. There's your main source of memory leaks with primitives. You call it with a string of file contents, that right there is an easy way to ramp up memory usage really quick.

Also, Function.memoize should not be mutating functions. Engines rely on the fact functions are never modifiable after creation to simplify their optimization pipeline tremendously - it also allows them to do things like optimize them off-thread while they're being used, and just phase back into it once it is.

That's a great point (about not mutating functions). Perhaps, this would be better:

const memoizedFunc = Function.memoize(myFunc);

Or we could introduce a syntax that would signify the function should be memoized at the time it's declared/created:

function memoizeMe #(a, b, c) { /*...*/ }

After further consideration, I dislike the syntax route. We don't always know at the time a function is written if it should be memoized. For example:

// All good
const fastFn = createFn(
  these,
  arguments,
  make,
  a,
  FAST,
  func
);

// Uh-oh
const slowFn = createFn(
  these,
  arguments,
  make,
  a,
  SLOW,
  func
);

// All better now
const alsoFastFn = Function.memoize(slowFn);

I like that this doesn't introduce new syntax and only adds to the Function namespace. For cases when the JavaScript engine decides not to memoize the function, the return value could be the same as the return value of Function.prototype.bind.call(fn, null).

1 Like

I think we'll need the hashing function as the optional second parameter. This would handle the case where deeply equal but not referentially equal object arguments give identical return values. The signature would be:

Syntax

Function.memoize(fn [, hashingFn])

Parameters

fn

The function to be memoized.

hashingFn

A function to generate a string that will be used as a key for the memoization cache. This function receives the arguments passed to the memoized function and must return a string. Any other return type is ignored and the arguments themselves are used as the cache key (as would happen if no hashing function were passed).

Return value

A copy of the original function that is either memoized or the equivalent of Function.prototype.bind.call(fn, null). Exactly which is returned is determined by the JavaScript engine. Note that the memoized function may be deoptimized at any time according to whatever heuristics the JavaScript engine uses to determine if space is constrained.

I honestly wouldn't mind a built-in function for this, provided the memoization can be configured and provided the cache itself is not enumerable (allowing objects to be referenced weakly). Just to name a couple:

  • React needs to memoize not by raw argument but by object key/value.
  • Long-running servers and in-memory network caches will want to put a time limit for how long values exist in cache, and some may also want a way to clear the cache entirely.

My biggest use case for this personally is related to the second: memoizing async actions correctly without having to use an intermediate function (either named or IIAFE) just to keep using async/await.

Conceptually, I could see an API like this work:

// Never evicted, caches based on arguments
const memoized = func.memoize()

const customMemoized = func.memoize({
  // Evict after this promise resolves - can be controlled by arguments
  // as well.
  evictAfter: (...args) =>
    new Promise(resolve => setTimeout(resolve, 5000)),

  // Get the arguments to use for caching - can return any iterable,
  // including arrays and generators.
  select: function *selector(...args) {
    // What React might do
    const [props, ...rest] = args
    yield* rest
    yield* Object.keys(props)
    yield* Object.values(props)
  },
})

// Defaults, but explicit.
const defaultMemoized = func.memoize({
  evictAfter: () => new Promise(() => {}), // never resolved
  select(...args) { return [this, ..args] },
})

Why I went with that:

  • Relying on promises, iterables, and raw arguments lists makes it fairly intuitive, and for simple cases like single parameters, also fairly easy.
  • evictAfter does not require any new host hooks to implement.
  • Having select be iterable-based makes it much more efficient for cases where the key path is much more complicated. Plus, each yield maps very directly to an underlying cache access, so it's also fairly low-level.
  • Passing arguments to evictAfter allows much more complicated logic like:
    • Invalidating it on web socket message, if you use an intermediate key → resolve map
    • Allowing callers to have control over cache times
    • Tracking a list of active cache entries via an intermediate resolve set and simply invoking them all
    • Accepting weak references and memoizing based on that
    • Memoizing based on a subset of parameters

Here's a rough polyfill sketch of that (minus the needed weak references for objects):

Function.prototype.memoize = function (opts) {
  const func = this

  if (typeof func !== "function") {
    throw new TypeError('this must be callable')
  }

  let select, evictAfter
  if (opts != null) {
    select = opts.select
    if (select != null && typeof select !== 'function') {
      throw new TypeError('opts.select must be callable if present')
    }

    evictAfter = opts.evictAfter
    if (evictAfter != null && typeof evictAfter !== 'function') {
      throw new TypeError('opts.evictAfter must be callable if present')
    }
  }

  const hole = {}
  function create(parent, key) {
    return {parent, key, map: new Map(), value: hole}
  }

  const rootCache = create(undefined, undefined)

  function get(parent, key) {
    let child = parent.map.get(key)
    if (child == null) {
      child = create(parent, key)
      parent.map.set(key, child)
    }
    return child
  }

  return function () {
    const cachePath = select != null
      ? [...select.apply(this, arguments)]
      : [this, ...arguments]

    let cache = rootCache
    for (const key of cachePath) {
      cache = get(cache, key)
    }

    if (cache.value !== hole) {
      return cache.value
    }

    // Executed before the actual function so things like timeouts are
    // scheduled accurately
    if (evict != null) {
      (async () => {
        await evict.apply(this, arguments)
        let current = cache
        current.value = hole
        // Collect garbage
        while (current !== rootCache && !current.map.size) {
          current.parent.map.delete(current.key)
          current = current.parent
        }
      })()
    }

    return cache.value = func.apply(this, arguments)
  }
}

I think that the memoization API would benefit from being as simple as possible, even if it means not covering every use case. Having a select function as a way to get the cache keys could cause problems with the weak referencing I hope to achieve. The author could accidentally keep a reference to the arguments outside of the function. This is also a potential problem with the hashing function I proposed :thinking:.

The React use case (memoizing based on object keys and values) could be done without a select function:

// the first half of the array is the keys, the second is the values
// zip them into entries, then convert to an object
const keysAndValuesToObj = (...kv) => Object.fromEntries(
  _.zip(
    kv.slice(0, kv.length / 2),
    kv.slice(kv.length / 2, kv.length)
  )
);

React.memo = fn => {
  // spread out the keys and values for the memoized function,
  // then gather them back into an object for the React function call
  const _fn = Function.memoize((...kv) => fn(keysAndValuesToObj(kv)));

  // now the function will be memoized based on keys and values
  return props => _fn(...Object.keys(props), ...Object.values(props));
}

I like the idea of cache-busting, but it should also be a hint to the JS engine that the cache is no longer needed. The whole caching/memoizing process should be transparent to the JS author. The author should have no access to the cache directly. I originally considered something like this:

const memoFn = Function.memoize(fn);
// ...
memoFn.clearCache();

This has the unfortunate knock-on effect of checking if clearCache exists on any given function (and is callable) before invoking. I think an API similar to setTimeout and clearTimeout would be more robust:

const memoFn = Function.memoize(fn);
//...
Function.dememoize(memoFn);

This is true, and I briefly considered it, but that path is very perf-critical, and yes, even so much as allocating objects and spreading them can have a significant performance impact. (This is coming from someone who maintains a competing virtual DOM framework that's structurally very similar.) If you really want a perf comparison, here's my above polyfill benchmarked with my example vs yours. And here's the ranges I got from that, in order of performance:

  1. optimized polyfill + select: 2,520,046 to 2,686,660 ops/sec
  2. slow polyfill + select: 1,281,794 to 1,343,756 ops/sec
  3. optimized polyfill + spread: 1,042,992 to 1,111,727 ops/sec
  4. *simple variant + spread: 990,946 to 1,059,852 ops/sec
  5. slow polyfill + spread: 916,751 to 980,107 ops/sec

* = yours (rest are all mine for show)

This kind of perf boost (native implementations would exaggerate this cliff, BTW) makes the spread variant a lot less attractive.

I intentionally avoided that as it's an area of non-determinacy, something I'd like to avoid and something I know TC39 members aren't fond of, either.

Also, things like network cache expiry is something you don't want to just leave it at the whim of the engine to decide.

FYI, the chances of that working are slim for the same reason weakMap.clear() was left out as being redundant with weakMap = new WeakMap().

We may be overthinking this on the cache-busting. Part of the purpose of this proposal/idea was to let the JS engine determine if trading time for space was worth it. The author shouldn't have to worry about cache management whatsoever. I say we get rid of any references to evictCache, clearCache, and dememoize.

I'm coming around to the idea of a selector function, in place of the hashing function I listed in an earlier post. I like the idea of the author picking the key(s) to cache by because they'll know if two different (meaning referentially unequal) sets of arguments will produce the same result. I'm not sure if returning an iterator is a great idea. Here's an example of how this could go bad:

const arr1 = [0, 1, 2];
arr1.meta = 'created on 01-01-21';

const arr2 = [0, 1, 2];
arr2.meta = 'created on 02-02-21';

const getMetaData = Function.memoize(
  (array) => array.meta, // the function to memoize
  (first) => first, // only key off of the first argument, ignore any extras
);

// first run, unmemoized 👍
getMetaData(arr1); // -> 'created on 01-01-21'

// second run, memoized 👍
getMetaData(arr1); // -> 'created on 01-01-21'

// new argument, but shallowly equal 👎
getMetaData(arr2); // -> 'created on 01-01-21'

I ran the perf comparison in Firefox v81 and the spread version was faster there. The select version was faster in Chrome v85.

Firefox v81

image

Chrome v85

image

I wrote up a polyfill of Function.memoize using WeakRefs (https://caniuse.com/?search=weakref) and a cache trie. The main point of this implementation is not the speed of memoization or retrieval of memoized values, but eliminating memory leaks where memoization caches hold on to otherwise unreachable objects, preventing them from being garbage collected.

I didn't do anything special with the hash function (like interpreting iterable return values as separate keys as @isiahmeadows discussed) but that is something that could be further explored.

After implementing this, I realized that the consumer could also pass in a comparison function (currently this is done with === in the get() function) but that may be a footgun since doing deep or complex comparisons on so many values may be slower than the computation that was memoized away.

I also had difficulty in writing tests to determine if the cache had been modified by pruneTree() after a value had been deleted, so I relied on breakpoints in the browser DevTools to confirm that it worked as expected.

// Polyfill
Function.memoize = (function() {
  const NO_VALUE = Symbol("NO_VALUE");
  const VALUE_AT = Symbol("VALUE_AT");
  const registry = new FinalizationRegistry(function schedulePruning(
    cacheInstance
  ) {
    // An object key in the cacheInstance has been Garbage Collected.
    // Schedule a cleanup of the cache at the next idle period.
    window.requestIdleCallback(() => pruneTree(cacheInstance));
  });

  function isWeakRef(wr) {
    return Object.prototype.toString.call(wr).slice(8, -1) === "WeakRef";
  }

  function createCacheNode() {
    const cache = new Map();
    cache.set(VALUE_AT, NO_VALUE);
    return cache;
  }

  // remove all branches that are keyed off of empty (GC'd) WeakRefs
  function pruneTree(cache) {
    for (const [k, v] of cache.entries()) {
      if (isWeakRef(v) && !v.deref()) {
        cache.delete(k);
      } else if (k !== VALUE_AT) {
        pruneTree(v);
      }
    }
  }

  // essentially Map.get where cache is a map with some WeakRef keys that
  // need to be dereferenced to be checked, and the value is a cache node
  function get(cache, key) {
    for (const [k, v] of cache.entries()) {
      const _key = isWeakRef(k) ? k.deref() : k;
      if (_key === key) {
        return v === NO_VALUE ? null : v;
      }
    }
    return null;
  }

  function findValueByKeys(cache, ...keys) {
    let currentCache = cache;
    while (keys.length) {
      currentCache = get(currentCache, keys.shift());
      if (currentCache === null) {
        return NO_VALUE;
      }
    }
    return currentCache.get(VALUE_AT);
  }

  function setValueAt(cache, value, ...keys) {
    let currentCache = cache;

    // traverse to the leaf node
    while (keys.length) {
      let key = keys.shift();
      let nextCache = get(currentCache, key);

      if (nextCache === null) {
        nextCache = createCacheNode();
        if (key !== null && typeof key === "object") {
          registry.register(key, currentCache);
          key = new WeakRef(key);
        }
        currentCache.set(key, nextCache);
      }

      currentCache = nextCache;
    }

    currentCache.set(VALUE_AT, value);
  }

  return function(fn, hashFn) {
    const rootCacheNode = createCacheNode();
    const shouldHash = typeof hashFn === "function";

    return function() {
      let args = shouldHash ? hashFn(...arguments) : arguments;
      let val = findValueByKeys(rootCacheNode, ...args);
      if (val === NO_VALUE) {
        val = fn.apply(null, arguments);
        setValueAt(rootCacheNode, val, ...args);
      }
      return val;
    };
  };
})();

/**********************************/
/*             Tests              */
/**********************************/

const ctx = { maybeExists: { val: "foo" }, alwaysExists: { val: "bar" } };
let unmemoizedRunCount = 0;
function combineValues(a, b) {
  unmemoizedRunCount++;
  return (a && a.val) + (b && b.val);
}
const memoCombo = Function.memoize((a, b) => combineValues(a, b));

memoCombo(ctx.maybeExists, ctx.alwaysExists);
console.assert(
  unmemoizedRunCount === 1,
  "combineValues should have run unmemoized"
);

memoCombo(ctx.maybeExists, ctx.alwaysExists);
console.assert(
  unmemoizedRunCount === 1,
  "combineValues should have not have run"
);

memoCombo(ctx.maybeExists, ctx.alwaysExists);
console.assert(
  unmemoizedRunCount === 1,
  "combineValues should have not have run"
);

delete ctx.maybeExists;

memoCombo(ctx.maybeExists, ctx.alwaysExists);
console.assert(
  unmemoizedRunCount === 2,
  "combineValues should have run unmemoized"
);

EDIT: Better polyfill

Function.memoize = (function() {
  const NO_VALUE = Symbol("NO_VALUE");
  const registry = new FinalizationRegistry(function schedulePruning(
    cacheInstance
  ) {
    // An object key in the cacheInstance has been Garbage Collected.
    // Schedule a cleanup of the cache at the next idle period.
    window.requestIdleCallback(() => pruneTree(cacheInstance));
  });

  function isWeakRef(wr) {
    return Object.prototype.toString.call(wr).slice(8, -1) === "WeakRef";
  }

  function createNode() {
    const node = {
      cache: new Map(),
      value: NO_VALUE,
    }
    return node;
  }

  // remove all branches that are keyed off of empty (GC'd) WeakRefs
  function pruneTree(cache) {
    for (const [k, v] of cache.entries()) {
      if (isWeakRef(k) && !k.deref()) {
        cache.delete(k);
      } else {
        pruneTree(v.cache);
      }
    }
  }

  // essentially Map.get where cache is a map with some WeakRef keys that
  // need to be dereferenced to be checked, and the value is a cache node
  function getNode(node, key, comparisonFn) {
    for (const [k, nextNode] of node.cache.entries()) {
      const _key = isWeakRef(k) ? k.deref() : k;
      if (comparisonFn(_key, key)) {
        return nextNode;
      }
    }
    return null;
  }

  function findValueByKeys(node, comparisonFn, ...keys) {
    let currentNode = node;
    while (keys.length) {
      currentNode = getNode(currentNode, keys.shift(), comparisonFn);
      if (currentNode === null) {
        return NO_VALUE;
      }
    }
    return currentNode.value;
  }

  function setValueAt(node, comparisonFn, value, ...keys) {
    let currentNode = node;

    // traverse to the leaf node
    while (keys.length) {
      let key = keys.shift();
      let nextNode = getNode(currentNode, key, comparisonFn);

      if (nextNode === null) {
        nextNode = createNode();
        if (key !== null && typeof key === "object") {
          registry.register(key, currentNode.cache);
          key = new WeakRef(key);
        }
        currentNode.cache.set(key, nextNode);
      }

      currentNode = nextNode;
    }

    currentNode.value = value;
  }

  function defaultCompare(a, b) {
    return a === b;
  }

  return function(fn, hashFn, comparisonFn) {
    const rootCacheNode = createNode();
    hashFn = typeof hashFn === "function" ? hashFn : null;
    comparisonFn = typeof comparisonFn === 'function' ? comparisonFn : defaultCompare;

    return function() {
      let args = hashFn ? hashFn(...arguments) : arguments;
      let val = findValueByKeys(rootCacheNode, comparisonFn, ...args);
      if (val === NO_VALUE) {
        val = fn.apply(null, arguments);
        setValueAt(rootCacheNode, comparisonFn, val, ...args);
      }
      return val;
    };
  };
})();

/**********************************/
/*             Tests              */
/**********************************/

const ctx = { maybeExists: { val: "foo" }, alwaysExists: { val: "bar" } };
let unmemoizedRunCount = 0;
function combineValues(a, b) {
  unmemoizedRunCount++;
  return (a && a.val) + (b && b.val);
}
const memoCombo = Function.memoize((a, b) => combineValues(a, b));

memoCombo(ctx.maybeExists, ctx.alwaysExists);
console.assert(
  unmemoizedRunCount === 1,
  "combineValues should have run unmemoized"
);

memoCombo(ctx.maybeExists, ctx.alwaysExists);
console.assert(
  unmemoizedRunCount === 1,
  "combineValues should have not have run"
);

memoCombo(ctx.maybeExists, ctx.alwaysExists);
console.assert(
  unmemoizedRunCount === 1,
  "combineValues should have not have run"
);

delete ctx.maybeExists;

memoCombo(ctx.maybeExists, ctx.alwaysExists);
console.assert(
  unmemoizedRunCount === 2,
  "combineValues should have run unmemoized"
);