Long Generator Delegation Chain Optimization - O(N^2)

I did some benchmarking work two years ago while evaluating JavaScript patterns and libraries for Structured Concurrency (SC). Though the specifics of SC are not relevant to this thread, the important part to understand is that writing ergonomic async code with SC properties really wants language- or runtime-level support for task cancellation, something TC39 has left unaddressed for almost a decade. It is impossible to achieve SC with implicit cancellation and cleanup using the async/await and Promise primitives alone, which is why we are now seeing more libraries building their own task-scheduling runtimes on top of Generators. A few examples: Effect (formerly effect-ts), Effection, ember-concurrency, co.js.

And there are projects that rely heavily on Generators to achieve complex execution control, such as implementing Algebraic Effects. Notably, engine262 is one such example.

To be honest, the ergonomics of writing complex concurrent code using Generator-based SC patterns felt so good that they are now my default choice for prototyping. However, one decisive factor made me conclude that Generators are not ready for production use, where our code runs on millions of devices across tens of different platforms, including embedded devices with tight CPU and memory budgets.

That factor is, of course, performance. Specifically: Generator delegation using the yield* syntax. As the depth of the delegation stack grows, the time to call next() at the top of the chain grows at an O(N) rate. This is because, per the current ECMA-262 spec, every generator on the chain must be sequentially woken up only to call the next() method of the generator it delegates to, and the value yielded by the innermost generator has to travel all the way back up the chain, waking up every generator again just to pass the value to the outer iterator. As a result, a seemingly O(N) iteration of a generator may actually incur an O(N^2) performance cost.
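To make this concrete, here is an illustrative sketch (not the original code) of such a delegation chain, where every intermediate level only forwards with yield*:

```javascript
// A generator producing a few values at the bottom of the chain.
function* leaf(items) {
  for (let i = 0; i < items; i++) yield i;
}

// Each layer adds one generator frame that must be woken up per next().
function* forward(iter) {
  yield* iter;
}

// Wrap `leaf` in `depth` layers of yield* delegation.
function buildChain(depth, items) {
  let gen = leaf(items);
  for (let i = 0; i < depth; i++) gen = forward(gen);
  return gen;
}

// Every next() on the outermost generator resumes all `depth` intermediate
// generators on the way down and unwinds them again on the way up, so
// consuming `items` values costs O(items * depth) resumptions.
const chain = buildChain(100, 3);
console.log([...chain]); // [0, 1, 2]
```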

This observation is not merely theoretical. I actually went to the length of rewriting our code using Generators and yield*, and measured that it performed much worse than before.

To demonstrate this, I also wrote a simple benchmark for the primitives. The graph below shows how the time increases with the depth of the delegation stack.
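A rough sketch of the kind of micro-benchmark described here; the depths, item counts, and names are illustrative, not the original harness:

```javascript
// The value source at the bottom of the chain.
function* source(items) {
  for (let i = 0; i < items; i++) yield i;
}

// A pure-forwarding layer.
function* wrap(iter) {
  yield* iter;
}

// Build a chain of `depth` delegating generators, drain it fully,
// and report the elapsed time plus how many values were consumed.
function timeDrain(depth, items) {
  let gen = source(items);
  for (let i = 0; i < depth; i++) gen = wrap(gen);
  const start = Date.now();
  let count = 0;
  for (const _ of gen) count++;
  return { ms: Date.now() - start, count };
}

// For a fixed item count, total time grows roughly linearly with depth,
// i.e. the whole iteration costs O(items * depth).
for (const depth of [1, 10, 100]) {
  console.log(depth, timeDrain(depth, 10000).ms, 'ms');
}
```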

As we can see, async/await as a baseline has the best performance of them all, because the JS engine is able to directly resume the top stack frame. The yield* delegation chain performs far worse. Interestingly, if I transpile all the ES6 function* and yield* syntax with Regenerator and run it on the Regenerator runtime, it actually performs better!

Python's language designers recognized this problem when they introduced generator delegation, and implemented an optimization: PEP 380 – Syntax for Delegating to a Subgenerator | peps.python.org

A possible strategy is to add a slot to generator objects to hold a generator being delegated to. When a __next__() or send() call is made on the generator, this slot is checked first, and if it is nonempty, the generator that it references is resumed instead. If it raises StopIteration, the slot is cleared and the main generator is resumed.

This would reduce the delegation overhead to a chain of C function calls involving no Python code execution.
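Translated into JavaScript terms, the slot mechanism might look roughly like this hand-rolled sketch (DelegatingIterator and its delegate slot are hypothetical names, standing in for what would be engine internals):

```javascript
// Userland model of the PEP 380 "delegate slot": next() checks the slot
// first, resumes the delegate directly, and only wakes the outer
// generator once the delegate completes.
class DelegatingIterator {
  constructor(gen) {
    this.gen = gen;       // the outer ("main") generator
    this.delegate = null; // the slot holding the generator delegated to
  }
  next(value) {
    if (this.delegate !== null) {
      const result = this.delegate.next(value);
      if (!result.done) return result; // skip the outer generator entirely
      this.delegate = null;            // clear the slot on completion...
      value = result.value;            // ...and resume the outer generator
    }
    return this.gen.next(value);
  }
}
```

In the real proposal this bookkeeping would live inside the engine, so a chain of delegations collapses into direct calls with no intermediate generator code running.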

If JavaScript could have a similar optimization defined in the spec for Generators (both native Generator Functions and user-defined generators), this pattern would become more production-ready for performance-demanding use cases. One possibility is to add a [Symbol.delegateIterator]: Iterator property to the Iterator class; native Generator Functions would automatically set this property on themselves when delegating to another iterator/generator via yield*. User-defined generators could also opt in to the optimization by adopting it.
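As a sketch of what adoption might look like (Symbol.delegateIterator does not exist; a local symbol stands in for it here):

```javascript
// Stand-in for the hypothetical well-known symbol.
const delegateIterator = Symbol('delegateIterator');

// A user-defined iterator that purely forwards could advertise its
// inner source, letting a hypothetical engine fast path skip this
// wrapper when driving the chain.
class PassthroughIterator {
  constructor(source) {
    this.source = source;
    this[delegateIterator] = source; // expose the delegate
  }
  next(value) {
    return this.source.next(value); // fallback path without the optimization
  }
  [Symbol.iterator]() {
    return this;
  }
}

const it = new PassthroughIterator([1, 2, 3][Symbol.iterator]());
console.log([...it]); // [1, 2, 3]
```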

Would you be able to help me understand why this optimisation can't be applied without a change to the specification? Couldn't storing the "delegateIterator" be something done internally by the engine, without it being a public property?

Thanks!

Engines can apply this optimization to native Generator Functions when evaluating yield*. But user-defined generators and iterators wouldn't be able to opt in to it.

Also, there are many different JavaScript runtimes; it's impractical to coordinate the optimization across them without going through the standard spec.

The current yield* guarantees:

  1. Each intermediate generator in the delegation is resumed, executes its code, and returns control back up the chain.
  2. All try…finally blocks within delegating generators are executed strictly according to the call stack.
  3. The .next, .throw, and .return methods of generators can be overridden, and yield* correctly calls them, also respecting Proxies and other dynamic wrappers.

To avoid breaking backward compatibility and introducing non-obvious constraints, the solution should not rely on the Iterator Protocol. For example, it could be a new delegate operator.


I don’t immediately see a concrete example of how it could break existing behavior. Maybe we can try building a PoC implementation with engine262 and test it.

I want this optimization too, but I also see the problem. I think it will be possible for generators to optimize this, but if you hand-implement the protocol, it seems to me that allowing direct invocation of a method deep inside the generator call stack would skip all the layers of error handling in the wrapping functions.

Give me a bit and I'll make a code example.

Error handling and finalization only happen after the delegate completes. As long as you resume the parent generator when the delegate completes, passing the completion record to the parent, you shouldn't miss any error handling or finalization.

You just couldn't give me a bit to prove to you that I know what I'm talking about?

function* outer(values) {
  try { yield* values } catch (e) {
    console.log('outer error');
    throw e;
  }
}

function* inner(values) {
  try { yield* values } catch (e) {
    console.log('inner error');
    throw e;
  }
}

function* values() {
  yield 1;
  yield 2;
  throw new Error();
}

for (let value of outer(inner(values())))
  console.log(value);

// 1
// 2
// inner error
// outer error

Now, with what you're proposing, because outer is delegating to inner, a reference to the inner generator would be exposed directly to the outside world, so that the outside world could pull values from inner without going through N layers of nested next() calls. But the result is that if this were implemented, the output would change to:

// 1
// 2
// inner error

Because you cut the outer next() out of the call stack, you also removed its try/catch from the chain of error handlers, disrupting structured exception handling and making this a breaking change rather than a pure perf optimization.

So that's the bad news. I also cannot see any way it could ever be safe to do this optimization while allowing a mix of native generator functions and hand-implemented iterators.

The good news is that if all you have is a stack of unmodified native generators delegating to each other, it seems to me like the engine should be able to optimize that. I would think the hardest part would actually be proving that nobody has done anything sneaky like overriding .next anywhere along the chain, especially since the override could occur between calls to next.

I don’t think that’s what rix is proposing. Your example’s semantics would stay exactly the same. The optimization rix is describing is that calling outer.next() would directly invoke values.next() without first waking up inner. Once values completes (including when it throws an error), inner is resumed; and once inner completes, outer is resumed.

For native generators and yield*, this is a pure optimization and not a change in semantics.

I think the confusion comes from this suggestion. It’s not possible to maintain the expected semantics without exposing all N delegates in a chain. When the Nth delegate ends normally, we feed its value to delegates[N-1].next(value); if it throws an error, we call .throw(error) instead.

By only exposing a single delegate, there’s no way for us to know where to resume.
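The bookkeeping described here could be sketched with an explicit stack of delegates. This is a simplified model under stated assumptions: it only handles normal completion, and the .throw() error path is elided (a full implementation would call delegates[i].throw(error) to preserve the try/catch semantics from the earlier example):

```javascript
// Drive a chain of generators, resuming only the innermost one and, when
// it completes, feeding its completion value to the next delegate up.
// chain[0] is the outermost generator, chain[chain.length - 1] the innermost.
function driveFlattened(chain) {
  const out = [];
  let i = chain.length - 1;
  let input;
  while (i >= 0) {
    const r = chain[i].next(input);
    if (!r.done) {
      out.push(r.value); // yielded values escape straight to the consumer
      input = undefined;
    } else {
      input = r.value;   // completion value is handed to the parent
      i -= 1;            // resume the next generator up the chain
    }
  }
  return out;
}

function* innermost() { yield 1; yield 2; return 3; }
function* outermost() { yield 4; }

console.log(driveFlattened([outermost(), innermost()])); // [1, 2, 4]
```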

Or, we probably shouldn’t expose this on the iterator protocol at all. Keep it for native syntax, which is widely supported everywhere and which guarantees the correct behavior. Any iterator could be written as a native generator, right?

Nope. I don't remember the specifics off the top of my head, but there are some cases that are impossible, and a bunch of cases that are completely unergonomic with generators.

One case I do remember is access to the value passed to the first next() call, which is why there is a proposal to add syntax like function.sent.
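For illustration, a hand-written iterator can observe the argument of the very first next() call, while a generator body cannot, which is the gap the function.sent proposal targets (the names here are made up for the example):

```javascript
// Hand-written iterator: the first next()'s argument is visible.
function makeEcho() {
  return {
    next(value) {
      return { value, done: false }; // echoes even the first sent value
    },
  };
}

// Generator: the argument to the first next() is discarded, because the
// body hasn't reached a yield expression yet when that call resumes it.
function* echoGen() {
  for (;;) {
    const v = yield; // only receives values from the *second* next() onward
    // `v` can never be the first next()'s argument
  }
}

console.log(makeEcho().next(42).value); // 42
console.log(echoGen().next(42).value);  // undefined (first value is lost)
```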

Ah yeah, you're right that inner error would also be skipped, I wasn't thinking about that.