syncAndAsyncIterator

The goal of this proposal is to better facilitate the use of for loop syntax when working with streams of character data. Expressed as purely async iterables, such streams tend to have poor performance: most of the time the next character in the iterable is available synchronously, and only occasionally does another chunk of data need to be read asynchronously, e.g. from disk. Forcing these synchronous operations to be asynchronous incurs unnecessary wall clock time cost from calls to queueMicrotask (or setImmediate or whatever). In fact that cost is incurred twice: once to await the step, and once to await step.value.

The proposal is to create a new type of iterator and a new syntax for consuming it.

The new type is a syncAndAsync iterator. The interface for it would be:

{
  next<T>(): { done: boolean, value: T } | Promise<{ done: boolean, value: T }>;
}

The strawman expects an iterable to produce such an iterator from a method keyed by Symbol.for('syncAndAsyncIterator'), though ultimately having Symbol.syncAndAsyncIterator as a well-known symbol would seem most apt.

Iteration over a syncAndAsync iterator is facilitated with an extension of the for loop syntax:

for await? (const value of syncAndAsyncIteratable) {
}

for await? is expressed in the AST as { async: true, sync: true } where for await would be { async: true, sync: false } and a for would be { async: false, sync: true }.

Creation of a syncAndAsync iterator could be done with async generator functions. It is my belief that it would not be necessary to define any new syntax for generator functions, as their internal implementation could be changed to yield iterator steps synchronously when no awaiting had been necessary since the previous call to yield. The current API would continue to be served by implementing [Symbol.asyncIterator]() as a thin wrapper around [Symbol.syncAndAsyncIterator]().
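For concreteness, here is a minimal userland sketch of producing and consuming such an iterator, assuming the Symbol.for('syncAndAsyncIterator') key from the strawman; readChunk, characters, and forAwaitMaybe are invented names, and the instanceof check merely stands in for the spec's IsPromise.

const SYNC_AND_ASYNC = Symbol.for('syncAndAsyncIterator');

// Hypothetical producer: a hand-written syncAndAsync iterator over chunked input.
// readChunk is a stand-in for an async chunk source; it resolves to an array of
// characters, or to null at end of input.
function characters(readChunk) {
  return {
    [SYNC_AND_ASYNC]() {
      let chunk = [];
      return {
        next() {
          if (chunk !== null && chunk.length > 0) {
            // Common case: the next character is already buffered; answer synchronously.
            return { done: false, value: chunk.shift() };
          }
          // Rare case: a new chunk has to be fetched, so return a promise instead.
          return (async () => {
            chunk = await readChunk();
            if (chunk === null || chunk.length === 0) {
              return { done: true, value: undefined };
            }
            return { done: false, value: chunk.shift() };
          })();
        },
      };
    },
  };
}

// Roughly what for await? (const value of iterable) { ... } might desugar to:
// await only when next() actually handed back a promise.
async function forAwaitMaybe(iterable, body) {
  const it = iterable[SYNC_AND_ASYNC]();
  while (true) {
    let step = it.next();
    if (step instanceof Promise) step = await step; // stand-in for IsPromise
    if (step.done) return;
    body(step.value);
  }
}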

I've implemented the smallest useful subset of the implied changes to Babel here.

To be addressed:

How would promise be reflectively identified? https://tc39.es/ecma262/#sec-ispromise seems the obvious answer.

What would be the fallback order of iterators for a for await? loop (see the sketch after these questions)? I think syncAndAsync => async => sync would be appropriate.

Would return(value) and throw(exception) have similar sync/async duality? I see no perf need to allow them to be synchronous, and there are significant benefits to there being only one pathway.

Any concerns around yield*?
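Regarding the fallback order question above, here is a sketch of the lookup a for await? loop might perform; getLoopIterator is an invented helper name, and the real semantics would live in the spec's GetIterator.

function getLoopIterator(obj) {
  const saa = obj[Symbol.for('syncAndAsyncIterator')];
  if (typeof saa === 'function') return { kind: 'syncAndAsync', iterator: saa.call(obj) };
  const makeAsync = obj[Symbol.asyncIterator];
  if (typeof makeAsync === 'function') return { kind: 'async', iterator: makeAsync.call(obj) };
  const makeSync = obj[Symbol.iterator];
  if (typeof makeSync === 'function') return { kind: 'sync', iterator: makeSync.call(obj) };
  throw new TypeError('value is not iterable');
}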

I am not convinced this warrants new syntax. Getting maximum performance does sometimes mean you will have to avoid using convenience features, at least until engines optimize it. The iteration protocol inherently involves a bunch of object allocations, so if you really want maximum performance you'll have to forgo it entirely anyway.

That said, a couple of comments:

Forcing these synchronous operations to be asynchronous incurs unnecessary wall clock time cost from calls to queueMicrotask (or setImmediate or whatever)

I don't understand this. Why are you calling those methods? You can resolve a Promise synchronously. Code which is awaiting (or .thening) that Promise won't run until the next microtask tick, but that's very inexpensive, and engines are getting better at optimizing those out entirely.

Anyway, here's what I'd do for your use case:

// this is just a mockup; you can imagine any async function which returns chunks.
let getReader = () => {
  let chunks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]];
  let i = 0;
  return async () => {
    if (i >= chunks.length) {
      return 'EOF';
    }
    return chunks[i++];
  }
};


// This is the actual core library code. Obviously you could make some design
// decisions differently here, but I hope it's suggestive enough.
// Yes, it's a little awkward to write code like this, but hopefully you'd only
// need to when writing a library and trying to squeeze out as much performance
// as possible; the difference in performance between this and the purely async
// version is unlikely to matter much for most people.
let iterable = {
  [Symbol.iterator]() {
    let pending = false;
    let reader = getReader();
    let currentChunk = [];
    return {
      next() {
        if (pending) {
          throw new Error('called again before waiting for previous result');
        }
        if (currentChunk === 'EOF') {
          return { done: true };
        }
        if (currentChunk.length === 0) {
          pending = true;
          let promise = (async () => {
            currentChunk = await reader();
            pending = false;
            if (currentChunk === 'EOF') {
              return { done: true };
            } else {
              return { done: false, value: currentChunk.shift() };
            }
          })();
          return { done: false, value: { async: true, promise } };
        } else {
          return { done: false, value: { async: false, value: currentChunk.shift() } };
        }
      }
    };
  }
};


// Here's how you'd use it. Note that there's no `await` at all in cases where the value is available synchronously.
async function main() {
  for (let item of iterable) {
    if (item.async) {
      item = await item.promise;
    }
    if (item.done) {
      break;
    }
    console.log(item.value);
  }
}
main();

Edit: of course in practice what I'd do is just yield the entire chunk, and expect the consumer to write an inner loop to walk that chunk. But the above works if for some reason that's not viable.
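For reference, here is a sketch of that chunked approach, reusing the getReader mockup above; chunkIterable and mainChunked are just illustrative names.

// Async iterable of whole chunks; the consumer walks each chunk with an inner sync loop.
async function* chunkIterable() {
  const read = getReader();
  while (true) {
    const chunk = await read();
    if (chunk === 'EOF') return;
    yield chunk;
  }
}

async function mainChunked() {
  for await (const chunk of chunkIterable()) {
    for (const value of chunk) {
      console.log(value); // only one await per chunk, not one (or more) per value
    }
  }
}
mainChunked();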


The iteration protocol inherently involves a bunch of object allocations, so if you really want maximum performance you'll have to forgo it entirely anyway.

If object allocations were terribly costly the iteration protocol wouldn't have been designed around them. Generational garbage collection is designed with the basic assumption that most small objects are short lived. Here's some reading on the topic: https://v8.dev/blog/trash-talk

You can resolve a Promise synchronously. Code which is awaiting (or .thening) that Promise won't run until the next microtask tick, but that's very inexpensive, and engines are getting better at optimizing those out entirely.

Six of one, half dozen of the other. My point was that you are waiting for the next microtask tick. If we are talking about inputs of any significant size all that time really adds up. I've got to get a number on this, but hearsay leads me to believe it might be on the order of 10x speed difference. The engine will not optimize that away, as that is the behavior the spec dictates.

Here, by the way, is how I came to write this up: https://github.com/nodejs/node/issues/31979
The result of that discussion seemed to be that there was no chance the engine would optimize that cost away.

Another thing: the problem with the code you've shared is that sync iteration expects it to be possible to know whether you're done synchronously. Your own code expects that, but it's an unsound expectation. If you remedy that problem, I think you'll discover that you're just left with async iterators.

If object allocations were terribly costly the iteration protocol wouldn't have been designed around them.

They're not terribly costly, they're just not free. If you want every possible bit of performance you do generally have to avoid them. But you probably don't want every possible bit of performance if it makes your code too awkward to write or consume.

The engine will not optimize that away, as that is the behavior the spec dictates.

The spec dictates that code be run in a particular order. It doesn't dictate that it take any particular amount of time to do so; a "microtask turn" is not a unit of time. Engines can make microtask turns have low or zero overhead in many cases.

Another thing: the problem with the code you've shared is that sync iteration expects it to be possible to know whether you're done synchronously. Your own code expects that, but it's an unsound expectation.

My code works fine. You can run it. Yes, it has a stricter contract for consumers than most iterators do - specifically, it requires that consumers await the value in the case that it is not synchronously available before calling .next again - but that's fine; it's permissible for code to have stricter contracts than are enforced by the language.

Apologies, yes, I see now that your code does work. I did not look as closely as I should have because it was clear to me at a glance that that code is not the solution to the problem I have. My goal is to create the most comprehensive open source library for working with iterables and async iterables, so a solution that supports only a subset of the iterable contract is not one that I could endorse. If I was solving a specific problem instead of considering a general class of solutions I would probably, as you say, write an async iterable of sync iterables (in essence a stream).

That said, while async-of-sync may be the best current option in its performance class, I don't think it's especially good. It makes central an implementation detail (chunking) that most code would be willing to pay a small perf price not to have to know about. Code written with nested loops has other ergonomic problems too, like the difficulty of breaking out of both loops and the presence of edge cases at chunk boundaries that may create subtle bugs, especially since the kind of content developers tend to test with (at least from the file system) is not larger than one chunk.

I am more intrigued by your point about microtasks being the subject of further possible optimization. Obviously it would be better if this were an implementation/optimization detail that could be hidden completely, but I am not yet convinced that that is the case. I read through a recent blog post about what optimizations are being done already, but these are almost certain to have been in place already for the code which was seeing a 10x slowdown. The problem is that each for await loop will always create two microtasks per value, and any meaningful work is going to be the result of nesting multiple operations, e.g. perhaps a split followed by a map, a filter, and a join. In this charitably simple setup we have 8 microtasks per character of text input, and it isn't hard to imagine that number being quite a bit higher.
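To make the compounding concrete, here is a sketch of such a pipeline (the stage names are invented): every stage is its own for await loop, so each character crosses several microtask boundaries on its way to the consumer.

// Each stage wraps the previous one in another async generator.
async function* splitChars(chunks) {
  for await (const chunk of chunks) yield* chunk;
}
async function* map(source, f) {
  for await (const x of source) yield f(x);
}
async function* filter(source, pred) {
  for await (const x of source) if (pred(x)) yield x;
}

// Four nested for await loops in total, each awaiting at least once per character.
async function run(chunks) {
  let out = '';
  for await (const c of filter(map(splitChars(chunks), c => c.toUpperCase()), c => c !== ' ')) {
    out += c;
  }
  return out;
}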

Thus any solution to this problem would need to be aggressive in order to combat such a high amount of overhead. The only optimization that in my mind fits the bill would be not creating microtasks. I would naively assume that it's safe to synchronously continue execution without mucking about in microtasks at all if I was awaiting the value of a settled promise and the microtask queue was empty, but it seems telling to me that a) the spec doesn't say that and b) the engines haven't done it. But of course I could be wrong. I'm dipping my toes in a very deep pool here.

Just found another person with similar experiences: https://medium.com/netscape/async-iterators-these-promises-are-killing-my-performance-4767df03d85b

…and that the call stack is empty or consists of do-nothing frames. The code that queued the initial promise job (microtask) still needs to run to completion. But yes, I'd share your naive assumption that this should be the case most of the time on async iterables that yielded synchronously.

The only problem that cannot be solved by optimisations is running multiple asynchronous (microtask-only) generators at once. (Microtask-only meaning they immediately fulfill all promises, not using promises that get resolved in macro tasks). Something like

const range = Array.from({length: 100}, (_, i) => i);
(async () => { for await (const i of range) console.log('a', i); })();
(async () => { for await (const i of range) console.log('b', i); })();

will have to jump back and forth between the two asynchronous executions instead of running straight through each loop. This is a design decision far older than this thread, reaching back to when the promise job queue was introduced in the spec. Unlike promises/A+, which gives total freedom for execution order, consistency between engines was valued higher than optimisation potential and a deterministic scheduling algorithm was introduced. It features this "fair scheduling" between two microtask-constrained processes, at the cost of performance.
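Concretely, neither of the two loops above runs to completion before the other starts: their microtask-driven steps alternate, so the logged output interleaves roughly as a 0, b 0, a 1, b 1, … rather than printing all of the a lines followed by all of the b lines.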

The spec doesn't need to point out performance optimisations, it's the job of implementers to design them. Just consider how different the actual implementations of scopes and prototype property lookups are from what the spec describes. The spec details an "as if" behaviour. It certainly allows skipping microtask ticks if the result is not observable.
Also, even if the spec did this, it's no guarantee that the optimisation is actually implemented, see the mess with tail call elimination.

I think that should also be the approach to solve the general problem. Instead of introducing a third iteration protocol, I'd rather extend the existing ones to allow an asynchronous iterator's next() method return (a promise for) a synchronous iterable, and have this automatically consumed by a for await loop. While only affecting manually implemented iterators at first, later the generator yield* semantics could be changed to use that as well.
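A sketch of what that might look like from the producer side; the flattening behaviour of for await here is hypothetical, not current semantics, and batched is just an illustrative name.

// Hypothetical: an async iterator whose every step carries a whole sync iterable.
const batched = {
  [Symbol.asyncIterator]() {
    const chunks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]];
    let i = 0;
    return {
      async next() {
        return i < chunks.length
          ? { done: false, value: chunks[i++] } // value is itself a sync iterable
          : { done: true, value: undefined };
      },
    };
  },
};

// Under the suggested extension, for await (const n of batched) would visit 0
// through 8 one value at a time while only awaiting once per chunk; today the
// consumer has to do the flattening with a nested sync loop.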

I wonder what the best way would be to find someone involved in engine development who could comment on whether that optimization is being considered?

I'd rather extend the existing ones to allow an asynchronous iterator's next() method return (a promise for) a synchronous iterable, and have this automatically consumed by a for await loop

Wouldn't changing how a for await loop treated that situation be a very breaking change to the language? At the very least you'd still have to extend the iterator spec to permit, say, { done, iterableValue } instead of { done, value }, which is still technically breaking, since a custom iterator could already have stored something in that field (I admit it's unlikely). Also, what happens when custom code not written using for..of encounters an iterator that uses iterableValue rather than value? It breaks.

Therefore I still think that the solution I am proposing has attractive properties.

  • It is 100% nonbreaking. The syntax is otherwise an error, and you have to opt into using the new protocol. Nobody should be accidentally opting in because well known symbols should only be added by the spec. Everyone else can use Symbol.for.
  • By not creating step shapes other than { done, value } it protects from deopts in hot pathways that deal with iterators.
  • It isn't subject to the bouncing problem you show with two async generators.
  • When implemented correctly there is full interoperability. A consumer that doesn't understand the new protocol can seamlessly fall back to the old one.

I have thousands of hours into developing iterator tooling at this point. I chose my proposal quite carefully :smile:

Just want to ask: what does this solve that https://tc39.es/ecma262/#sec-async-from-sync-iterator-objects doesn't?

Well that is only useful if what you really have is a sync iterator. I am talking in essence about offering a performant and user-friendly abstraction over streams, which are not fundamentally synchronous. Of course it is possible to do blocking IO from a sync iterable, but then you have all the problems associated with blocking IO.

Let me clarify: those objects are created to be immediately used with for await loops. You can still use non-blocking I/O in those loops.

As for streams, I've been working on https://github.com/isiahmeadows/proposal-es-streams and am seeking early feedback before seeking a champion.

Interesting proposal on streams. You're using the iterator protocol underneath, so I think your solution would have the same perf problem and would benefit from the solution I'm proposing.

I just got a benchmark running on node v15.2.1. It shows a 264x speed up for sync iteration:

✓  for await..of values from one chunk x 8.39 ops/sec ±0.84% (43 runs sampled)
✓  for..of values from one chunk x 2,222 ops/sec ±1.49% (86 runs sampled)

Result: for..of values from one chunk is 26397.86% faster than for await..of values from one chunk.

You can see the full code I ran in this gist.

As you can see, the operation you note (the async-from-sync iterator conversion) is happening. I checked to see whether that might be what was slow, and it was not.

I hope this makes it clear why I think it's worth the effort to optimize this. The cost of converting a sync step to an async one appears to be trivial, but the cost of the awaits, which are spurious on average 99.998% of the time (1 - 1/65535 ≈ 99.998%), is not.