syncAndAsyncIterator

The goal of this proposal is to better facilitate the use of for loop syntax when working with streams of character data. Expressed as purely async iterables these tend to have poor performance, because most of the time the next character in the iterable is available synchronously; only occasionally does another chunk of data need to be read asynchronously, e.g. from disk. Forcing these synchronous operations to be asynchronous incurs unnecessary wall-clock time cost from calls to queueMicrotask (or setImmediate, or whatever). In fact that cost is incurred twice: once to await step, and once to await step.value.
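
As a rough illustration of where that double cost comes from (a hypothetical consume helper mirroring the description above, not spec text): even when the underlying iterator has the next value ready synchronously, the async protocol pays two promise hops per value.

// Rough sketch only: consuming an async iterator whose values are almost
// always available synchronously still costs two awaits per value.
async function consume(asyncIterator, use) {
  while (true) {
    const step = await asyncIterator.next(); // hop 1: await the step object
    if (step.done) break;
    use(await step.value);                   // hop 2: await the value itself
  }
}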

The proposal is to create a new type of iterator and a new syntax for consuming it.

The new type is a syncAndAsync iterator. The interface for it would be:

interface SyncAndAsyncIterator<T> {
  next(): { done: boolean, value: T } | Promise<{ done: boolean, value: T }>;
}

The strawman expects an iterable to produce such an iterator from a method keyed by Symbol.for('syncAndAsyncIterator'), though ultimately having Symbol.syncAndAsyncIterator as a well-known symbol (WKS) would seem most apt.
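
For concreteness, a hand-written iterable implementing the protocol under that registry symbol might look roughly like this (a sketch; readNextChunk stands in for any async chunk source and is not part of the proposal):

// Sketch: a character iterable backed by asynchronously read chunks.
// Steps are returned synchronously whenever the current chunk still has data.
function charIterable(readNextChunk) {
  return {
    [Symbol.for('syncAndAsyncIterator')]() {
      let chunk = '';
      let i = 0;
      return {
        next() {
          if (i < chunk.length) {
            // Common case: the next character is already in memory.
            return { done: false, value: chunk[i++] };
          }
          // Rare case: the chunk is exhausted, so return a promise for the step.
          return readNextChunk().then((next) => {
            if (next == null || next.length === 0) return { done: true, value: undefined };
            chunk = next;
            i = 0;
            return { done: false, value: chunk[i++] };
          });
        },
      };
    },
  };
}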

Iteration over a syncAndAsync iterator is facilitated with an extension of the for loop syntax:

for await? (const value of syncAndAsyncIterable) {
}

for await? is expressed in the AST as { async: true, sync: true } where for await would be { async: true, sync: false } and a for would be { async: false, sync: true }.

Creation of a syncAndAsync iterator could be done with async generator functions. It is my belief that it would not be necessary to define any new syntax for generator functions, as their internal implementation could be changed to yield iterator steps synchronously when no awaiting had been necessary since the previous call to yield. The current API would continue to be served by implementing [Symbol.asyncIterator]() as a thin wrapper around [Symbol.syncAndAsyncIterator]().
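
A sketch of what that thin wrapper could look like (hypothetical helper; Promise.resolve normalizes both step shapes into the promise the existing protocol requires):

// Sketch: serving the existing async iteration protocol on top of the new one,
// so current for await consumers keep working unchanged.
function asyncIteratorFromSyncAndAsync(iterable) {
  const inner = iterable[Symbol.for('syncAndAsyncIterator')]();
  return {
    next(...args) {
      // A plain step and a promise of a step both become a promise of a step.
      return Promise.resolve(inner.next(...args));
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}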

I've implemented the smallest useful subset of the implied changes to Babel here.

To be addressed:

How would a promise be reflectively identified? https://tc39.es/ecma262/#sec-ispromise seems the obvious answer.

What would be the fallback order of iterators for for await?? I think syncAndAsync => async => sync would be appropriate.
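
As a sketch of what that fallback could look like on the consuming side (a hypothetical helper, using the strawman registry symbol):

// Hypothetical iterator acquisition for for await?:
// prefer syncAndAsync, then async, then sync.
function getIteratorForAwaitMaybe(obj) {
  const getSyncAndAsync = obj[Symbol.for('syncAndAsyncIterator')];
  if (getSyncAndAsync != null) return getSyncAndAsync.call(obj);

  const getAsync = obj[Symbol.asyncIterator];
  if (getAsync != null) return getAsync.call(obj);

  const getSync = obj[Symbol.iterator];
  if (getSync != null) return getSync.call(obj);

  throw new TypeError('value is not iterable');
}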

Would return(value) and throw(exception) have similar sync/async duality? I see no perf need to allow them to be synchronous, and there are significant benefits to there being only one pathway.

Any concerns around yield*?

I am not convinced this warrants new syntax. Getting maximum performance does sometimes mean you will have to avoid using convenience features, at least until engines optimize them. The iteration protocol inherently involves a bunch of object allocations, so if you really want maximum performance you'll have to forgo it entirely anyway.

That said, a couple of comments:

Forcing these synchronous operations to be asynchronous incurs unnecessary wall clock time cost from calls to queueMicrotask (or setImmediate or whatever)

I don't understand this. Why are you calling those methods? You can resolve a Promise synchronously. Code which is awaiting (or .thening) that Promise won't run until the next microtask tick, but that's very inexpensive, and engines are getting better at optimizing those out entirely.

Anyway, here's what I'd do for your use case:

// this is just a mockup; you can imagine any async function which returns chunks.
let getReader = () => {
  let chunks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]];
  let i = 0;
  return async () => {
    if (i >= chunks.length) {
      return 'EOF';
    }
    return chunks[i++];
  }
};


// This is the actual core library code. Obviously you could make some design
// decisions differently here, but I hope it's suggestive enough.
// Yes, it's a little awkward to write code like this, but hopefully you'd only
// need to when writing a library and trying to squeeze out as much performance
// as possible; the difference in performance between this and the purely async
// version is unlikely to matter much for most people.
let iterable = {
  [Symbol.iterator]() {
    let pending = false;
    let reader = getReader();
    let currentChunk = [];
    return {
      next() {
        if (pending) {
          throw new Error('called again before waiting for previous result');
        }
        if (currentChunk === 'EOF') {
          return { done: true };
        }
        if (currentChunk.length === 0) {
          pending = true;
          let promise = (async () => {
            currentChunk = await reader();
            pending = false;
            if (currentChunk === 'EOF') {
              return { done: true };
            } else {
              return { done: false, value: currentChunk.shift() };
            }
          })();
          return { done: false, value: { async: true, promise } };
        } else {
          return { done: false, value: { async: false, value: currentChunk.shift() } };
        }
      }
    };
  }
};


// Here's how you'd use it. Note that there's no `await` at all in cases where the value is available synchronously.
async function main() {
  for (let item of iterable) {
    if (item.async) {
      item = await item.promise;
    }
    if (item.done) {
      break;
    }
    console.log(item.value);
  }
}
main();

Edit: of course in practice what I'd do is just yield the entire chunk, and expect the consumer to write an inner loop to walk that chunk. But the above works if for some reason that's not viable.
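
For completeness, that chunk-at-a-time version might look something like this (a sketch reusing the getReader mockup above):

// Sketch of the chunk-yielding alternative: only chunk boundaries pay an await.
async function* chunksOf(getReader) {
  let read = getReader();
  let chunk;
  while ((chunk = await read()) !== 'EOF') {
    yield chunk;
  }
}

async function mainChunked() {
  for await (const chunk of chunksOf(getReader)) {
    // The inner walk over the chunk is fully synchronous.
    for (const value of chunk) {
      console.log(value);
    }
  }
}
mainChunked();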

The iteration protocol inherently involves a bunch of object allocations, so if you really want maximum performance you'll have to forgo it entirely anyway.

If object allocations were terribly costly the iteration protocol wouldn't have been designed around them. Generational garbage collection is designed with the basic assumption that most small objects are short lived. Here's some reading on the topic: Trash talk: the Orinoco garbage collector · V8

You can resolve a Promise synchronously. Code which is awaiting (or .thening) that Promise won't run until the next microtask tick, but that's very inexpensive, and engines are getting better at optimizing those out entirely.

Six of one, half dozen of the other. My point was that you are waiting for the next microtask tick. If we are talking about inputs of any significant size all that time really adds up. I've got to get a number on this, but hearsay leads me to believe it might be on the order of 10x speed difference. The engine will not optimize that away, as that is the behavior the spec dictates.

Here, by the way, is how I came to write this up: https://github.com/nodejs/node/issues/31979
The result of that discussion seemed to be that there was no chance the engine would optimize that cost away.

Another thing: the problem with the code you've shared is that sync iteration expects it to be possible to know whether you're done synchronously. Your own code expects that, but it's an unsound expectation. If you remedy that problem, I think you'll discover that you're just left with async iterators.

If object allocations were terribly costly the iteration protocol wouldn't have been designed around them.

They're not terribly costly, they're just not free. If you want every possible bit of performance you do generally have to avoid them. But you probably don't want every possible bit of performance if it makes your code too awkward to write or consume.

The engine will not optimize that away, as that is the behavior the spec dictates.

The spec dictates that code be run in a particular order. It doesn't dictate that it take any particular amount of time to do so; a "microtask turn" is not a unit of time. Engines can make microtask turns have low or zero overhead in many cases.

Another thing: the problem with the code you've shared is that sync iteration expects it to be possible to know whether you're done synchronously. Your own code expects that, but it's an unsound expectation.

My code works fine. You can run it. Yes, it has a stricter contract for consumers than most iterators do - specifically, it requires that consumers await the value in the case that it is not synchronously available before calling .next again - but that's fine; it's permissible for code to have stricter contracts than are enforced by the language.

Apologies, yes, I see now that your code does work. I did not look as closely as I should have because it was clear to me at a glance that that code is not the solution to the problem I have. My goal is to create the most comprehensive open source library for working with iterables and async iterables, so a solution that supports only a subset of the iterable contract is not one that I could endorse. If I was solving a specific problem instead of considering a general class of solutions I would probably, as you say, write an async iterable of sync iterables (in essence a stream).

That said, while async-of-sync may be the best current option in its performance class, I don't think it's especially good. It makes central an implementation detail (chunking) that most code would be willing to pay a small perf price not to need to know about. Code written with nested loops has other ergonomic problems too, like the difficulty of breaking out of both loops and the presence of edge cases at chunk boundaries that may create subtle bugs, especially since the kind of content developers tend to test with (in the file system at least) is not larger than one chunk.

I am more intrigued by your point about microtasks being the subject of further possible optimization. Obviously it would be better if this were an implementation/optimization detail that could be hidden completely, but I am not yet convinced that that is the case. I read through a recent blog post about what optimizations are being done already, but these are almost certain to have been in place already for the code which was seeing a 10x slowdown. The problem is that each for await loop will always create two microtasks per value, and any meaningful work is going to be the result of nesting multiple operations, e.g. perhaps a split followed by a map, a filter, and a join. In this charitably simple setup we have 8 microtasks per character of text input, and it isn't hard to imagine that number being quite a bit higher.
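
To make that concrete, here is a sketch of the kind of chain being described: toy async-generator versions of split, map, filter, and join (the names and shapes are illustrative, not the iter-tools APIs); every stage adds its own awaits per item.

// Toy sketch: each stage is its own async generator, so every character
// passes through several layers of promise machinery before being used.
async function* split(chars, sep) {
  let word = '';
  for await (const ch of chars) {
    if (ch === sep) { yield word; word = ''; } else { word += ch; }
  }
  if (word) yield word;
}
async function* map(iter, fn) {
  for await (const item of iter) yield fn(item);
}
async function* filter(iter, pred) {
  for await (const item of iter) if (pred(item)) yield item;
}
async function join(iter, sep) {
  let out = '';
  let first = true;
  for await (const item of iter) {
    out += first ? item : sep + item;
    first = false;
  }
  return out;
}

// Four stacked for await loops, each paying microtask costs per item:
// join(filter(map(split(chars, ' '), w => w.toUpperCase()), w => w.length > 2), ' ')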

Thus any solution to this problem would need to be aggressive in order to combat such a high amount of overhead. The only optimization that in my mind fits the bill would be not creating microtasks. I would naively assume that it's safe to synchronously continue execution without mucking about in microtasks at all if I was awaiting the value of a settled promise and the microtask queue was empty, but it seems telling to me that a) the spec doesn't say that and b) the engines haven't done it. But of course I could be wrong. I'm dipping my toes in a very deep pool here.

Just found another person with similar experiences: https://medium.com/netscape/async-iterators-these-promises-are-killing-my-performance-4767df03d85b

…and that the call stack is empty or consists of do-nothing frames. The code that queued the initial promise job (microtask) still needs to run to completion. But yes, I'd share your naive assumption that this should be the case most of the time on async iterables that yielded synchronously.

The only problem that cannot be solved by optimisations is running multiple asynchronous (microtask-only) generators at once. (Microtask-only meaning they immediately fulfill all promises, not using promises that get resolved in macro tasks). Something like

const range = Array.from({length: 100}, (_, i) => i);
(async () => { for await (const i of range) console.log('a', i); })();
(async () => { for await (const i of range) console.log('b', i); })();

will have to jump back and forth between the two asynchronous executions instead of running straight through each loop. This is a design decision far older than this thread, reaching back to when the promise job queue was introduced in the spec. Unlike promises/A+, which gives total freedom for execution order, consistency between engines was valued higher than optimisation potential and a deterministic scheduling algorithm was introduced. It features this "fair scheduling" between two microtask-constrained processes, at the cost of performance.

The spec doesn't need to point out performance optimisations, it's the job of implementers to design them. Just consider how different the actual implementations of scopes and prototype property lookups are from what the spec describes. The spec details an "as if" behaviour. It certainly allows skipping microtask ticks if the result is not observable.
Also, even if the spec did this, it's no guarantee that the optimisation is actually implemented, see the mess with tail call elimination.

I think that should also be the approach to solve the general problem. Instead of introducing a third iteration protocol, I'd rather extend the existing ones to allow an asynchronous iterator's next() method to return (a promise for) a synchronous iterable, and have this automatically consumed by a for await loop. While only affecting manually implemented iterators at first, later the generator yield* semantics could be changed to use that as well.
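
A sketch of what such an iterator might look like, with the behaviour the extended for await would provide written out as a manual loop (hypothetical; readChunk stands in for any async chunk source):

// Sketch: an async iterator whose next() resolves to a whole synchronous
// iterable (a chunk) rather than a single value.
function chunkedAsyncIterator(readChunk) {
  return {
    async next() {
      const chunk = await readChunk();
      return chunk == null
        ? { done: true, value: undefined }
        : { done: false, value: chunk }; // value is itself iterable
    },
    [Symbol.asyncIterator]() { return this; },
  };
}

// What the extended for await would do automatically, written by hand:
async function consumeFlattened(iter, use) {
  for await (const chunk of iter) {
    for (const item of chunk) use(item); // the inner walk stays synchronous
  }
}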

I wonder what the best way would be to find someone involved in engine development who could comment on whether that optimization is being considered?

I'd rather extend the existing ones to allow an asynchronous iterator's next() method return (a promise for) a synchronous iterable, and have this automatically consumed by a for await loop

Wouldn't changing how a for await loop treats that situation be a very breaking change to the language? At the very least you'd still have to extend the iterator spec to permit, say, { done, iterableValue } instead of { done, value }, which is still technically breaking, since a custom iterator could already have stored something in that field (I admit it's unlikely). Also, what happens when custom code not written using for..of encounters an iterator that uses iterableValue rather than value? It breaks.

Therefore I still think that the solution I am proposing has attractive properties.

  • It is 100% nonbreaking. The syntax is otherwise an error, and you have to opt into using the new protocol. Nobody should be accidentally opting in because well known symbols should only be added by the spec. Everyone else can use Symbol.for.
  • By not creating step shapes other than { done, value } it protects from deopts in hot pathways that deal with iterators.
  • It isn't subject to the bouncing problem you show with two async generators.
  • When implemented correctly there is full interoperability. A consumer that doesn't understand the new protocol can seamlessly fall back to the old one.

I have thousands of hours into developing iterator tooling at this point. I chose my proposal quite carefully :smile:

Just want to ask: what does this solve that https://tc39.es/ecma262/#sec-async-from-sync-iterator-objects doesn't?

Well that is only useful if what you really have is a sync iterator. I am talking in essence about offering a performant and user-friendly abstraction over streams, which are not fundamentally synchronous. Of course it is possible to do blocking IO from a sync iterable, but then you have all the problems associated with blocking IO.

Let me clarify: those objects are created to be immediately used with for await loops. You can still use non-blocking I/O in those loops.

As for streams, I've been working on https://github.com/isiahmeadows/proposal-es-streams and am seeking early feedback before seeking a champion.

Interesting proposal on streams. You're using the iterator protocol underneath, so I think your solution would have the same perf problem and would benefit from the solution I'm proposing.

I just got a benchmark running on node v15.2.1. It shows a 264x speed up for sync iteration:

✓  for await..of values from one chunk x 8.39 ops/sec ±0.84% (43 runs sampled)
✓  for..of values from one chunk x 2,222 ops/sec ±1.49% (86 runs sampled)

Result: for..of values from one chunk is 26397.86% faster than for await..of values from one chunk.

You can see the full code I ran in this gist.

As you can see, the operation you note, the async-from-sync iterator conversion, is happening. I checked to see if that might be what was slow, and it was not.

I hope this makes it clear why I think it's worth effort to optimize this. The cost of converting a sync step to an async one appears to be trivial, but the cost of the awaits that are spurious on average 99.998% of the time (1 - 1/65535 ≈ 99.998%) is not.
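
The full code is in the gist linked above; as a rough sketch (not the actual gist), the comparison is of this general shape, assuming a single pre-loaded chunk of 65536 values:

// Rough sketch of the shape of the benchmark, not the gist itself.
const chunk = Array.from({ length: 65536 }, (_, i) => i);

async function* asyncValues() {
  yield* chunk; // forces every value through the async-from-sync machinery
}

async function forAwaitCase() {
  let sum = 0;
  for await (const v of asyncValues()) sum += v;
  return sum;
}

function forOfCase() {
  let sum = 0;
  for (const v of chunk) sum += v;
  return sum;
}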

It seems the argument against this proposal is here. I located that interesting bit of reading in the discussion on this idea.

I'm thinking right now that that kind of madness does not apply here. Obviously I can construct a lot of bad behaviors by mixing callbacks with async/await. But that isn't unique to for loops. And iter-tools itself proves that there's a plethora of legitimate uses for the pattern.

Can you say why not? The code that I gave works with any chunked async iterator. The only difference from your proposal is that it requires consumers to put two lines (if (item.async) item = await item.promise; if (item.done) break;) at the beginning of their for of loops, rather than writing for await?, and requires a little more work on the part of the author of the iterable.

Code written with nested loops has other ergonomic problems too, like the difficulty of breaking out of both loops and the presence of edge cases at the chunk boundaries that may create subtle bugs

Mm.. I'm not convinced that having to label your outer loop is a substantial burden, and it's not clear to me why for (let chunk of chunks) for (let item of chunk) doSomething(item) would have any chunk-boundary-related issues that for (let item of stream) doSomething(item) would not.

I agree nested loops are less nice; it's just that the difficulties they entail don't seem large enough to me to warrant new syntax.

I just got a benchmark running on node v15.2.1. It shows a 264x speed up for sync iteration:

Engines have soured on this sort of microbenchmark. For it to be worth optimizing this pattern, or the language adding syntax to help allow better optimizations, you'd need to demonstrate that this is a significant source of slowdown in a reasonably large class of real applications. Here you've shown a 264x overhead vs the sync loop, true, but you've also shown that the unnecessary await is adding no more than one five-millionth of a second of overhead to each iteration of the loop. It's hard for me to imagine that being a large part of the cost in an actual application.

Just to illustrate that, if I use range(65536) directly instead of expanding it into an array first, the difference between the two loops is more like 8x, rather than 264x.

Can you say why not?

Certainly. I intend (in the full passage of time) to displace lodash as the top package on npm by providing a set of utilities for doing work with iterators and iterables. Such tools will be sought out as Map and Set gain broader adoption. I intend to support everything the language supports, but nobody is going to want to use my package if I have set up a competing standard to TC39's as to how iterators work.

Just to illustrate that, if I use range(65536) directly instead of expanding it into an array first, the difference between the two loops is more like 8x, rather than 264x.

I'm going out of order here, because you were absolutely right. 8x is the number worth considering, not 264x. I understand the limitations of microbenchmarks (and you've just demonstrated another), but in this case I wouldn't dismiss them so quickly. One of the greatest benefits of iterators is that they allow operations to be chained without intermediate (large) allocations. Yes, there are smaller allocations, but those have advantages, e.g. that they need never trigger a major GC. As such the expectation is that operations will be small and designed to work together in chains. A great example is flat and map. Together, these are flat(map(...)). This is a relatively common operation so arrays offer it too, but there it has to be one operation (flatMap), because otherwise you'd allocate a bunch of intermediate arrays that are totally unnecessary. My point is that the 8x slowdown isn't going to just fade into the background, as other minor costs that can look major in microbenchmarks do. Instead it will tend to be incurred repeatedly and continuously as the complexity of the required work grows. The individual writing the Excel parser offered you a real-world example. Their real-world use case slowed down by 40% when they attempted to use async iterators. Now of course 40% isn't 800%, but it was enough for them to reject the syntax and file a bug on Node.
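
As a sketch of the kind of composition meant here (illustrative generator helpers, not the actual iter-tools implementations):

// Sketch: lazy map and flat compose per item, so flat(map(...)) never
// materializes the intermediate arrays that Array.prototype.map would.
function* map(iterable, fn) {
  for (const item of iterable) yield fn(item);
}
function* flat(iterables) {
  for (const inner of iterables) yield* inner;
}

// e.g. [...flat(map([1, 2, 3], (n) => [n, n * 10]))] is [1, 10, 2, 20, 3, 30]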

I agree nested loops are less nice; it's just that the difficulties they entail don't seem large enough to me to warrant new syntax.

The new syntax is already here. It's just slower than it needs to be. The evolution of programming will always consist of humans making more work for computers so that there is less that they have to think about. Most likely even the unnecessary work done by async iterators would eventually be deemed an acceptable price to pay, but I don't think that's a good way to think about things generally, because as I think this proposal demonstrates there's just no need to pay that price. By offering an optimization we can speed the adoption of the new syntax (minimizing pain), and speed the code (minimizing cost in planet-warming gases).