syncAndAsyncIterator

Here, by the way, is how I came to write this up: https://github.com/nodejs/node/issues/31979
The result of that discussion seemed to be that there was no chance the engine would optimize that cost away.

Another thing: the problem with the code you've shared is that sync iteration expects it to be possible to know whether you're done synchronously. Your own code expects that, but it's an unsound expectation. If you remedy that problem, I think you'll discover that you're just left with async iterators.

If object allocations were terribly costly the iteration protocol wouldn't have been designed around them.

They're not terribly costly, they're just not free. If you want every possible bit of performance you do generally have to avoid them. But you probably don't want every possible bit of performance if it makes your code too awkward to write or consume.

The engine will not optimize that away, as that is the behavior the spec dictates.

The spec dictates that code be run in a particular order. It doesn't dictate that it take any particular amount of time to do so; a "microtask turn" is not a unit of time. Engines can make microtask turns have low or zero overhead in many cases.

Another thing: the problem with the code you've shared is that sync iteration expects it to be possible to know whether you're done synchronously. Your own code expects that, but it's an unsound expectation.

My code works fine. You can run it. Yes, it has a stricter contract for consumers than most iterators do - specifically, it requires that consumers await the value in the case that it is not synchronously available before calling .next again - but that's fine; it's permissible for code to have stricter contracts than are enforced by the language.

Apologies, yes, I see now that your code does work. I did not look as closely as I should have, because it was clear to me at a glance that that code is not the solution to the problem I have. My goal is to create the most comprehensive open source library for working with iterables and async iterables, so a solution that supports only a subset of the iterable contract is not one that I could endorse. If I were solving a specific problem instead of considering a general class of solutions I would probably, as you say, write an async iterable of sync iterables (in essence a stream).

That said, while an async iterable of sync iterables may be the best current option in its performance class, I don't think it's especially good. It makes an implementation detail (chunking) central, when most code would be willing to pay a small perf price not to have to know about it. Code written with nested loops has other ergonomic problems too, like the difficulty of breaking out of both loops and the presence of edge cases at chunk boundaries that may create subtle bugs, especially since the kind of content developers tend to test with (in the file system at least) is no larger than one chunk.

I am more intrigued by your point about microtasks being the subject of further possible optimization. Obviously it would be better if this were an implementation/optimization detail that could be hidden completely, but I am not yet convinced that that is the case. I read through a recent blog post about the optimizations that are already being done, but those were almost certainly already in place for the code that was seeing a 10x slowdown. The problem is that each for await loop will always create two microtasks per value, and any meaningful work is going to be the result of nesting multiple operations, e.g. perhaps a split followed by a map, a filter, and a join. In this charitably simple setup we have 8 microtasks per character of text input, and it isn't hard to imagine that number being quite a bit higher.
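
For concreteness, here's a minimal sketch of such a pipeline. These are toy implementations (not real library code); the point is just that each stage is its own async generator, so every character passes through several awaits even though all the values are available synchronously.

async function* split(source) {
  for await (const chunk of source) yield* chunk; // one value per character
}
async function* map(fn, source) {
  for await (const value of source) yield fn(value);
}
async function* filter(pred, source) {
  for await (const value of source) if (pred(value)) yield value;
}
async function join(source) {
  let result = '';
  for await (const value of source) result += value;
  return result;
}

async function* chunks() {
  yield 'hello ';
  yield 'world';
}

join(filter((c) => c !== ' ', map((c) => c.toUpperCase(), split(chunks()))))
  .then(console.log); // "HELLOWORLD"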

Thus any solution to this problem would need to be aggressive in order to combat such a high amount of overhead. The only optimization that in my mind fits the bill would be not creating microtasks. I would naively assume that it's safe to synchronously continue execution without mucking about in microtasks at all if I were awaiting the value of a settled promise and the microtask queue were empty, but it seems telling to me that a) the spec doesn't say that and b) the engines haven't done it. But of course I could be wrong. I'm dipping my toes in a very deep pool here.
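
For reference, the currently mandated behavior is easy to observe: even when the awaited promise is already settled, the continuation runs in a later microtask rather than synchronously.

const settled = Promise.resolve(42);
(async () => {
  const value = await settled; // the value is already known here
  console.log('after await', value);
})();
console.log('sync code runs first');
// logs "sync code runs first", then "after await 42"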

Just found another person with similar experiences: https://medium.com/netscape/async-iterators-these-promises-are-killing-my-performance-4767df03d85b

…and that the call stack is empty or consists of do-nothing frames. The code that queued the initial promise job (microtask) still needs to run to completion. But yes, I'd share your naive assumption that this should be the case most of the time on async iterables that yielded synchronously.

The only problem that cannot be solved by optimisations is running multiple asynchronous (microtask-only) generators at once. (Microtask-only meaning they immediately fulfill all promises, not using promises that get resolved in macro tasks). Something like

const range = Array.from({length: 100}, (_, i) => i);
(async () => { for await (const i of range) console.log('a', i); })();
(async () => { for await (const i of range) console.log('b', i); })();
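// the two loops' output interleaves: a 0, b 0, a 1, b 1, …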

will have to jump back and forth between the two asynchronous executions instead of running straight through each loop. This is a design decision far older than this thread, reaching back to when the promise job queue was introduced in the spec. Unlike Promises/A+, which gives total freedom over execution order, consistency between engines was valued higher than optimisation potential and a deterministic scheduling algorithm was introduced. It features this "fair scheduling" between two microtask-constrained processes, at the cost of performance.

The spec doesn't need to point out performance optimisations; it's the job of implementers to design them. Just consider how different the actual implementations of scopes and prototype property lookups are from what the spec describes. The spec details an "as if" behaviour. It certainly allows skipping microtask ticks if the result is not observable.
Also, even if the spec did spell this out, it would be no guarantee that the optimisation is actually implemented; see the mess with tail call elimination.

I think that should also be the approach to solving the general problem. Instead of introducing a third iteration protocol, I'd rather extend the existing ones to allow an asynchronous iterator's next() method to return (a promise for) a synchronous iterable, and have this automatically consumed by a for await loop. While only affecting manually implemented iterators at first, the generator yield* semantics could later be changed to use that as well.
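
As a rough sketch of the shape this could take (illustrative only; today a for await loop would simply see each chunk as a single value):

function chunked(chunks) {
  let i = 0;
  return {
    [Symbol.asyncIterator]() { return this; },
    async next() {
      // value is itself a sync iterable; under the proposed semantics the
      // consuming for await loop would drain it without a microtask tick
      // per element
      return i < chunks.length
        ? { done: false, value: chunks[i++] }
        : { done: true, value: undefined };
    },
  };
}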

I wonder what the best way would be to find someone involved in engine development who could comment on whether that optimization is being considered?

I'd rather extend the existing ones to allow an asynchronous iterator's next() method return (a promise for) a synchronous iterable, and have this automatically consumed by a for await loop

Wouldn't changing how a for await loop treats that situation be a very breaking change to the language? At the very least you'd still have to extend the iterator spec to permit, say, { done, iterableValue } instead of { done, value }, which is still technically a breaking change, since a custom iterator could already have stored something in that field (I admit it's unlikely). And what happens when custom code not written using for..of encounters an iterator that uses iterableValue rather than value? It breaks.

Therefore I still think that the solution I am proposing has attractive properties.

  • It is 100% non-breaking. The syntax is otherwise an error, and you have to opt into using the new protocol. Nobody should be accidentally opting in, because well-known symbols should only be added by the spec. Everyone else can use Symbol.for.
  • By not creating step shapes other than { done, value }, it protects against deopts in hot paths that deal with iterators.
  • It isn't subject to the bouncing problem you show with two async generators.
  • When implemented correctly there is full interoperability. A consumer that doesn't understand the new protocol can seamlessly fall back to the old one (see the sketch below).
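
To make the interop point concrete, here is a rough sketch of how a producer might expose both protocols. The symbol name and the convention that next() returns either a { done, value } step or a promise for one are just my shorthand for the idea, not a finished design:

const syncAndAsyncIterator = Symbol.for('syncAndAsyncIterator'); // placeholder name

function fromChunkPromises(chunkPromises) {
  let chunk = [];
  let index = 0;
  let pending = 0;
  const next = () => {
    if (index < chunk.length) return { done: false, value: chunk[index++] };
    if (pending >= chunkPromises.length) return { done: true, value: undefined };
    // only at a chunk boundary does next() hand back a promise for a step
    return chunkPromises[pending++].then((loaded) => {
      chunk = loaded;
      index = 0;
      return next();
    });
  };
  return {
    // new protocol: steps are returned synchronously whenever possible
    [syncAndAsyncIterator]() {
      return { next };
    },
    // fallback: a standard async iterator, so consumers that only know the
    // existing protocol still work (at the usual one-await-per-value cost)
    async *[Symbol.asyncIterator]() {
      for (const chunkPromise of chunkPromises) yield* await chunkPromise;
    },
  };
}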

I have thousands of hours into developing iterator tooling at this point. I chose my proposal quite carefully :smile:

Just want to ask: what does this solve that https://tc39.es/ecma262/#sec-async-from-sync-iterator-objects doesn't?

Well that is only useful if what you really have is a sync iterator. I am talking in essence about offering a performant and user-friendly abstraction over streams, which are not fundamentally synchronous. Of course it is possible to do blocking IO from a sync iterable, but then you have all the problems associated with blocking IO.

Let me clarify: those objects are created to be immediately used with for await loops. You can still use non-blocking I/O in those loops.

As for streams, I've been working on https://github.com/isiahmeadows/proposal-es-streams and am seeking early feedback before seeking a champion.

Interesting proposal on streams. You're using the iterator protocol underneath, so I think your solution would have the same perf problem and would benefit from the solution I'm proposing.

I just got a benchmark running on node v15.2.1. It shows a 264x speed up for sync iteration:

✓  for await..of values from one chunk x 8.39 ops/sec ±0.84% (43 runs sampled)
✓  for..of values from one chunk x 2,222 ops/sec ±1.49% (86 runs sampled)

Result: for..of values from one chunk is 26397.86% faster than for await..of values from one chunk.

You can see the full code I ran in this gist.

As you can see, the operation you note (the async-from-sync iterator conversion) is happening. I checked to see whether that might be what was slow, and it was not.

I hope this makes it clear why I think it's worth the effort to optimize this. The cost of converting a sync step to an async one appears to be trivial, but the cost of the awaits that are spurious on average 99.998% of the time (all but one per chunk) is not.

It seems the argument against this proposal is here. I located that interesting bit of reading in the discussion on this idea.

I'm thinking right now that that kind of madness does not apply here. Obviously I can construct a lot of bad behaviors by mixing callbacks with async/await. But that isn't unique to for loops. And iter-tools itself proves that there's a plethora of legitimate uses for the pattern.

Can you say why not? The code that I gave works with any chunked async iterator. The only difference from your proposal is that it requires consumers to put two lines (if (item.async) item = await item; if (item.done) break;) at the beginning of their for of loops, rather than writing for await?, and requires a little more work on the part of the author of the iterable.
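
Spelled out, that consumer contract looks roughly like this. Only the two marked lines are from the code in question; the iterable, the .value field, and process are stand-ins:

async function consume(chunkedItems, process) {
  for (let item of chunkedItems) {
    if (item.async) item = await item; // from the contract: await steps that aren't ready
    if (item.done) break;              // from the contract: stop on the done step
    process(item.value);
  }
}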

Code written with nested loops has other ergonomic problems too, like the difficulty of breaking out of both loops and the presence of edge cases at the chunk boundaries that may create subtle bugs

Mm.. I'm not convinced that having to label your outer loop is a substantial burden, and it's not clear to me why for (let chunk of chunks) for (let item of chunk) doSomething(item) would have any chunk-boundary-related issues that for (let item of stream) doSomething(item) would not.
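
For concreteness, the labeled form is just the following, with shouldStop as a stand-in and the outer loop becoming for await when the chunk source is async:

outer: for (const chunk of chunks) {
  for (const item of chunk) {
    if (shouldStop(item)) break outer; // exits both loops at once
    doSomething(item);
  }
}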

I agree nested loops are less nice; it's just that the difficulties they entail don't seem large enough to me to warrant new syntax.

I just got a benchmark running on node v15.2.1. It shows a 264x speed up for sync iteration:

Engines have soured on this sort of microbenchmark. For it to be worth optimizing this pattern, or for the language to add syntax that allows better optimizations, you'd need to demonstrate that this is a significant source of slowdown in a reasonably large class of real applications. Here you've shown a 264x overhead vs the sync loop, true, but you've also shown that the unnecessary await is adding no more than one five-millionth of a second of overhead to each iteration of the loop. It's hard for me to imagine that being a large part of the cost in an actual application.

Just to illustrate that, if I use range(65536) directly instead of expanding it into an array first, the difference between the two loops is more like 8x, rather than 264x.
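
(Here range is presumably a plain generator along these lines; I'm assuming its shape, since the gist isn't reproduced in this thread.)

function* range(n) {
  for (let i = 0; i < n; i++) yield i;
}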

Can you say why not?

Certainly. I intend (in the fullness of time) to displace lodash as the top package on npm by providing a set of utilities for working with iterators and iterables. Such tools will be sought out as Map and Set gain broader adoption. I intend to support everything the language supports, but nobody is going to want to use my package if I have set up a competing standard to TC39's for how iterators work.

Just to illustrate that, if I use range(65536) directly instead of expanding it into an array first, the difference between the two loops is more like 8x, rather than 264x.

I'm going out of order here, because you were absolutely right. 8x is the number worth considering, not 264x. I understand the limitations of microbenchmarks (and you've just demonstrated another), but in this case I wouldn't dismiss them so quickly. One of the greatest benefits of iterators is that they allow operations to be chained without intermediate (large) allocations. Yes, there are smaller allocations, but those have advantages, e.g. that they need never trigger a major GC. As such the expectation is that operations will be smaller and designed to work together in chains. A great example is flat and map. Together, these are flat(map(...)). This is a relatively common operation, so arrays offer it too, but there it has to be one operation (flatMap), because otherwise you'd allocate a bunch of intermediate arrays that are totally unnecessary. My point is that the 8x slowdown isn't going to just fade into the background, as other minor costs that can look major in microbenchmarks do. Instead it will tend to be incurred repeatedly and continuously as the complexity of the required work grows. The individual writing the Excel parser offered you a real-world example: their real-world use case slowed down by 40% when they attempted to use async iterators. Now of course 40% isn't 800%, but it was enough for them to reject the syntax and file a bug on node.
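
To make the chaining point concrete, here's the toy generator version (names illustrative):

function* map(fn, iterable) {
  for (const value of iterable) yield fn(value);
}
function* flat(iterable) {
  for (const inner of iterable) yield* inner;
}

// flat(map(...)) composes lazily: no intermediate array of arrays is built
for (const word of flat(map((s) => s.split(' '), ['foo bar', 'baz']))) {
  console.log(word); // foo, bar, baz
}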

I agree nested loops are less nice; it's just that the difficulties they entail don't seem large enough to me to warrant new syntax.

The new syntax is already here. It's just slower than it needs to be. The evolution of programming will always consist of humans making more work for computers so that there is less that they have to think about. Most likely even the unnecessary work done by async iterators would eventually be deemed an acceptable price to pay, but I don't think that's a good way to think about things generally, because as I think this proposal demonstrates there's just no need to pay that price. By offering an optimization we can speed the adoption of the new syntax (minimizing pain), and speed the code (minimizing cost in planet-warming gases).

That's not actually why flatMap exists; otherwise there would be filterMap and so on as well. The reason flatMap exists is that it is a single primitive operation for people coming from a functional programming background. It's nothing to do with performance.

My point is that the 8x slowdown isn't going to just fade into the background, as other minor costs that can look major in microbenchmarks do. Instead it will tend to be incurred repeatedly and continuously as the complexity of the required work grows.

It might, but performance is tricky; it is hard to trust this assertion without having concrete, real-world examples where that thing happens. I suspect it is false: your proposal is for a new kind of loop, and in my experience, people using loops will tend to put all of their logic into the body of the loop, rather than walking over their data repeatedly with multiple loops. That is, they have a single loop as in

let output = [];
for await (let item of data) {
  let x = transform(item);
  if (test(x)) {
    output.push(x);
  }
}

rather than

let transformed = [];
for await (let item of data) {
  transformed.push(transform(item));
}
let output = [];
for (let x of transformed) {
  if (test(x)) {
    output.push(x);
  }
}

And if they only have a single loop, they only pay the cost once.

The individual writing the excel parser offered you a real world example. Their real world use case slowed down by 40% when they attempted to use async iterators.

Their "real world use case" is incrementing a counter in the hot loop. That's still pretty close to a microbenchmark, I think. And that's compared to using callbacks directly, not a comparison to sync iterators.

The new syntax is already here.

for await? is not currently in the language, no. That's the new syntax I'm referring to.

Yes, but the async iterable syntax is here and it can be used to create and consume streams of characters. I think it's silly to suppose that a lot of people are going to be rolling hugely complex combinations of operations into a single for loop. Why would you want to encourage a programming style where for loops are a costly resource to be used only sparingly?

But anyway, here's the thing. Right now you're not wrong. Current usage probably does not warrant this optimization.

That said, lodash is the most popular package on npm. The vast majority of professional JavaScript programmers use functional methods to transform data, whether that be Array.prototype methods, lodash, or something else. When I name these popular and highly used utilities you'll note that there is not any particularly popular one that offers a high-level API over async iterables. I've spent portions of the last two years building that library. I've kept it relatively quiet so far because I didn't start out as an expert in this space; I've had to iterate and learn as I go, and I didn't want to expose people to my mistakes and lots of breaking changes. I'm about to release my work, though, and start encouraging people to use it. Right now I have quit my job, and building these tools is the way I am choosing to spend my time. I aim to displace lodash as the top package (eventually), and I think what I've built will do it. iter-tools blows existing libraries out of the water. All current offerings are missing something major, be it a comprehensive library of methods, API documentation, type definitions, test coverage, parity between sync and async operations, compatibility with tree shaking, a well-thought-out transpilation strategy, or some combination of the above.

You are free to wait and see if this comes to pass, but I think it will be an easy sell. I will be making it as easy as it ought to be to use async iterators to work with stream-type data, and if the main complaint new adopters have is that it's slower than it ought to be, I'm going to be sending them here. My hope is that a significant number of the people I convince will work for companies that have TC39 delegates. At that point we'll end up right back here. But if we could agree on where this is going, maybe we could get a head start?

Why would you want to encourage a programming style where for loops are a costly resource to be used only sparingly?

I don't think they're a costly resource. I think they're astonishingly cheap. They're not literally free, but for the vast majority of programmers, in real code rather than in microbenchmarks, their cost is not going to matter. I'm just observing that, empirically, when using loops people tend to have multiple operations within a single loop in preference to having a ton of different loops, and hence whatever minor costs those loops do entail are not likely to compound.

At that point we'll end up right back here.

Like I said, I think if you demonstrate that this is a significant and hard-to-avoid source of slowdown in a reasonably large class of real applications, and it proves to be the case that engines can't optimize these patterns well enough without new syntax in the language, then that will be the appropriate time to try to push this proposal. Given how much engines have already been able to optimize async iterators, I don't think it is currently clear that either of those will prove to be the case.

I'll have to think some more about how to demonstrate the slowdown in a realistic sort of program.

In the meantime, there was some more discussion on the difficulty of optimizing this. What do you think about the bouncing pattern when two loops are executing concurrently? It would seem to break the conditions necessary for an engine-level sync-to-async optimization, and I suspect that it would not be an uncommon circumstance. I guess I should file a ticket for such an optimization with the V8 folks and see what they say.

Sorry for bumping this and then deleting my post. I guess it stays bumped. Weird. I decided to make a new thread in an attempt to refocus the discussion.