for await? .. of

Trying one more time, because this is pretty important and I now have a lot more practical experience with this idea (and so more evidence) than I have had previously.

The proposal I wish to champion (or find a champion for) is to create a for..of loop over iterator steps, but one which only awaits a step when it is necessary, i.e. when the step itself is a promise.

for await?(let item of iterator);

The existence of this iteration syntax directly implies the existence of a new core iteration protocol, though protocols are essentially duck-typed in JS. The new protocol would be denoted Symbol.streamIterator. To implement the protocol the iterator’s next() method would return either a { done, value } object or a Promise which would then be expected to resolve to { done, value }.
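
For concreteness, here is a rough sketch of a source implementing such a protocol. Symbol.streamIterator does not exist today, so the first line installs a stand-in purely so the sketches in this thread have something to hang off of; streamOf() is a hypothetical helper, not part of the proposal:

// Stand-in for the proposed well-known symbol; illustration only.
Symbol.streamIterator ??= Symbol('streamIterator');

// A hypothetical source over chunks that may or may not be ready yet.
// Ready chunks come back as plain steps; pending ones as promises of steps.
function streamOf(...chunks) {
  let i = 0;
  return {
    [Symbol.streamIterator]() {
      return {
        next() {
          if (i >= chunks.length) return { value: undefined, done: true };
          const chunk = chunks[i++];
          return chunk instanceof Promise
            ? chunk.then((value) => ({ value, done: false }))
            : { value: chunk, done: false };
        },
      };
    },
  };
}

// 'a' and 'b' step synchronously; the third step is a promise to be awaited.
const source = streamOf('a', 'b', Promise.resolve('c'));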

This solves several seemingly-unrelated problems at the same time:

  • It provides a primitive suitable for concurrent processing as it is a complete embedding space for data – neutral towards all possible values passed as data. Async iterators as designed are neutral to almost all data values, the exception being promises.
  • It obviates the need for WebStreams by providing a primitive fast enough to abstract over a single data source in which the individual items become available in chunks.
  • It permits stream transformers to be defined once while being usable for both sync and async data. You might define a transformer like toUpperCase and now you could do both toUpperCase("str") and toUpperCase(readFile('./file.txt')). Things that are sync stay sync (keeping fast things fast), and awaits happen when they are needed. (A usage sketch follows the polyfill code below.)

It's not terribly hard to polyfill this behavior in current-day JS. In a what-color-is-your-function sense you might call a generator function of this kind a "purple generator" as it can (potentially) be used natively in both red and blue calling functions.

You can polyfill purple generators into Javascript fairly easily. Most of the difficulty is in doing iteration without the help of a for loop.

let toUpperCase = (source) => new Wrapper(toUpperCase__(source));

function *toUpperCase__(source) {
  let iter = source[Symbol.streamIterator]();
  let step;
  try {
    while (true) {
      step = iter.next();
      if (step instanceof Promise) {
        // the purple magic happens here!
        // wrapper ensures the yielded promise doesn't end up in the data stream
        step = yield step;
      }
      if (step.done) break;
      let chr = step.value;

      yield chr.toUpperCase();
    }
  } finally {
    iter.return?.();
  }
}



class Wrapper {
  constructor(generator) {
    this.generator = generator;
  }

  next(value) {
    let step = this.generator.next(value);

    if (step.done) {
      return { value: undefined, done: true };
    } else if (step.value instanceof Promise) {
      // resolve the promise for the wrapped generator
      // rolls the promise into `step` for the consumer
      return step.value.then((value) => {
        return this.next(value);
      });
    } else {
      let { value } = step;
      return { value, done: false };
    }
  }
}
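
To make the "defined once, usable for both" point above concrete, here is a hedged usage sketch of this polyfill, reusing the hypothetical streamOf() source from the earlier sketch:

// Same transformer either way; only the timing of the steps differs.
let upperSync = toUpperCase(streamOf('s', 't', 'r'));
let upperAsync = toUpperCase(streamOf(Promise.resolve('s')));

upperSync.next();   // { value: 'S', done: false }, a plain step, still synchronous
upperAsync.next();  // a Promise resolving to { value: 'S', done: false }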

This proposal is to shrink all that boilerplate down to this:

async? function *toUpperCase(source) {
  for await?(let chr of source) {
    yield chr.toUpperCase();
  }
}

If you were to try to visualize how data is moving around, it would be useful to think of a purple generator function a bit like a bucket brigade moving water. As long as there are buckets ready to go from the water source regularly, each worker in the line will be in a steady rhythm: take from the left, pass to the right, take from the left, pass to the right. Both sync-ness and async-ness propagate through this system. When buckets are ready to go, the system is in a synchronous state with high throughput. When buckets are no longer ready, perhaps because the tub of water at the source of the chain has run dry, each worker will look to their left, see a "promise of a future bucket", and understand that they should wait until synchronous operation can resume. If buckets are only available very slowly, each worker may return to the idling state after every bucket, mimicking how async iteration works currently, yet there is no reason to force each worker into an idling state as a defensive measure, the way async iteration does.

I'm just putting together the final details to be able to release a fully-featured stream parser framework implemented as purple generator functions, so there should soon be a lively and rapidly-growing ecosystem of people relying on this age-old technique for moving large numbers of small items quickly.

An API should either never, or always, vend a Promise, lest it release z̲̗̼͙̥͚͛͑̏a̦̟̳͋̄̅ͬ̌͒͟ļ̟̉͌ͪ͌̃̚g͔͇̯̜ͬ̒́o̢̹ͧͥͪͬ - this seems like it would encourage a horrifically bad practice.


I'm going to make an attempt to invite Isaac Schlueter to this conversation, since you and I have been back and forth over this ground many times.

I feel like I'm performing my argument at this point but it comes in several parts:

  1. That's not what Isaac said. If you read carefully, what Isaac said was not "don't ever do this" but "this is exactly what you should do."
  2. Over the past 3 years everything I've built has been in or on top of purple generator functions. I have now a large and ever-growing amount of practical experience with them, so I can say with a high degree of confidence that Zalgo has not been released in practice.
  3. Zalgo is released when it becomes literally impossible to determine which order callback functions will be called in. This happens when you do the thing Isaac is telling you to never ever do: write an API which sometimes calls its callback synchronously and sometimes asynchronously. I don't do this. I have never done this and will never do it. You have never even suggested that I have done this, other than by saying repeatedly (and without evidence) that I am releasing Zalgo.

Here is the relevant text of the Designing APIs for Asynchrony post where he tells you to do this (parenthetical emphasis mine):

I know what you’re thinking: “But you just told me to use nextTick!”

And yes, it’s true, you should use synthetic deferrals when the only alternative is releasing Zalgo. However, these synthetic deferrals should be treated as a code smell. They are a sign that your API might not be optimally designed.

Ideally, you should know whether something is going to be immediately available, or not. Realistically, you should be able to take a pretty decent guess about whether the result is going to be immediately available most of the time, or not, and then follow this handy guide:

  • If the result is usually available right now, and performance matters a lot:
    1. Check if the result is available.
    2. If it is, return it. (yep, the proposal is to do this)
    3. If it is not, return an error code, set a flag, etc. (returning a promise is like setting a flag)
    4. Provide some Zalgo-safe mechanism for handling the error case and awaiting future availability. (awaiting future availability is what promises do)
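
To make that mapping concrete, here is a minimal sketch of the guide expressed with promises (the cache, maybeSyncRead, and loadIntoCache names are hypothetical, not anything from the post):

// 1-2. If the result is available right now, return it directly: no deferral.
// 3.   Otherwise return a Promise, which plays the role of the "flag".
// 4.   The Promise is itself the Zalgo-safe mechanism for awaiting availability.
function maybeSyncRead(cache, key) {
  if (cache.has(key)) return cache.get(key);
  return loadIntoCache(cache, key); // hypothetical; returns a Promise of the value
}

// Caller side: pay for an await only when a deferral actually happened.
let value = maybeSyncRead(cache, 'config');
if (value instanceof Promise) value = await value;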

Just to add a little more color, here's an example of how I can use this API to radically simplify fast filesystem access in node:

import { readFile, readDir, decodeUTF8 } from '@bablr/fs';

let bytes = readFile(import.meta.url);
let chars = decodeUTF8(bytes);

Some good things about this:

  • readFile no longer needs to know about character encoding since it isn't the thing allocating the storage for the string anymore!
  • decodeUTF8 is written once thanks to the unified iteration protocol. Otherwise you'd need decodeUTF8Sync and decodeUTF8Async. Because stream iterators propagate sync-ness, you can use the streaming version of decodeUTF8 to synchronously decode from a buffer into a string if you need to.
  • We use web streams internally to get the fastest possible "Bring Your Own Buffer" access to the data, but we don't need to expose complex concepts like BYOB and Controllers into our public API because iterators are "BYOB" by nature. In fact I'd go so far as to say that the point of iteration as an abstraction is to permit data to be defined without saying how it was or will be stored.

is summoned

The Zalgo post does need to be updated; it was obviously written back when Promises were not yet a well-defined language feature.

I haven't read the proposal here in detail. Please excuse anything that might have already been covered, as I don't have an informed opinion about the language feature being discussed, but I would like to clarify what is being attributed to me and that post.

The fundamental issue that "Zalgo" refers to (and that Havoc's blog post on callbacks discussed in less silly terms) is this:

let called = false
const cb = () => { called = true }
const doSomething = (cb) => {
  // does some things, eventually calls cb()
}
doSomething(cb)
// EXACTLY ONE of these must be true, always:
// assert(called)
// assert(!called)

Zalgo is so named because of the maddening experience of debugging timing issues that occur when it is impossible to know the order of operations from reading code or inspecting return values.

This has been extended (I'd even say, stretched) to imply that I'm saying that all calls to all functions must deterministically and consistently return either a promise or a value and not both. And, certainly, there are problems that can result from such ambiguity if it is not managed. However, this is not Zalgo-releasing, because it is clear what order the methods will have been called in:

let called = false
const cb = () => { called = true }
const doSomething = (cb) => {
  if (Math.random() < 0.5) {
    // gotta defer for some reason
    return Promise.resolve().then(() => cb())
  } else {
    cb()
  }
}
await doSomething(cb)
// this assertion always passes, no maddening timing challenges, no Zalgo
assert(called)

That is, the language now contains a straightforward and idiomatic way to consistently and deterministically indicate that an action could not be performed synchronously and had to be deferred, and to easily wait for that action to be completed before moving on in that case.

As mentioned in the post, this sort of "return an indication that it could not be performed synchronously, and a means to know when to continue" is a more advanced/complicated API for the user, with more checks and things to get wrong, and so should not be used unless (a) asynchrony is required less often than not, and (b) the performance benefits of avoiding asynchrony are great enough to justify the complication.

The weighing of these trade-offs is as much art as science, of course, and I'd argue that the existence of async/await makes the cost much lower.

Just in briefly scanning the discussion here, I don't see any mention of this, but for completeness's sake: in the Zalgo post I also failed to mention the concept of a "Zalgo-preserving API", which has also become a term of art. This is another case where it's perfectly acceptable for a utility or iterator function to be maybe-sync, based on the function passed in. So, despite being technically nondeterministic and inconsistent, it's consistent with what the user passes in.

const preserveZalgoForEach = (list, cb, i = 0) => {
  for (; i < list.length; i++) {
    if (!(i in list)) continue // skip holes
    const res = cb(list[i], i)
    // return the recursive call so awaiting the result waits for the whole traversal
    if (res instanceof Promise) return res.then(() =>
      preserveZalgoForEach(list, cb, i + 1)
    )
  }
}
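
A brief usage sketch of that helper (save() is a hypothetical async handler): a synchronous callback keeps the whole traversal synchronous, while an async callback makes the call return a promise that resolves once every element has been visited.

const seen = []
preserveZalgoForEach(['a', 'b'], (x) => seen.push(x))
// seen is already ['a', 'b'] here; no deferral happened

await preserveZalgoForEach(['a', 'b'], async (x) => save(x))
// resolves only after save() has completed for every element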

(While there are of course complicated ways to accomplish the same thing only using functions instead of Promises, the existence of a first-class language feature to indicate "thing will be done later" makes it much more straightforward.)

I bring this up only because there have been cases (for example, throughout the Web Streams API) where I worry that an overly cautious approach to Zalgo issues has resulted in interfaces that are unnecessarily slowed down due to contracts enforcing Promise deferral. Rather than reducing timing complications, these can even sometimes work to make those issues more challenging, as the Node team has been finding for years, because synchrony is so fundamental and so easily exposed in the API surface.

So if the suggestion here is "Make a thing that can usually be synchronous, instead be always asynchronous", then that might be justifiable based on the specific trade-offs, but please at least don't claim that I told you that you need to do that. There are very often ways to achieve a better result, and it's worth making the case on the merits of each option available for this specific situation.


To fill you in on a little bit of the context here, for await(let value of iterator); as it exists in JS now does defer every step of iterator defensively, which is to say that each step is wrapped in a promise even if iterator does not return a promise-valued step.
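
For anyone following along, a minimal illustration of that defensive deferral with today's semantics:

// Today's `for await` wraps every step in a promise, even over a plain array
// that is fully available: the synchronous push below runs before the loop
// body ever sees an element.
const order = [];

const finished = (async () => {
  for await (const x of [1, 2, 3]) order.push(x);
})();

order.push('sync');

finished.then(() => console.log(order)); // ['sync', 1, 2, 3]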

Because of this defensive awaiting, it then becomes infeasible for perf reasons for character streams to be encapsulated as iterators of characters. A lot of the time you'd have 65536 characters (a 64k chunk of ASCII) synchronously ready, and you'd be doing 65536 defensive deferrals: one for each character. It's actually worse than that though, because if you're composing iterators (and thus for loops) to build up a transformed stream, then each layer of iterators adds another layer of defensive deferrals. Adding a utf8 decoding transformer to the raw byte stream might already double that number to 131072 defensive deferrals using traditional for await loops, and that's before we do anything with the text! A program that processes text streams using traditional for loops rapidly becomes mostly overhead. The engine will soon be spending most of its time tearing down stack traces so that it can stitch them back together, as async/await requires it to.

That is my understanding of why we have the current status quo in API design: to represent a stream as an async iterator which yields buffers or strings or uint8arrays or (something). I'm not a fan of this because it pierces through the abstraction over storage and forces you to tackle complicated concerns like "does the producer or consumer own those buffers, and so do we need to copy the buffers defensively so that the consumer doesn't accidentally modify the producer's data?" The complexity of handling the chunks works its way into everything you try to do, making it concrete where it should be abstract, and all because the language forces defensive awaits on us which make it too slow to abstract a conceptual single data stream as a single (efficient) iterator.

My proposal is to introduce new primitives into the language like for await?(let chr of streamIter) that would not await defensively, and thus would in effect ratify the idea of character streams.
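
One plausible reading of what such a loop might do under the hood, sketched as a helper against the stand-in symbol from earlier. (Only a true async? function could also stay synchronous end to end; an ordinary async helper like this one can only demonstrate the "await only when needed" stepping.)

// Rough sketch of the stepping behavior of `for await?(let chr of streamIter)`:
// take a step, and await it only when the producer handed back a promise
// instead of a plain { value, done } object.
async function eachChar(streamIter, visit) {
  const iter = streamIter[Symbol.streamIterator]();
  try {
    while (true) {
      let step = iter.next();
      if (step instanceof Promise) step = await step; // only when necessary
      if (step.done) break;
      visit(step.value);
    }
  } finally {
    iter.return?.();
  }
}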

If I'm reading the code right, that preserveZalgoForEach really looks a lot like an alternate factoring of the iteration pattern I want to make canon. It preserves sync-ness and propagates async-ness. Everything still has a well-defined order.

I think what I'm talking about is more or less just taking those same qualities but expressing them in a more functional iterator-based API rather than a callback-based API. You have preserveZalgoForEach return a promise to its caller to handle, but an iterator won't be able to use that trick so instead I just weave the extra promise into the iteration protocol with this magic helper.

class StreamWrapper {
  constructor(generator) {
    this.generator = generator;
  }

  next(value) {
    let step = this.generator.next(value);

    return step.value instanceof Promise
      ? step.value.then((value) => this.next(value))
      : step;
  }

  [Symbol.streamIterator]() { return this }
}

Yeah, that would make it a lot easier to build high performance streams, for sure. I've had to implement similar synchrony-preserving patterns for node-tar and the minipass family of libraries in order to get adequate performance. Those defensive deferrals impose a huge amount of extra overhead.