Take 2: generator.prototype[Symbol.mixedIterator]

I'm back around to working on character stream processing and it's still slow. I can now say exactly how much slower it is than it should be, because this time I came armed with perf data and a working demonstration of my optimization which shows a 4x speedup (overall!) in a real parser. Take a look at it all in this repo: GitHub - conartist6/async-perf: Investigations of the speed of JS character stream processing.

Here's the code I wish to have language support for:

import fs from 'fs';

async function* readFile(path) {
  for await (const chunk of fs.createReadStream(path, 'utf8')) {
    yield* chunk;
  }
}

async function* codePoints(source) {
  for await? (const chr of source) {
    yield chr.codePointAt(0);
  }
}

// a mixed iterator can yield sync or async results
const iter = codePoints(readFile('./test.csv'))[Symbol.mixedIterator]();
iter.next(); // Promise<{ value: 103, done: false }>
iter.next(); // { value: 117, done: false }

// and for backwards compatibility:
const iter = codePoints(readFile('./test.csv'))[Symbol.asyncIterator]();
iter.next(); // Promise<{ value: 103, done: false }>
iter.next(); // Promise<{ value: 117, done: false }>

I'm still looking for a champion!

Note: the old thread was syncAndAsyncIterator but it had become quite loaded with speculation. I made a new thread in hopes of having a conversation around the more concrete evidence, and because I don't think I did a very good job of explaining what I was talking about there.

Mixing sync and async results is widely seen as a bad thing:

and Promise, and async functions/generators, were very intentionally and thoroughly designed to ensure the twain never meet.

If I’m understanding what you want correctly, you want to sacrifice determinism and the ability to reason about async code, in return for increased performance.

Maybe it is bad, but I don't believe this makes it any worse. In fact, the now-famous Zalgo post basically says that you should do exactly this:

Ideally, you should know whether something is going to be immediately available, or not. Realistically, you should be able to take a pretty decent guess about whether the result is going to be immediately available most of the time, or not, and then follow this handy guide:

  • If the result is usually available right now, and performance matters a lot:
  1. Check if the result is available.
  2. If it is, return it.
  3. If it is not, return an error code, set a flag, etc.
  4. Provide some Zalgo-safe mechanism for handling the error case and awaiting future availability.
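As a rough sketch, that guideline translates to something like this in code (all the names here are illustrative, not an existing API):

// Sketch only: a source whose read() either returns the value right now or a
// flag, plus a separate, always-asynchronous way to wait for availability.
const NOT_READY = Symbol('not ready');

class BufferedSource {
  #buffer = [];
  #waiters = [];

  // Steps 1-3: return the value if it is available right now, otherwise a flag.
  read() {
    return this.#buffer.length ? this.#buffer.shift() : NOT_READY;
  }

  // Step 4: a Zalgo-safe (always asynchronous) way to await availability.
  whenReadable() {
    return new Promise((resolve) => this.#waiters.push(resolve));
  }

  push(value) {
    this.#buffer.push(value);
    for (const resolve of this.#waiters.splice(0)) resolve();
  }
}

// The consumer gets a synchronous fast path and an asynchronous slow path.
async function next(source) {
  let value = source.read();
  while (value === NOT_READY) {
    await source.whenReadable();
    value = source.read();
  }
  return value;
}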

What I interpret the article to say is just that you shouldn't mix effectful callbacks into generator pipelines, e.g. you shouldn't write asyncMap(source, value => sideEffect(value)). Most programmers already know not to do that.

Also the ability to create a hazard like that is nothing novel. As the article points out, all you need is promises and callbacks.

One option for limiting the potential for ambiguous timing might be to limit the use of the loop to the inside of async generator functions. That is its main purpose, and inside a generator it should be pretty clear what is and isn't a side effect. In normal imperative code a for loop is only used for its side effects.

I think it's worth it even if it has to be a bit of a compromise like that. It's effectively the difference between character streaming being a usable strategy in the language and not, and I think that's a pretty powerful ability for JavaScript to have.

I'm already doing a lot of work on this, and I intend to do a lot more. I've built a regex engine that ingests streaming text; I've built common string manipulation patterns into iter-tools; and I'm working on a build system which uses file streaming so that headers in script files can serve as the source of truth for build data, potentially allowing any build to be fully static, incremental, and verifiable across systems. The potential goes on and on -- better code searching, streaming module dependency analysis, streaming JSON, and a whole family of possible parsers.

Here's a streaming parser toolkit which I'm currently laying the foundations for: A human-friendly json parser with parserate · GitHub

It allows you to write super simple, readable parsing code. It gives you the full power of regex, allows arbitrary forward lookahead, and lets you define syntax without worrying about red and blue (sync and async).

I'm not sure I understand. If you are concerned about the performance cost of going async for every step, can't you use a sync iterator which sometimes yields a value and sometimes yields a promise for the value, then recognize whether you have a promise or not on the receiving side and conditionally await? The main issue is termination of the loop: you may need to use a sentinel delivered through a promise if you can't synchronously determine the end.
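Roughly something like this (just a sketch; the names are made up):

// Sketch: a *sync* generator that yields a plain value when it is already
// available and a promise otherwise, plus a sentinel delivered through a
// promise for asynchronous termination.
const END = Symbol('end');

function* mixedValues(items) {
  // `items` stands in for a buffered source: an array mixing plain values
  // and promises of values.
  for (const item of items) {
    yield item; // either a value or a Promise of a value
  }
  yield Promise.resolve(END); // the end is only observable asynchronously here
}

// The receiving side checks what it got and only awaits when it must.
async function sum(source) {
  let total = 0;
  for (let value of source) {
    if (value instanceof Promise) value = await value;
    if (value === END) break;
    total += value;
  }
  return total;
}

// sum(mixedValues([1, 2, Promise.resolve(3)])); // resolves to 6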

Anyway, it sure isn't pretty, but perf-sensitive code paths rarely are.


@mhofman I see your point, but the practical problems with that approach rule it out. In particular it cannot be composed.

async function* fastMap(chars, fn) {
  for (let char of chars) {
    if (char instanceof Promise) char = await char;

    yield fn(char);
  }
}

Notice the problem: because we had to use an async generator in order to do any awaiting, every step this generator produces is asynchronous. If we write fastFilter(fastMap(...)) then fastFilter will never hit its optimized case. This is precisely the problem that this proposal aims to solve: you need the async generator to be able to yield synchronously or nothing else you do will matter downstream.

Now, you raise a perfectly valid point though! There's actually no concrete need for the for await? syntax (though it helps immensely). It's only the mixedIterator protocol and its implementation for generators that is absolutely essential to enable performant processing.
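Roughly speaking, a userland stand-in for the protocol can be as small as this (a sketch to convey the shape, with a local symbol standing in for Symbol.mixedIterator):

// Sketch: a mixed iterator's next() returns either an IteratorResult or a
// promise of one, so a transform can stay synchronous whenever its source does.
const mixedIterator = Symbol('mixedIterator'); // stand-in for the proposed well-known symbol

function mixedMap(source, fn) {
  const iter = source[mixedIterator]();
  const mapStep = ({ value, done }) =>
    done ? { value: undefined, done: true } : { value: fn(value), done: false };
  return {
    [mixedIterator]() {
      return this;
    },
    next() {
      const step = iter.next();
      return step instanceof Promise ? step.then(mapStep) : mapStep(step);
    },
  };
}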

Sure it can, at least for fastMap:
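Something along these lines (a sketch only):

// Sketch: a *sync* generator again, so downstream consumers keep their fast
// path. Values that are already available are mapped synchronously, and
// promises are passed along as promises of the mapped value.
function* fastMap(values, fn) {
  for (const value of values) {
    yield value instanceof Promise ? value.then(fn) : fn(value);
  }
}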

Edit: caveat: it assumes the final consumer awaits promises, and that no transform will need to skip a value or generate more than 1 value based on one input. I'm not sure there is an easy way to do that. I agree it's not great, but again, this is a very narrow use case. Most code doesn't need to bother with such optimizations.

@mhofman I'm sorry for being cheeky with you, but I came to the standards body, not Stack Overflow. Your proposed code leaks errors in such a way as to cause program termination, and it causes the sync/async concern to proliferate throughout the logic.

Again the real point is not a "very narrow use case" of 1:1 map -- it's stream comparison and splitting and joining and streaming regex and iterators for graphemes and grapheme clusters. I'm here looking for someone who understands that. I'm also here with an implementation of my proposed solution, so I don't really need workarounds.

Sorry, I haven't had time to look into your implemented approach.

I think my point, and the opinion shared by multiple delegates, is that most code should just not care and use async/await. Yes, asynchrony is viral, but it helps the author reason about when there may be interleaving points. In our system, we actually avoid conditional awaiting wherever possible for that reason, and leverage promise chaining or sub-functions to move the await points to the top scope of functions.
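For example (illustrative names only), instead of a conditional await buried in shared logic, the synchronous logic goes in a sub-function and the async wrapper awaits exactly once at its top scope:

// Illustration only: all the real work is synchronous and interleaving-free.
function parseRecord(line) {
  return line.split(',');
}

// The only await point is unconditional and sits at the top of the function,
// so it is obvious where interleaving can happen.
async function parseRecordAsync(linePromise) {
  return parseRecord(await linePromise);
}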

I don't have the links right now, but I remember sync/async helpers being discussed in another thread. I believe Babel internally has some logic like that.

IMO, the use cases are usually so specific that they don't warrant the addition of new syntax, and as you and others have shown, user-land libraries can be put together to help with those use cases that care about this kind of performant handling.

Yeah, but notice that Node read streams are not streams of characters but rather streams of chunks. At some point you come face to face with the gap between what the language thinks should be fine and the 10x perf cost you end up passing on to your users.

I don't see how that's a niche concern. It affects all code which wants to consume a stream of data. All of it has to consider chunk boundaries and possible empty chunks. Transformational code over chunk streams would have to emit a chunk stream as well in order to preserve the integrity of the optimization and the composability of operations. I don't know of any library that does this at all. Do you? I've looked, and my impression is that there's a reason it hasn't been written.
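To be concrete, even something as simple as a character-level map has to stay chunk-shaped to keep the optimization alive (a sketch, assuming string chunks):

// Sketch: maps characters but still emits whole chunks, so downstream
// chunk-oriented code keeps doing its work synchronously within each chunk.
async function* mapChunks(chunks, fn) {
  for await (const chunk of chunks) {
    yield [...chunk].map(fn).join('');
  }
}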

If we can agree on the benefits, then we can weigh and mitigate the costs. I'd rather do one at a time though.

I'm not sure I follow the jump from an iteration of elements to a stream of bytes. Most byte streams will never, and should never, be iterated byte by byte in the first place, at least not through a JS iterator.

As for libraries that handle iterating streams of bytes as byte-array chunks, well, I believe both WHATWG and Node streams do exactly that, both leveraging async iterators.

Sure, transforms on such streams need to handle data split across chunk boundaries. I actually wish more helpers existed to virtualize a set of Buffer/Uint8Array chunks as a contiguous one, avoiding copies where possible. There is definitely an argument to be made that the libraries haven't really matured for efficient processing of chunked data.

However, as I said, I didn't follow the connection to avoiding asynchronous iteration just because some elements may be available synchronously.
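To give a sense of the kind of helper I mean (a sketch only, nothing like this exists as such):

// Sketch: a read-only view over several Uint8Array chunks that behaves like
// one contiguous buffer for indexed reads, without copying the data.
class ChunkedBytes {
  constructor(chunks) {
    this.chunks = chunks;
    this.offsets = []; // absolute start offset of each chunk
    let total = 0;
    for (const chunk of chunks) {
      this.offsets.push(total);
      total += chunk.length;
    }
    this.length = total;
  }

  byteAt(index) {
    // Linear scan for brevity; a real helper would use binary search or a cursor.
    let i = this.chunks.length - 1;
    while (i > 0 && this.offsets[i] > index) i--;
    return this.chunks[i][index - this.offsets[i]];
  }
}

// new ChunkedBytes([Uint8Array.of(1, 2), Uint8Array.of(3)]).byteAt(2); // 3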

Sure, transforms on such streams need to handle data split across chunk boundaries. I actually wish more helpers existed to virtualize a set of Buffer/Uint8Array chunks as a contiguous one, avoiding copies where possible.

That is why I am here. I'm the person building that! The way to do it is to define operations over iterators so that your code does not know anything about the way characters are organized in memory, and is also effectively ignorant of whether any given value may or may not need to be awaited. I had to invent this exact optimization to actually be able to do it.

Let's try something pretty simple: building a function to determine if two chunk streams contain the same data. In the world I propose the implementation is pretty much textbook:

async function streamsEqual(a, b) {
  for await? (const [va, vb] of zip([a, b])) {
    if (va !== vb) return false;
  }
  return true;
}

In the world we have, well, let's see what the best I can do is...

async function streamsEqual(a, b) {
  const aPeekr = await peekerate(a);
  const bPeekr = await peekerate(b);
  let aPos = 0;
  let bPos = 0;

  while (!aPeekr.done && !bPeekr.done) {
    if (aPos === aPeekr.value.length) {
      await aPeekr.advance();
      aPos = 0;
    } else if (bPos === bPeekr.value.length) {
      await bPeekr.advance();
      bPos = 0;
    } else {
      if (aPeekr.value[aPos] !== bPeekr.value[bPos]) return false;
      aPos++;
      bPos++;
    }
  }
  return true;
}

Ok, it's not awful as code goes, but that took me twenty minutes of careful thinking to write, whereas the first example was done almost as quickly as I could type it out.

Isn’t “zip” the thing that made the first example easier, though?

Sure! I broke the work down into smaller reusable parts because the way the code was written allowed me to.

Look carefully at the second example: it isn't possible to use zip there because you know nothing of the kinds of chunks the streams will contain. One might be chunks of length one, while the other might be chunks of length 65535. You can't do something as naive as a zip because you actually need to ensure that the streams can advance completely independently of each other.

How does the first example avoid that problem?

Let's see, maybe I can write a zip function over chunk streams though. It's a good challenge and will illustrate more of the problems.

async function* zip(a, b) {
  const aPeekr = await peekerate(a);
  const bPeekr = await peekerate(b);

  while (!aPeekr.done && !bPeekr.done) {
    if (!aPeekr.value.length) {
      await aPeekr.advance();
    } else if (!bPeekr.value.length) {
      await bPeekr.advance();
    } else {
      const chunkLen = Math.min(aPeekr.value.length, bPeekr.value.length);

      yield [aPeekr.value.slice(0, chunkLen), bPeekr.value.slice(0, chunkLen)];

      aPeekr.value = aPeekr.value.slice(chunkLen);
      bPeekr.value = bPeekr.value.slice(chunkLen);
    }
  }
}

My point is not that it's not possible to write this kind of stuff, my point is that the concern over how the data is stored proliferates -- it becomes part of every single aspect of the code.

I'd write all these functions if I thought it was the best thing to do, but I don't see why I'd want to. These functions require input in a particular shape to work, whereas all the functions implemented with my optimization will work equally well on everything. For example, in my world I'll support asyncEqual('someContent', createReadStream('./file')), while in the chunks world it has to become a concern of the caller: asyncChunksEqual(['someContent'], createReadStream('./file')).

Both examples avoid the problem in the exact same way: by doing as much synchronous work as is possible, and then waiting when they must in order to continue. It's just how they look from the outside and the difficulty of writing them that differs.
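To make that concrete, here's roughly how a mixed zip can do as much synchronous work as possible (sketched in userland terms over mixed iterators; none of this is the proposal's actual code):

// Sketch: if every source stepped synchronously, the zipped step is
// synchronous too; only the steps that came back as promises are awaited.
function mixedZip(iters) {
  const combine = (steps) =>
    steps.some((step) => step.done)
      ? { value: undefined, done: true }
      : { value: steps.map((step) => step.value), done: false };

  return {
    next() {
      const steps = iters.map((iter) => iter.next());
      return steps.some((step) => step instanceof Promise)
        ? Promise.all(steps).then(combine)
        : combine(steps);
    },
  };
}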