Take 2: Generator.prototype[Symbol.mixedIterator]

One of the major criticisms I've encountered so far is that I focus on speed in the abstract, without evidence that there's a sufficiently broad need for the perf characteristics I'm describing -- that perhaps these needs are "niche" because they mostly apply to streaming parsers. I want to rebut that claim very specifically. Code is probably the most common data type most of us here interact with, and also one of the most valuable. Streaming parsers have the potential to significantly change the way we are able to interact with code, because they don't just perform a bit faster or slower than traditional parsers: they perform differently. Here's how:

Streaming parsers are significantly more memory-efficient, because they never need to load the entire input into memory. That makes them run better in memory-constrained environments, and it lets them run on infinite streams of data. Imagine being able to build a chat that effortlessly streams data in the exact format it is stored in: a JSON array of messages.
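
Here's a rough sketch of that chat example -- a toy, not anything from a real library, and simplified to assume the messages are plain JSON strings with no escape sequences. Each message is yielded as soon as its closing quote arrives, so an arbitrarily long (even unterminated) log never has to fit in memory at once:

```js
async function* parseMessageArray(chunks) {
  let inString = false;
  let current = '';
  for await (const chunk of chunks) {
    for (const chr of chunk) {
      if (inString) {
        if (chr === '"') {
          inString = false;
          yield current; // message complete; hand it out immediately
          current = '';
        } else {
          current += chr;
        }
      } else if (chr === '"') {
        inString = true;
      }
      // '[', ',', ']', and whitespace are structural; skip them
    }
  }
}

// Works identically over an in-memory array of chunks or a network stream:
for await (const message of parseMessageArray(['["hel', 'lo", "wor', 'ld"]'])) {
  console.log(message); // logs "hello", then "world", before the input ends
}
```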

Even when they are slower than traditional parsers, streaming parsers may still be more responsive to humans thanks to the advantages of concurrency. Web browsers understood this long ago: streaming parsing of HTML as it arrives over the network is a core browser technology.

Streaming parsers allow unnecessary work to be skipped. For example, the top of a file could be read and parsed for docblocks and imports without the body of the file ever being read from disk (see the sketch below). The building (and garbage collecting) of AST nodes can also be skipped when it is not needed, such as in syntax-aware code searches (!!). Skipping work may have perf impact even beyond the cost of the code not executed, since storage and processor caches can be used more effectively.
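
To make the "skip the body" idea concrete, here's a hedged sketch (it assumes imports sit at the top of the file, one per line, with no preceding comments). Returning out of for-await destroys the underlying file stream, so the body of a large module is never read off disk at all:

```js
import { createReadStream } from 'node:fs';

async function readImports(path) {
  const imports = [];
  let line = '';
  for await (const chunk of createReadStream(path, 'utf8')) {
    for (const chr of chunk) {
      if (chr !== '\n') {
        line += chr;
        continue;
      }
      if (line.startsWith('import ')) {
        imports.push(line);
      } else if (line.trim() !== '') {
        return imports; // first non-import statement: stop reading here
      }
      line = '';
    }
  }
  return imports;
}
```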

Note that while the creation of streaming parsers is already possible, I'm specifically thinking of imperative parsers, which are among the easiest to write because their behavior is easy to understand and debug: control flows purely top-down, which tends to produce the most relevant and succinct call stacks.
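
A toy example of the imperative style I mean (not from any real library; atoms here are just runs of non-paren characters). Each grammar rule is a plain function, so a breakpoint inside parseList shows a call stack that reads like the grammar itself -- parseValue -> parseList -> parseValue -> parseAtom:

```js
function parse(input) {
  let pos = 0;
  const peek = () => input[pos];
  const eat = (chr) => {
    if (input[pos] !== chr) throw new SyntaxError(`expected ${chr} at ${pos}`);
    pos++;
  };

  function parseValue() {
    return peek() === '(' ? parseList() : parseAtom();
  }

  function parseList() {
    eat('(');
    const items = [];
    while (peek() !== ')') {
      if (pos >= input.length) throw new SyntaxError('unexpected end of input');
      items.push(parseValue());
    }
    eat(')');
    return items;
  }

  function parseAtom() {
    let atom = '';
    while (pos < input.length && !'()'.includes(peek())) atom += input[pos++];
    return atom;
  }

  return parseValue();
}

parse('(a(bc)d)'); // => ['a', ['bc'], 'd']
```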

To sum up: I think this shift is coming, and not just in the abstract. I'm working full time to build the first generation of these tools, because I'm pretty sure the market will reward a variety of powerful, performant tools that only become possible once there is a comfortable, powerful, and reliable way of expressing streaming parsers in code.

[Sorry for completely rewriting this in edits. It was hard to keep it organized and focused.]

Sorry to bump a topic, but I'm ready to talk about this again. Is anyone willing to be a champion?

I've gone and done the work I promised to do: I built an ecosystem of tools for streaming text processing. I can now do streaming regex evaluation and extensible streaming parsing of virtually any programming language.

By implementing a polyfill of mixedIterator in iter-tools, @iter-tools/regex, and bablr, I can enable practical usage of the proposed mixed iterator protocol and generate much broader demand for standardization.

As of today I actually need this, so I'm going to go ahead and define Symbol.for('@@mixedIterator') as the de facto standard for people who need a working implementation of this protocol.
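
To show what I have in mind (this is a sketch, not the final polyfill; CharSource and toArray are illustrative names): under the mixed protocol, next() returns a plain IteratorResult while buffered input remains and a Promise of one only when it has to wait for more, so consumers pay for a promise (and a microtask) only at chunk boundaries rather than on every character:

```js
const mixedIterator = Symbol.for('@@mixedIterator');

class CharSource {
  constructor(chunks) {
    // accept any sync or async iterable of strings
    this.chunks = (chunks[Symbol.asyncIterator] ?? chunks[Symbol.iterator]).call(chunks);
    this.buffer = '';
    this.idx = 0;
  }

  [mixedIterator]() {
    return this;
  }

  next() {
    if (this.idx < this.buffer.length) {
      // sync fast path: no promise allocated, no microtask queued
      return { value: this.buffer[this.idx++], done: false };
    }
    // async slow path: fetch the next chunk, then try again
    return Promise.resolve(this.chunks.next()).then((step) => {
      if (step.done) return { value: undefined, done: true };
      this.buffer = step.value;
      this.idx = 0;
      return this.next();
    });
  }
}

// A consumer awaits only when next() actually hands back a promise:
async function toArray(source) {
  const iter = source[mixedIterator]();
  const chars = [];
  let step = iter.next();
  while (true) {
    if (typeof step?.then === 'function') step = await step;
    if (step.done) return chars;
    chars.push(step.value);
    step = iter.next();
  }
}

await toArray(new CharSource(['ab', 'cd'])); // => ['a', 'b', 'c', 'd']
```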

It seems to me that a fundamental assumption was made that iterators would never be fast enough for streaming text processing -- that future processing would be done over async iterators of strings.

I understand why that seemed like a reasonable bet at the time, but textual stream processing is arriving now, and its primitive is the character stream. A character stream is the only protocol that abstracts away the differences between strings and file streams, and thus the only protocol on which it is reasonable to define a streaming parser.
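
Whether or not the mixed protocol lands, the unification claim itself is easy to demonstrate with plain (async) generators -- a hedged sketch, where big.txt is just a placeholder path:

```js
import { createReadStream } from 'node:fs';

// Adapters: both produce the same character-level protocol.
function* charsOf(str) {
  yield* str;
}
async function* charsOfStream(chunks) {
  for await (const chunk of chunks) yield* chunk;
}

// countLines is written once, against characters only; it neither knows
// nor cares whether it is fed an in-memory string or a file on disk.
async function countLines(chars) {
  let lines = 0;
  for await (const chr of chars) if (chr === '\n') lines++;
  return lines;
}

await countLines(charsOf('a\nb\n'));                                   // a string
await countLines(charsOfStream(createReadStream('big.txt', 'utf8')));  // a file
```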

I'm changing the name to streamIterator, as "mixed" isn't sufficiently descriptive -- it doesn't say anything about what you have a mixture of. "stream" is a technical term in Node meaning an async iterator of chunks, and that is exactly the data structure I will emulate.
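
I haven't shipped the rename yet, but presumably the registered symbol just changes accordingly, and feature detection works as it would for any well-known symbol (getStreamIterator is an illustrative helper, not a published API):

```js
const streamIterator = Symbol.for('@@streamIterator');

function getStreamIterator(source) {
  const factory = source[streamIterator];
  if (typeof factory !== 'function') {
    throw new TypeError('source does not implement the streamIterator protocol');
  }
  return factory.call(source);
}
```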