Why map.values() not map.values

typeof obj.next === 'function' is a pretty good check though, and adding && typeof obj[Symbol.iterator] === 'function' makes it even more robust for all builtin iterators.

I do agree that there's no easy way to determine if a function (that produces an iterator) is going to produce a reusable one or not. For builtin iterators, obj === obj[Symbol.iterator]() would work, but this is of course not robust.
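Putting those two checks together as a sketch (the function names here are mine, and these are heuristics, not guarantees):

```javascript
// Heuristic checks only; names are illustrative, not from any spec.
const isProbablyIterator = (obj) =>
  obj != null &&
  typeof obj.next === 'function' &&
  typeof obj[Symbol.iterator] === 'function';

// Builtin iterators return themselves from [Symbol.iterator]().
// Note this calls the method, which in general could have side effects.
const isBuiltinStyleIterator = (obj) =>
  isProbablyIterator(obj) && obj === obj[Symbol.iterator]();

const map = new Map([[1, 'a']]);
isProbablyIterator(map);              // false: iterable, but has no next()
isProbablyIterator(map.values());     // true
isBuiltinStyleIterator(map.values()); // true
```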

However, this seems like the same problem as not knowing what kind of arguments a function takes - sometimes you just have to read the documentation.


Didn't know that library. So you basically want to maintain the multi-pass ability whenever possible, and are asking why map.values() doesn't, when it easily could have. I don't know the answer.

You cite Python's itertools as inspiration, but Python's itertools (and generator functions in general, including builtins like map, filter) all return single-pass iterators.

>>> from itertools import cycle, takewhile
>>> it = cycle(range(1, 21))
>>> list(takewhile(lambda x: x%5, it))
[1, 2, 3, 4]
>>> list(takewhile(lambda x: x%5, it))
[6, 7, 8, 9]
>>> list(takewhile(lambda x: x%5, it))
[11, 12, 13, 14]
>>> list(takewhile(lambda x: x%5, it))
[16, 17, 18, 19]
>>> list(takewhile(lambda x: x%5, it))
[1, 2, 3, 4]

Contrast with:

> const { cycle, range, takeWhile, toArray } = require('iter-tools');
undefined
> it = cycle(range(1, 21))
IterableIterator {
  [Symbol(_)]: {
    fn: [Function: __cycle],
    args: [ [IterableIterator] ],
    staticIterator: null
  }
}
> toArray(takeWhile(x=>x%5, it));
[ 1, 2, 3, 4 ]
> toArray(takeWhile(x=>x%5, it));
[ 1, 2, 3, 4 ]
> toArray(takeWhile(x=>x%5, it));
[ 1, 2, 3, 4 ]
> toArray(takeWhile(x=>x%5, it));
[ 1, 2, 3, 4 ]
> it = (function*() { for (let i = 0; i = i % 20 + 1; ) yield i; })();
Object [Generator] {}
> toArray(takeWhile(x=>x%5, it));
[ 1, 2, 3, 4 ]
> toArray(takeWhile(x=>x%5, it));
[]
> toArray(takeWhile(x=>x%5, it));
[]
> toArray(takeWhile(x=>x%5, it));
[]

If this design works for you, great. I don't find it compelling. Simply changing the source from single-pass to multi-pass iterable (with the exact same elements) gives completely different results (and the single-pass result really surprised me, as it somehow managed to exhaust an infinite generator).

I agree; that's why I don't mind the current system so much, where you sort of just have to know. I do have one specific problem with the current situation: if you want to use the values of a map as a stateless iterator (which I think is not unreasonable) you have to get hacky and do something like { [Symbol.iterator]: () => map.values() }. That seems to me like a missing-stair thing: you just have to know the right spots in the code to spit out that magic incantation (or a library version of it), and if you don't, you get hit with a heavy penalty, because it's going to take you a long time to trace back from where the program first displayed incorrect behavior to the root cause of the problem far away in the code. In general I've been very happy to see things evolving in a direction that has fewer gotchas, especially around common tasks like storing key-value maps.
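To make the failure mode concrete, here are the gotcha and the incantation side by side (my own sketch; values is just my name for the wrapper):

```javascript
const map = new Map([[1, 'a']]);

// The gotcha: an iterator is silently exhausted after one pass.
const once = map.values();
[...once]; // ['a']
[...once]; // [] — empty, with no error pointing at the cause

// The incantation: an iterable that makes a fresh iterator per pass.
const values = { [Symbol.iterator]: () => map.values() };
[...values]; // ['a']
[...values]; // ['a']
```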

I definitely agree that that result is surprising! I wish it were an error. But I didn't try to make up my own semantics, I tried to understand and follow the semantics laid down by TC39 to the best of my ability.

In particular, I think the reason you exhaust an infinite generator is that iterators are destructible resources, which is to say that they may have a return method. Calling return() is expected to make all future calls to next() return { done: true }, and gives the iterator a chance to dispose of any resources it held, like file handles. return() is called by constructs like for..of loops when they terminate iteration early, such as via a break or return. In your case terminating the generator (just like hitting a return;) is not what you intend, but I still think the problem isn't that behavior but the lack of an error.
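A minimal illustration of that mechanism (my own example, not iter-tools code): breaking out of a for..of calls return() on the iterator, which closes a generator for good.

```javascript
function* naturals() {
  let i = 0;
  while (true) yield ++i; // an infinite generator
}

const it = naturals();
for (const n of it) {
  if (n >= 4) break; // for..of calls it.return() here
}

it.next(); // { value: undefined, done: true } — the generator is closed
```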


Maybe this would be a use case for arrow generators?

*() => { yield* map.values() }

Arrow generators look nice, but they don't solve anything. Generators, you may recall, can only be evaluated once.
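For example (a plain generator function standing in for the proposed arrow form):

```javascript
function* mapValues(map) {
  yield* map.values();
}

const gen = mapValues(new Map([[1, 'a'], [2, 'b']]));
[...gen]; // ['a', 'b']
[...gen]; // [] — the generator object is exhausted, not restarted
```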

Could you accept callables and then internally wrap them in an iterable?

() => map.values()

It's not as nice as passing in values() directly, but it's a lot nicer than having to create the iterable manually.

I should say that I do already have a solution to the problem. I developed it before I even knew that it would be needed for this purpose:

import { wrapValues } from 'iter-tools';

const values = wrapValues(new Map([[1, 1]]));
[...values]; // [1]
[...values]; // [1]

It's about as concise as you're gonna get, and wrapValues only cares that there's some callable values method.

Expanding the definition of an iterable to include every callable would be throwing the baby out with the bathwater. There's a heck of a lot of functions out there, and most of them cannot be assumed to return iterables. And you have to call the function to find out what it returns! If it wasn't an iterable, you better hope it didn't have any side effects...

So yeah, I guess I'd be happy enough if this is just a reason that iter-tools becomes as omnipresent as lodash (which is my explicit goal), but fundamentally this is a language-level problem and I'd prefer to see the language address it if possible.

I was thinking more that if you are passed a generator function, that can be detected, and then you know that the function must be called to retrieve an iterator.

const isGenerator = f => f?.[Symbol.toStringTag] === 'GeneratorFunction';

That works. Do you have a more generic version for any arbitrary iterator-producing method? If so, does it handle arguments as well? Strings for example, themselves iterable, also have a matchAll() method which produces a single use iterable iterator. matchAll() also accepts a regexp argument.
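For reference, matchAll's result is indeed single-use (a quick check, nothing hypothetical here):

```javascript
const matches = 'a1b2'.matchAll(/\d/g);
[...matches].map(m => m[0]); // ['1', '2']
[...matches]; // [] — exhausted; another pass needs a new matchAll() call
```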

I guess without a more generic handler, it wouldn't be too hard to

wrapValues({ values: () => str.matchAll(regexp) })

I also have wrapKeys and wrapEntries.

matchAll is interesting. I don't have much problem with matchAll because I think you'd expect a single-pass iterator there. After all, the method is doing expensive work, and there isn't much point to applying the same regex to the same input string more than once: the results will be the same, which is fundamentally different than an iterator over a mutable data structure.

Imma bikeshed some of my own ideas since I keep shooting down other people's.

The heart of the problem is distinguishing iterators from iterables.

The most obvious solution is trying to detect [Symbol.iterator]() { return this; }, e.g. as this === this[Symbol.iterator](). If that condition is true, we certainly have a generator or other native iterator. But if it's false, we haven't learned much: we could just have an expression like map(_ => _, new Map().values()), a single-pass iterator dressed up as a multi-pass one. There's another problem with this solution: creating iterators may have side effects, and causing side effects in code that should be pure is a deal-breaker in my book.

A better option might be to flag iterators with their iterable, sort of like we flag instances with their constructor. Thus we could check if iterator[Symbol.iterable] === iterator, which should be true when [Symbol.iterator]() { return this; }. Symbol.iterable would have to go on the instance, not the prototype, which is good because that means it could also be computed dynamically, e.g. the implementation of map in iter-tools could set Symbol.iterable based on its value in the source iterable. Then you could write a snippet like this:

function tryGetFreshIterator(o) {
  if (o == null || typeof o !== 'object') throw new Error('o is not iterable');

  let iterable;
  if (o[Symbol.iterable]) {
    iterable = o[Symbol.iterable];
    if (iterable === o) throw new Error('o cannot make fresh iterators');
  } else {
    iterable = o;
  }

  return iterable[Symbol.iterator]();
}

Thus tryGetFreshIterator(map.values()) would fail, but tryGetFreshIterator(map) would succeed, and so too could tryGetFreshIterator(map[Symbol.iterator]()).

I think there's a lot that's attractive about the Symbol.iterable solution. While it allows you to write something like tryGetFreshIterator, you certainly don't have to. You can make the tradeoff for speed if you're confident that you aren't going to run into problems.

It's also easy enough to polyfill, I'd think.
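Here's a runnable sketch of that polyfill idea, with a local symbol standing in for the hypothetical Symbol.iterable and the flags set by hand (a real polyfill would set them inside the builtins):

```javascript
const SymbolIterable = Symbol('Symbol.iterable'); // stand-in, not a real well-known symbol

function tryGetFreshIterator(o) {
  if (o == null || typeof o !== 'object') throw new Error('o is not iterable');

  let iterable;
  if (o[SymbolIterable]) {
    iterable = o[SymbolIterable];
    if (iterable === o) throw new Error('o cannot make fresh iterators');
  } else {
    iterable = o;
  }
  return iterable[Symbol.iterator]();
}

const map = new Map([['a', 1]]);

const values = map.values();
values[SymbolIterable] = values; // flagged with itself: no fresh iterators

const mapIter = map[Symbol.iterator]();
mapIter[SymbolIterable] = map; // flagged with the iterable that made it

tryGetFreshIterator(map);     // ok: map is a plain iterable
tryGetFreshIterator(mapIter); // ok: follows the flag back to the map
let failed = false;
try { tryGetFreshIterator(values); } catch { failed = true; } // throws
```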

The most major downside I can think of is that it tends to break encapsulation. Iterators are frequently used as a readonly facade for an underlying data structure, and the Symbol.iterable property, if present, would offer code a way to break that facade. Would it be possible to design the API a little differently to guard against this, I wonder? You could imagine something like Symbol.iteratorFactory, which returns the actual definition of the Symbol.iterator function, i.e. it would be used as iterator[Symbol.iteratorFactory](). Such a syntax is bound to have problems with this binding, so let's take a look: are those problems solvable, particularly while maintaining encapsulation? I think you'd have to have something like this:

class MyArrayIterator {
  #array;
  #i;
  constructor(array) {
    this.#array = array;
    this.#i = 0;
  }
  get [Symbol.iteratorFactory]() {
    return this.#array[Symbol.iterator].bind(this.#array);
  }
  [Symbol.iterator]() {
    return this;
  }
  next() {
    return this.#i < this.#array.length
      ? { value: this.#array[this.#i++], done: false }
      : { value: undefined, done: true };
  }
}

Doesn't look too outlandish to me?

Of course maybe I'm overthinking this! By far the simplest solution to the problem would be to just implement iterators a little differently. Imagine we wrote this:

class IteratorDoneError extends Error {}

class BetterIterator {
  constructor(iterator) {
    this.iterator = iterator;
    this.done = false;
  }

  next() {
    if (this.done) throw new IteratorDoneError('You must not call next after an iterator is done');
    const step = this.iterator.next();
    if (step.done) this.done = true;
    return step;
  }
}

The downside of this is that while it can catch a lot of errors, this behavior is generally understood to be fine at the moment. People might even wrap the wrapped iterable to get the old behavior back, negating the advantage. Plus shared-state race conditions would persist quietly all the way up until one consumer exhausted the source, at which point the other consumers would error out. OK, yeah, not liking this option as much now.

I don't think you can know everything, but I think best guesses give you this

isIterable = target => typeof target?.[Symbol.iterator] === 'function'
isIterator = target => typeof target?.next === 'function'
isSinglePass = target => isIterable(target) && isIterator(target)
isMultiPass = target => isIterable(target) && !isIterator(target)

The fact that being an iterator inherently makes you single-pass is not ideal, but since an iterator can be exhausted when used as an iterator, it should be considered exhaustible when used as an iterable too.
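Applied to the running example, here's how those guesses classify things (definitions repeated so the snippet stands alone):

```javascript
const isIterable = target => typeof target?.[Symbol.iterator] === 'function';
const isIterator = target => typeof target?.next === 'function';
const isSinglePass = target => isIterable(target) && isIterator(target);
const isMultiPass = target => isIterable(target) && !isIterator(target);

const map = new Map([[1, 'a']]);
isMultiPass(map);           // true: iterable, but not its own iterator
isSinglePass(map.values()); // true: an iterable that is its own iterator
isMultiPass(map.values());  // false
```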

This is a problem I think that could be solved (maybe?) with something similar to your concept of the iteratorFactory. I'm not entirely sure where iteratorFactory would be used, but I could see that instead manifesting itself as another optional method of the Iterator interface:

interface Iterator {
    next(value?:any):IteratorResult;
    return?(value?:any):IteratorResult;
    throw?(exception?:any):IteratorResult;
    create?():Iterator; // <- new
}

where the optional create() would produce a fresh iterator instance, equivalent to a newly created version of this iterator.

Then, something like map.values() could return an iterator with

class MapIterator {
    #map // instance that create the iterator
    #method // method that created the iterator (e.g. values())
    // ...
    create () {
        return this.#method.call(this.#map) // map.values()
    }
}

Having an iterator and wanting to iterate over its value multiple times would then be a matter of calling create, if it exists.

nextIterator = iterator.create?.()

create() in this sense would be similar to a generator object referring back to and calling its generator function. In fact generators would easily be able to implement this as such

function * genFn () {}
genFn.prototype.create = genFn // (automatic?)
genObj = genFn()
genObj.create() // another genObj

Here, notably with custom generators, there could be a problem around how arguments are handled. Presumably you'd want to cache the original generator arguments and throw them back into the generator function call upon invoking create(). Though this is not something your typical keys/values/entries methods would have to deal with.
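A userland sketch of that argument caching (restartable and create are my names here; nothing in this snippet is spec'd anywhere):

```javascript
function restartable(genFn) {
  return (...args) => {
    const make = () => {
      const obj = genFn(...args); // the cached arguments are replayed
      obj.create = make;          // every copy can make another copy
      return obj;
    };
    return make();
  };
}

const range = restartable(function* (start, end) {
  for (let i = start; i < end; i++) yield i;
});

const it = range(1, 4);
[...it];          // [1, 2, 3]
[...it];          // [] — this generator object is spent
[...it.create()]; // [1, 2, 3] — a fresh one with the same arguments
```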

This way the concept of isMultiPass can change to include iterators assuming they implement create(). You'd just have to know to call it to get new iterators for additional passes. And, really, in the end this is nothing much more than an exposed [Symbol.iterator]() but a manual version that clones instead of applying to the iterator itself.


TL;DR: how valuable is the ability to get a fresh iterator from the iterable returned from map/filter/etc? Depends on what I want to do with it:

  • single-pass — not used
  • multi-pass over non-mutating source — fresh-iterator in each loop means duplicating work
  • multi-pass over mutating source — fresh-iterator in each loop is needed, but should this be the default behaviour?

Let's say I'm writing a function that takes an iterable and needs to transform and scan it twice. Silly example: split a list of strings into two lists:

function splitByLength(strings: Iterable<string>) {
  let trimmed = map(s => s.trim(), strings);
  let totalLen = 0;
  let count = 0;
  for (let s of trimmed) {
    totalLen += s.length;
    count++;
  }
  let avgLen = totalLen / count;
  let shorterThanAverage = [];
  let longerThanAverage = [];
  for (let s of trimmed) {
    if (s.length < avgLen) shorterThanAverage.push(s);
    else longerThanAverage.push(s);
  }
  return { shorterThanAverage, longerThanAverage };
}

The problem with this function is that if the input iterable is single-pass, then trimmed will be single-pass as well, and it won't work. In order to ensure I can scan it twice, I would have to use:

let trimmed = Array.from(strings, s => s.trim());

If I know / require the caller passed a multi-pass iterable, then the original function will work, but it will work too much. Each loop will re-do the mapping operation, which is rarely desirable. In other words, I would probably still want to store the trimmed strings in an array, rather than have them trimmed twice.

And then there's the use case you're aiming for — where the base iterable is a collection that's expected to mutate between loops, and you want the transformed iterable (map/filter/etc) to restart the transformation each time. Having this be the default behaviour seems weird to me. Normally I would wrap the transform in a function; calling that function then imo better signals that a new transformer/iterator is being created:

const trimmed = () => map(s => s.trim(), strings);
for (let s of trimmed()) ...

// or separate the lazy operation (trim) from the subject (strings)
const trim = iter => map(s => s.trim(), iter);
for (let s of trim(strings)) ...

With this approach, there's no need for map to ever return a multi-pass iterator. It's up to the user to wrap it in a "restarting" function when they need this behaviour.
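Self-contained version of that pattern, with a minimal lazy map generator standing in for the library's map:

```javascript
// Minimal stand-in for a lazy map; not the library implementation.
function* map(fn, iterable) {
  for (const value of iterable) yield fn(value);
}

const strings = ['  a  ', ' bb '];
const trimmed = () => map(s => s.trim(), strings);

[...trimmed()]; // ['a', 'bb']
[...trimmed()]; // ['a', 'bb'] — each call restarts the transformation
```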

A couple thoughts:

First, I never check whether Symbol.iterator is a function. A simple truthiness check takes care of null and undefined values, and if Symbol.iterator is not callable, the correct error to throw is the one thrown by trying to call it anyway.

Second, your implementation of multi-pass detection would label every result returned from iter-tools as single-pass, though they are all multi-pass. In my implementation a static iterator is created lazily if the IterableIterator.next() method is called. I wonder how many other people are really doing that?

I like the idea of iterator.create, but I think the details may not be favorable. I am still particularly opposed to trying to detect iterators from the presence of a next method. The iter-tools package originally did this and I ripped it out. next is just too common a name for a method: any code which makes critical decisions based on the presence of that method is going to have some weird edge cases, and will end up being unsuitable for metaprogramming, that is to say treating parts of programs as data, something which is not uncommon for libraries and frameworks.

Also you still have the problem that you need an iterator to get a new iterator, which may result in the unnecessary creation of a real iterator which may realistically cause side effects like opening files.

Yes, I would only consider your example code to be technically correct if you stored the values from the iterable before using them twice. Not only could the iterable be single-pass, it could be a multi-pass iterable over mutable data causing your algorithm to violate its design assumptions and return incorrect results. I'm not worried about that though because it's the same problem no matter what, and it's a completely localized syntactic pattern that type-aware linters can warn against.

Usually iterables are single-pass. 99% of them are only used once. iter-tools even makes sure never to assume its input iterables are multi-pass. But this doesn't mean multi-pass iterables aren't valuable. They're easily worth their weight when debugging, even if they duplicate some calculations. Also they're a powerful feature for doing stream transformation on mutable data, a common task in UI programming.

I do plan to have the next version of iter-tools offer a "restarting" function for generators that caches arguments. I hope it won't be needed often though, and certainly not as a necessary shim for using core data structures.