Iterator.prototype.findMin, Iterator.prototype.findMax methods

This proposal introduces two new Iterator.prototype helper methods which are meant to be a specialized usage of Array.prototype.reduce:

  • Iterator.prototype.findMin<T>( comparatorFn: ( valueA: T, valueB: T ): -1|0|1): T | undefined - find "the smallest" value within a collection, according to a custom order (defined by a comparator function which is the same as Array.prototype.sort's callback as defined in 23.1.3.30 Array.prototype.sort ( comparefn ) in ES2023 specs)
  • Iterator.prototype.findMax<T>( comparatorFn: ( valueA: T, valueB: T ): -1|0|1): T | undefined - same as above, but "the biggest".

EXAMPLE

function* peopleGen() {
  yield {name: "Adam", age: 15};
  yield {name: "John", age: 40};
  yield {name: "Lisa", age: 23};
}

const result = peopleGen()
  .findMax((a, b) => a.age - b.age);
result.next(); //  {value: {name: "Adam", age: 15}, done: false};
result.next(); //  {value: {name: "John", age: 40}, done: false};
result.next(); //  {value: {name: "John", age: 40}, done: false};

POLYFILL

not sure if the polyfill is available before Iterator.prototype becomes available.

BEFORE

Currently we can achieve it by running the Array.prototype.reduce, where the current accumulator is the currently biggest/smallest item out of entire collection:

var people = [
  {name: "Adam", age: 15},
  {name: "John", age: 40},
  {name: "Lisa", age: 23}
];

var theYoungest = people.reduce((minSoFar, item) => {
      if (minSoFar.age < item.age){ // earlier in order
          return minSoFar
      } else if (minSoFar.age > item.age){ // later in order
          return item
      } else { // equal in order
          return minSoFar
      } 
  });

The solution currently is perfectly available, though there's no abstraction to extract the comparatorFn which is reusable across .findMin, .findMax and .sort array methods.

AFTER

var people = [
  {name: "Adam", age: 15},
  {name: "John", age: 40},
  {name: "Lisa", age: 23}
];
// var iterator = people.values()
var theYoungest = people.values().findMin((a, b) => a.age - b.age); // {name: "Adam", age: 15}
var theOldest = people.values().findMax((a, b) => a.age - b.age); // {name: "John", age: 40}

Unfortunately we've found that adding new methods to Array.prototype is frequently not web-compatible (see e.g. Web Compatibility Issue: Sugar versions v0.9 through v1.3.9 (inclusive) · Issue #37 · tc39/proposal-array-grouping · GitHub), so there is a very high bar for new Array methods (we can't just break the web). So this is unlikely to happen.

EDIT: original proposal provided min and max

Is this only about the names already-polluted-within-the-ecosystem (like old smoosh gate)? If so, different names can be provided, such as findMin and findMax. Maybe these would be even better...

But if it's about browser vendors refusing to extend Array.prototype, Object.findMax/Min, Map.findMax/Min is also an option to me. What happened to array grouping seems like a healthy consensus.

The experience of browsers during the array grouping proposal was that it is not practical to predict which names are already polluted without shipping and breaking sites, which they don't want to do. groupBy ended up being polluted in a way we could potentially have known about, so we fell back to group, and that was also polluted in a different, more subtle way, across several sites. So as a consequence browsers (understandably) don't want to ship any new string-named methods on Array.prototype.

Object.findMin/Map.findMin are not really suitable here because these methods don't produce an object or map. A reasonable alternative would be on Iterator.prototype, which would let you do Iterator.from(array).findMin(comparator).

@bakkot thank you for absolutely great and quick feedback!
I have updated the proposal to rely on Iterator.prototype. Would appreciate if you could give some more feedback. Thank you so much in advance!

After having some more thought, due to Array prototype pollution and its consequences, going in the direction of array.values().someNewMethod() make a lot of sense and is not vulnerable to locks caused by historical reasons. Moreover, array.someNewMethod() is greedy in its nature, whereas array.values().someNewMethod() is lazy pulled. The direction looks awesome, at least to me.

Only remaining feedback is that the "thisArg" parameter is no longer relevant (iterator methods don't have it). Otherwise, sounds reasonable to me. Realistically I suspect it will be a while before anyone has the time to champion it. The committee is working through a few other followup iterator helper methods (zip and concat at the moment), and I think these are probably top-20 but not top-5 followup methods.

@bakkot the irrelevant thisArg has been removed. Again, thank you very much.

Shall I make an official proposal repository with this anywhere? Sorry for a lame question, but it's the first time I'm going through the process.

You can if you want to have a more stable place for the suggestion to live than this thread, but it's not strictly necessary. The next step is to find a TC39 delegate who wants to champion it. I can't volunteer myself; I have a bunch of other stuff I want to work through and can't realistically take on more.

Personally, I don't feel that these helpers are sufficiently motivated. As mentioned in the OP, this is a really simple abstraction over reduce. I don't think it's at all burdensome to just use reduce for these use cases.

I'm realizing that having a .sortBy() function would get us most of the way there. We take an iterator, and we sort it using the callback function. The callback is in charge of mapping each item to a "rank". The "rank" can be anything that's sortable by "<", such as a number or string. In other words, I'm talking about a function that's similar to Python's sorted function, when you use its key parameter.

So, something like this:

var people = [
  {name: "Adam", age: 15},
  {name: "John", age: 40},
  {name: "Lisa", age: 23}
];

// Find min
var theYoungest = people.values().sorted((a, b) => b.age - a.age).take(1)[0]; // {name: "Adam", age: 15}
// Find max
var theOldest = people.values().sorted((a, b) => a.age - b.age).take(1)[0]; // {name: "John", age: 40}

(Anytime I use JavaScript's array.sort() function, I find myself always missing the conciseness of Python's sorted() - it's really nice to just use a one-line lambda to describe how you want to sort something, as opposed to a three-pronged conditional)

Thinking further - if we had a proper .sortBy() function of some sort, maybe people would actually start using that to shuffle an array as opposed to the buggy array.sort(() => Math.random() > 0.5)) that I've unfortunetally seen get used way too often. (i.e. instead it could be array.values().sortBy(() => Math.random()))

how can you sort an iterator?

A "sorted()" function was listed in the iterator helpers proposal as prior art that both Python and npm itertools had. Having a sort function for an iterator would require consuming the entire iterator up-front, but it doesn't require you to do the entire sort up-front - only enough to yield each item that's needed on-demand.

So, doing an iterator.sort().take(1) on an iterator that yields n items would be an O(n) operation, as opposed to the usual - what is it, O(n log(n))?

Honestly I don't think sort is a good alternative here, as you need to sort the entire collection just to get the maximum/minimum element. This is so much unnecessary work.

1 Like

It's not actually any more work. Since it's an iterator method and it returns an iterator, it can be implemented in a lazy fashion. If after sorting you do "take(1)", it only has to do enough sorting to figure out what the first item would be. It doesn't have to figure out the sorted position of any more entries unless you do further "take()"s. Hence why I was saying it is O(n) if used with a "take(1)". Just like how the proposed min/max functions are O(n)

1 Like

That doesn’t make any sense to me; if the item that should sort first is at the end of the iterator, i shouldn’t be able to see any items until i see that last one.

It doesn't make sense to me either.

A take from a different perspective:

  • .sort().take(1) it could be treated as "all sorted so far"
  • but that would have to store all items, sorted, as the iterator progresses - and this is the additional work.
    How can you sort lazily without storing the data?

Ok - it depends on what we mean by "work".

You have to consume the entire iterator and store it in memory for a sort to work. So, it takes more memory. I was interpreting "work" as "number of operations", and they both would have the same big-O if you're taking just one item.

If what I'm proposing is still not clear, perhaps let me provide an example implementation:

/** Swaps array[index1] with array[index2] if the former is smaller than the latter. */
function sortPairInPlace(array, index1, index2, deriveRank) {
  if (deriveRank(array[index1]) < deriveRank(array[index2])) {
    [array[index1], array[index2]] = [array[index2], array[index1]];
  }
}

const Iterator = Object.getPrototypeOf(Object.getPrototypeOf([].values()))

// A smarter implementation could cache responses of deriveRank()
// so we don't have to do repeated calls.
Iterator.sortBy = function * (deriveRank) {
  // Consume the iterator and store its contents in the "items" array
  // Looks for the smallest item as it does so.
  const items = [];
  while (true) {
    const { value, done } = this.next();
    if (done) break;
    items.push(value);
    if (items.length >= 2) {
      sortPairInPlace(items, items.length - 2, items.length - 1, deriveRank);
    }
  }

  if (items.length === 0) {
    return;
  }
  yield items.pop();

  // Continues to lazily sort and yield results as needed.
  while (items.length >= 2) {
    for (let i = 0; i < items.length - 1; i++) {
      sortPairInPlace(items, i, i + 1, deriveRank);
    }
    yield items.pop();
  }

  yield items.pop();
}

// A quick-and-dirty polyfill for take()
Iterator.take = function (count) {
  const items = [];
  for (let i = 0; i < count; i++) {
    const { value, done } = this.next();
    if (done) break;
    items.push(value);
  }
  return items;
};

const data = [
  'abc',
  'abcdefghi',
  'a',
  'abcde',
];

// Logs out 'a'
console.log(data.values().sortBy(string => string.length).take(1));
// Logs out 'abcdefghi'
console.log(data.values().sortBy(string => -string.length).take(1));

So, no, you don't have to pre-sort the entire array to make this work. You can sort as you .take(), lazily.

This is why it's important to mention both time and space when talking about big O. Two things bring O(n) time are quite different if one is O(1) space and the other is O(n).

FWIW that implementation of sortBy looks like O(n) time only in the best case of .take(1), if the iterator was fully consumed it would be O(n^2) as always starts the bubbling from the start of the remaining items.

1 Like

This is true, I don't know why I was thinking the overall big-o would be the same if the whole thing got consumed. Though there's still ways to fix that - e.g. if I stored the consumed iterator in a heap instead of an array, you would get O(n log n) execution time if you consume the whole iterator (a.k.a. a heap sort).

But it does still require the memory overhead. And even though it has a comparable big-o, it is still doing a lot of overhead compared to simply searching an array for the max value.

1 Like