Proposal: {Map,Set}.prototype.randomEntry()

ethanresnick · September 9, 2023, 11:44pm

There are many algorithms that involve getting a random entry from a Map (or Set, or plain object, for that matter), but this is hard to implement efficiently in user-land, because user-land code doesn't have access to the underlying buckets that (I assume) back the aforementioned hash-based objects. Therefore, having the engine implement these natively would be nice.

In the case of Map and Set, these could probably be added on the prototype in a web-compatible way. There could maybe be randomKey and randomValue methods as well.

For plain objects, these would likely need to be static methods on Object.

lightmare · September 10, 2023, 8:17am

Define efficiently. Depending on the underlying hash table implementation, finding a random element (I assume you'd sample 0 <= i < size from a uniform distribution, then find the i-th element) could devolve into a linear scan of a large part of the table. Granted that may be an extreme case, but ES doesn't dictate which data structure Map/Set should be implemented with, only that they must have sub-linear insertion/deletion.

The way I'd go about this, is pair the Map/Set with an array of keys/values. You'd pay O(n) space + O(1) time for maintaining the array, and get O(1) sampling in return. Doesn't sound too inefficient to me.

blob · September 10, 2023, 7:36pm

If this is implementable efficiently in "backend-land", we would make much better use of a .getNthEntry(n: integer), .setNthEntry(n: integer[, k: K], v: V) and .deleteNthEntry(n: integer)

ethanresnick · September 10, 2023, 7:49pm

I think you’re right that there are some ‘extreme’ cases where the engine might need to do a partial linear scan. Or, from what I’m seeing in this article about how Map is typically implemented, the sampling might be O(n) where n is the number of removed elements since the last rehash. Either way, though, the engine seems to be pretty well positioned to offer this with quite good average performance.

In terms of implementing this in user-land with a separate array of keys, in addition to the extra space overhead, I don’t think a single array is enough to handle removals in constant time. There are ways to implement constant time maintenance of some secondary key set though (eg, one approach I saw), so I’m not saying this is impossible to implement in user-land; just that it would be nice, and likely more efficient, to have it built-in.

But I suppose that really raises a larger question about what test TC-39 applies to decide if something is ‘useful enough’ to justify std library inclusion. Are there any criteria on that published anywhere?

ethanresnick · September 10, 2023, 7:49pm

Ooh, I like that even better!

Topic		Replies	Views
Object.at(), Object.size() 💡 Ideas	24	757	October 25, 2023
Proposal for additional array methods 💡 Ideas proposal	3	546	February 14, 2022
Better Randomization Support in JS 💡 Ideas	4	256	November 6, 2022
Custom hash and equal functions for Map and Set collections 💡 Ideas proposal	20	2752	April 30, 2021
Extra `key` option for maps and sets 💡 Ideas proposal	2	320	August 28, 2020

Proposal: {Map,Set}.prototype.randomEntry()

Related topics