Proposal + seeking champion: Composable promise concurrency management

Repo: GitHub - isiahmeadows/proposal-promise-schedule: A proposal to add composable promise concurrency management

Promise concurrency management is awkward, obtuse, and plainly difficult at times. It gets non-trivial extremely fast, and is severely fraught with pitfalls and edge cases. And that's even before you attempt any optimizations. (For brevity's sake, I'm not including examples here - follow the link and you'll understand why.)

This proposal took several iterations, including years of intermittent experimentation on code in the wild, before I finally narrowed it down to something like the above. But at this point, I can't think of an API that fits more idiomatically with the way we use promises in practice. I actually like this API - it feels fun and rewarding to use, something I can't say about a lot of things programming-wise.


This ... is the feature I need, and didn't even know I needed.

There have certainly been times where I've wanted to cap the number of active promises, to prevent sending out a gazillion REST requests or database queries at the same time. Sometimes this occurs in production code, and the amount of ugly (and potentially buggy) code I would have to put in to implement this properly is scary. Sometimes this occurs in a database-upgrade script where I'm not really inclined to do anything too complicated or depend on anything external, so I just throw together a very non-performant solution.
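To give a feel for the kind of hand-rolled code being described, here's a minimal concurrency limiter - roughly what you end up writing today. `makeLimiter` is an illustrative name, not anything from the proposal:

```javascript
// A hand-rolled concurrency limiter: wraps promise-returning functions so
// at most `max` of them run at once. Illustrative only - this is exactly
// the ad-hoc code the proposal would make unnecessary.
function makeLimiter(max) {
  let active = 0;
  const queue = [];
  function next() {
    if (active >= max || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    // Promise.resolve().then(task) guards against tasks that throw
    // synchronously before returning a promise.
    Promise.resolve().then(task).then(resolve, reject).finally(() => {
      active--;
      next();
    });
  }
  // Returns a wrapper that waits for a free slot before running the task.
  return task => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}
```

Usage would look something like `const limit = makeLimiter(4); await Promise.all(urls.map(u => limit(() => fetch(u))));` - and even this sketch glosses over cancellation, prioritization, and tasks that enqueue more tasks.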

This also provides a very easy solution for the whole sync-errors-while-setting-up-promises issue, as discussed at length here (it might not be what the O.P. wanted in that thread, but it'll certainly make me happy).

It's a very creative API too - I love how simple it feels, yet it has so much power.

I'm not 100% sure, but doesn't Promise.map from bluebirdjs have a similar goal?
http://bluebirdjs.com/docs/api/promise.map.html

@MaxGraey I have a link to that in my proposal.

A function that took an iterable of promise-returning functions, combined with a concurrency option, seems like it'd be simpler to me than the schedule callback you've described.
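For concreteness, the shape being suggested might look something like this - `runAll` and its options bag are hypothetical, not a proposed API:

```javascript
// A sketch of the suggested shape: an iterable of promise-returning
// functions plus a concurrency cap. Hypothetical names throughout.
async function runAll(tasks, { concurrency = 1 } = {}) {
  const iterator = tasks[Symbol.iterator]();
  const results = [];
  let index = 0;
  async function worker() {
    for (;;) {
      // iterator.next() and index++ run synchronously together, so two
      // workers never claim the same slot.
      const { done, value: task } = iterator.next();
      if (done) return;
      const i = index++;
      results[i] = await task();
    }
  }
  // Spin up `concurrency` workers that drain the shared iterator.
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results;
}
```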

My bad, didn't see this.

I get that, but when you need to schedule tasks within tasks, that falls apart. That can happen very easily with sufficiently large amounts of I/O-heavy processing work where not everything is a nice single collection I can just append to, and that's where the reentrancy part of my proposal comes into play. (I've already run into that at work, too.) There's also the issue that if you need to asynchronously push tasks to the queue, you can't use a plain iterable to do it. And my follow-on of allowing task prioritization and weighting would be much more awkward to add on top of an iterable of tasks - while that doesn't have broad use, it does have some utility in coordinating certain sets of heterogeneous tasks.

I do want to note that in terms of library precedent, only two libraries I linked actually follow that pattern of basically a concurrency-limited forEach. when takes a guard-based approach, and d3-queue and vow-queue (linked at the bottom) use an explicit task queue object. Mine is functionally closer to the latter two as it provides a little more power and in effect works almost as a local mini event loop.


I think this problem could also be reformulated as two distinct problems, one being the production of an async iterator from events, the other one being the parallel processing of an iterator (as already mentioned by ljharb). I'm thinking of something like:

AsyncIterator.fromProducer(produce => /* call produce(...) multiple times */)
  .map(process)
  .parallel(10)
  .consume(); // actually rather .forEach(nothing), but that looks so weird just to "consume" an iterator

By splitting up these two, they can be composed with the new async iterator helpers:

const accounts = await users.values()
  .map(getAccount)
  .parallel(10)
  .toArray();

The .parallel(max) function would be called on a (possibly synchronous) iterator yielding Promises and would return an async iterator. When consumed, it would eagerly consume up to max values from the origin and yield the value of whichever Promise resolves first, consuming the next value as each one settles.
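One possible reading of those `.parallel(max)` semantics, written as a free-standing async generator instead of an iterator-helper method - purely a sketch, none of this exists in the async iterator helpers:

```javascript
// Consumes a sync iterable of promises, keeping up to `max` of them pending
// at once, and yields values in settlement order.
async function* parallel(source, max) {
  const iterator = source[Symbol.iterator]();
  const pool = new Map(); // key -> promise tagged with its key
  let nextKey = 0;
  const pull = () => {
    const { done, value } = iterator.next();
    if (done) return false;
    const key = nextKey++;
    pool.set(key, Promise.resolve(value).then(v => ({ key, value: v })));
    return true;
  };
  while (pool.size < max && pull()) {} // eagerly consume up to `max` values
  while (pool.size > 0) {
    // Yield whichever pending promise settles first, then top up the pool.
    const { key, value } = await Promise.race(pool.values());
    pool.delete(key);
    pull();
    yield value;
  }
}
```

Note this yields in settlement order, not source order, matching the "yield the value of the first Promise resolving" description above.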

This is for sure not as thought out as the proposal; I just wanted to throw in a totally different way of thinking about the problem space, which I think is far larger than the problems presented in the proposal (think Streams, time-based rate limiting, ...).


I can go with that. FWIW, I'm not seeking much beyond just getting it in the pipeline. Feel free to file an issue with that alternate formulation - I'd love to discuss it a little more at length.

I've added a case study to the README explaining a use case where such a thing would fall apart. Basically, the issue is when one task depends on another. It's possible in theory to hack around it, but it gets extremely awkward and you're almost better off starting from scratch and just not using it.

Edit: Maybe your proposal was less about memory and rate control, and mostly about flattening the results of multiple nested promise result arrays?

I would like something like that in quite a few cases.
Though, I would still rather not wrap every API call in a function. I wonder if the flattening can be proxied.


Do you think we're solving the same problem here? I might have a cleaner format, but my approach is so different, I'm not sure these are at all equivalent. I think they stem from a similar problem.

Does this provide the functionality you need? (Or is it completely different?)

const maxParallelCalls = 10; //control memory usage
const maxCallsPerSecond = 1; //control rate limits
const api = new Throttle( myAPI, maxParallelCalls, maxCallsPerSecond );

//... in some function somewhere
state = await api.someQuery(); //throttling & memory are handled. No worries.

//...sequential stuff just works
myOptions.map( opts => api.someQuery( opts.a ).then( () => api.otherQuery( opts.b ) ) );

//...parallel stuff just works
myItems.map( item =>
  Promise.all( [ api.q( item.x ), api.q( item.y ) ] )
    .then( () => api.q2( item.z ) )
);

//skip the queue for emergencies
await myAPI.query( doThisRightNow ); //can't wait for memory / rates
//(myAPI is the unwrapped version / not throttled)

//need all my item queries to finish:
await api.finish( ( [ kind ] ) => kind === 'image-element' );
//calls array.filter() on query parameters still in the queue
//resolves as soon as the last relevant query finishes

//need all queries to finish:
await api.finish();
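Sketching how a Throttle wrapper like the one above could be built with a Proxy and a simple concurrency gate (answering my own earlier question about whether it can be proxied). Rate limiting and finish() are omitted; the names mirror the example, but nothing here is a real API:

```javascript
// Wraps `target` in a Proxy so every method call waits for one of
// `maxParallelCalls` slots before running.
function Throttle(target, maxParallelCalls) {
  let active = 0;
  const waiting = [];
  const acquire = () =>
    active < maxParallelCalls
      ? (active++, Promise.resolve())
      : new Promise(resolve => waiting.push(resolve));
  const release = () => {
    const next = waiting.shift();
    if (next) next(); // hand the slot directly to the next waiter
    else active--;
  };
  return new Proxy(target, {
    get(obj, prop) {
      const value = obj[prop];
      if (typeof value !== 'function') return value;
      return async (...args) => {
        await acquire();
        try {
          return await value.apply(obj, args);
        } finally {
          release();
        }
      };
    }
  });
}
```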

It's about concurrency control and operation ordering, and it's considerably more primitive than memory and rate control - those can be built on top of this (and are two of the motivating factors), but not vice versa. The follow-ons I've mentioned as possible extension points make a lot more sense in this context, and the more I think about it, they probably should've been included even though I initially, intentionally deferred them. (I should've also included a throttler hook so things like yours would fit more cleanly into it.)

Your proposal covers a broad subset of mine, but it would be considerably more magical under the hood were you to implement it, and it doesn't offer the potential follow-on extensibility mine does for task prioritization. It also requires the effective equivalent of setTimeout, and even simple things like Promise.delay have been repeatedly rejected by the committee as being better suited to web standards.


I thought I was probably doing something different. Just hoping to understand a little better.

This was an example from an implementation. You are correct, it requires setInterval or requestAnimationFrame for a control loop. Task priority / sequence-enforcement work fine, but my approach has other problems.

setTimeout, setInterval, and requestAnimationFrame are problematic in every system I'm working with. Network and animation behaviors need to be spread out over time, but they need to interact with systems that are asynchronous but time-insensitive. It's troublesome.

I need a better way to handle not just concurrency and sequence, but time. Maybe I'm wishing for an over-constrained system? Oh well.

Tip: deconstruct what interacts with what, and don't overabstract. Also, my proposal isn't a silver bullet - it won't help solve all concurrency issues.