Which functions should we expose, and what should they be called? #3

Open
njsmith opened this issue Oct 7, 2018 · 2 comments

@njsmith
Member

njsmith commented Oct 7, 2018

In my initial draft, I have 3 functions:

  • run_on_each: concurrent map, with results optionally directed to a SendChannel, no return value
  • amap: concurrent map, used via async with, with results provided as an async iterable
  • run_all: concurrent call-all-these-callables, with results provided at the end as a list

I'm not at all sure that these three are the right set to provide, or that we have the names right.

I guess there's a two-dimensional space of calling conventions:

  • Input handling: fn+iterable (map style) vs. iterable-of-fns (gather-style)
  • Output handling: discard vs. send-on-channel vs. async-with-returning-async-iterable vs. nursery.start-returning-async-iterable vs. big-list-at-end

So in principle there are 2*5 = 10 functions we could provide here... but that's way too many and too confusing, so we need to cut it down somehow.

@oremanj
Member

oremanj commented Sep 3, 2020

Hmm.

  • Discard vs send-on-channel can be the same function by taking an optional channel argument (as run_on_each already does).
  • run_on_each could also take optional task_status; if used with start(), it would internally create a channel pair and make start() return the receive end.
  • For fn+iterable versus iterable-of-fns, I think the split you have here makes sense. In practice, people are going to use this with either "a handful of things" or "an unknown large number of things". The "handful" case is much more likely to be running different functions and to not care about getting the results incrementally. It's easy enough to adapt between the two conventions: run_all([partial(fn, arg) for arg in args]) or run_on_each(lambda fn: fn(), fns).

(fun fact: for maximum inscrutability points, that run_all could also be run_all(map(partial(partial, fn), args)))
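The equivalence of the two spellings can be checked with functools alone. The sketch below uses a plain synchronous fn and the builtin map as stand-ins, since only the way the thunks are built matters here; fn and args are placeholder names.

```python
from functools import partial


def fn(arg):
    # Placeholder function; any one-argument callable works.
    return f"fn({arg})"


args = ["a", "b", "c"]

# Readable spelling: one thunk per argument.
thunks_comprehension = [partial(fn, arg) for arg in args]

# Inscrutable spelling: partial(partial, fn)(arg) evaluates to partial(fn, arg).
thunks_mapped = list(map(partial(partial, fn), args))

results_comprehension = [thunk() for thunk in thunks_comprehension]
results_mapped = [thunk() for thunk in thunks_mapped]

# The opposite adaptation: treat "call the thunk" as the mapped function.
results_called = list(map(lambda thunk: thunk(), thunks_comprehension))
```

All three result lists come out identical, which is the sense in which the two calling conventions are interchangeable.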

So I think the three we have now are a good three to be working with. Friendly amendment: maybe run_all can support being invoked with positional *args, so you can say run_all(thunk, thunk, thunk) instead of needing another pair of delimiters. I think the case where all the functions are listed in the code will be pretty common. Could also support both conventions, under the theory that callable iterables are rare.
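One way the "support both conventions" idea could work is a small normalization step at the top of run_all. The helper below is hypothetical (normalize_thunks is not a name from the draft) and encodes the stated theory that callable iterables are rare: a single non-callable argument is treated as an iterable of thunks, and anything else is treated as the thunks themselves.

```python
def normalize_thunks(*args):
    """Accept either run_all(thunk, thunk, ...) or run_all(iterable_of_thunks).

    Assumption: a lone argument that is not callable must be an iterable of
    thunks. A lone callable iterable would be misread as a single thunk.
    """
    if len(args) == 1 and not callable(args[0]):
        return list(args[0])
    return list(args)


def first():
    return 1


def second():
    return 2


from_positional = normalize_thunks(first, second)
from_iterable = normalize_thunks([first, second])
single_thunk = normalize_thunks(first)
```

The cost of this design is the ambiguity noted in the docstring; accepting only positional arguments would avoid it at the price of requiring run_all(*thunks) when the thunks are already in a list.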

Naming:

  • run_all is my favorite name of the three: it's very clear and Trio-ish, and it seems obvious to me that it takes a bunch of thunks and returns a list. asyncio calls this gather, but I don't think that's as good a name.
  • run_on_each also seems clear. I don't love that it parallels run_all but has a different result convention, but I can live with that.
  • amap is a little inscrutable compared to the other two, and I look at it and expect something that returns a list (probably due to too much exposure to the builtin map in Python 2.x, which returned a list). We could borrow asyncio's terminology and call it as_completed, maybe? Or running_on_each or some other gerund-y form to emphasize the context manager aspect.
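For reference, this is how the two asyncio names mentioned above behave. The sketch uses only the standard library's asyncio, not Trio or this project's API; work and its delays are made up for the demonstration.

```python
import asyncio


async def work(name, delay):
    # Hypothetical task: sleeps, then returns its own name.
    await asyncio.sleep(delay)
    return name


async def main():
    # gather: one list at the end, in the order the awaitables were passed.
    gathered = await asyncio.gather(work("slow", 0.1), work("fast", 0.01))

    # as_completed: results become available one at a time, earliest first.
    completed = []
    for next_done in asyncio.as_completed([work("slow", 0.1), work("fast", 0.01)]):
        completed.append(await next_done)
    return gathered, completed


gathered, completed = asyncio.run(main())
```

gather corresponds to the big-list-at-end convention of run_all, while as_completed corresponds to the incremental convention that amap is meant to cover.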

@dhirschfeld
Member

I'll put in a plug for an async as_completed function: groove-x/trio-util#7

IIUC, it does what run_all does, but as an async iterator. I don't want to have to wait until all the functions have completed before I can pass the results off to the next step in the pipeline.

@pquentin pquentin mentioned this issue Nov 10, 2020