[FEATURE] Handle arbitrarily long lists of inputs. #44

Open
coltonbh opened this issue Sep 2, 2023 · 0 comments
Labels: enhancement (New feature or request)

coltonbh commented Sep 2, 2023

Multi-thread (or use asyncio) the submission and retrieval of results.

Use something like this:

import concurrent.futures

def compute(inp):
    # Submit a single input to the server and return its result.
    ...

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(compute, inputs))
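Submission and retrieval are I/O-bound HTTP calls, so threads should scale well here despite the GIL. One note: capping the pool (e.g. ThreadPoolExecutor(max_workers=8), number illustrative) would keep an arbitrarily long input list from opening an unbounded number of concurrent connections.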

Is there really any reason to even use Celery groups if I implement this? Using them can reduce the number of HTTP calls quite dramatically, but if I multi-thread the submission and collection of results, perhaps the performance gains aren't that big a deal, and then I have a single level of abstraction to worry about (multi-threaded submission and collection of results) instead of nesting groups within that multi-threading.
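For reference, the Celery-group alternative being weighed looks roughly like this (a minimal sketch; compute_task is a hypothetical task name, not this repo's actual API):

from celery import group

# Hypothetical task import; stands in for whatever task wraps one computation.
from myapp.tasks import compute_task

# One group covers the whole batch, so submission and result collection
# each cost roughly one round trip instead of one HTTP call per input.
job = group(compute_task.s(inp) for inp in inputs)
group_result = job.apply_async()
results = group_result.get()  # can time out if the group is very large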

Pros: I'd never hit timeout issues collecting results because a group ended up too large.

Cons: Collecting results will be slower, since it is faster to fetch 100 results (the current default batch size) in a single request than one at a time.

EDIT: Chatted with Ethan. He has special code to handle group-size issues when a group has too much data to download before timing out. That tells me the batch-size mechanism should be dispensed with entirely. Better to get your data with a few seconds of delay on a large batch submission than to have your results held hostage on the server because the group is too large to download, forcing you to resubmit with a smaller batch size just to get your results back.

For FutureResultGroup we could use concurrent.futures.as_completed(...) to collect results as they complete ("stream" results):

for result in future_result.as_completed():
    ...  # do something with each result as it completes

Could also have an in-order variant that yields results in submission order (see the sketch below for both methods):

for result in future_result.collect():  # or some better method name
    ...  # do something with result
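A minimal sketch of what both methods could look like, assuming FutureResultGroup wraps a list of concurrent.futures.Future objects held in submission order (the class and method names mirror this discussion, not an existing API):

import concurrent.futures
from typing import Any, Iterator

class FutureResultGroup:
    def __init__(self, futures: list[concurrent.futures.Future]):
        # Futures are stored in submission order.
        self._futures = futures

    def as_completed(self) -> Iterator[Any]:
        # Stream results in completion order; fast tasks yield first.
        for future in concurrent.futures.as_completed(self._futures):
            yield future.result()

    def collect(self) -> Iterator[Any]:
        # Yield results in submission order; blocks on each future in
        # turn, so one slow early task delays everything after it.
        for future in self._futures:
            yield future.result()

The trade-off: as_completed() minimizes time-to-first-result, while collect() preserves the input/output correspondence without the caller having to re-sort.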
@coltonbh coltonbh added the enhancement New feature or request label Sep 2, 2023
@coltonbh coltonbh self-assigned this Sep 2, 2023
@coltonbh coltonbh changed the title from "Autopartition tasks by max_batch_inputs on server" to "[FEATURE] Handle arbitrarily long lists of inputs." Jun 14, 2024