Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[qob] The Query Driver should collect partition results as they complete #14607

Open
daniel-goldstein opened this issue Jul 9, 2024 · 0 comments

Comments

@daniel-goldstein
Copy link
Contributor

daniel-goldstein commented Jul 9, 2024

What happened?

Currently, the ServiceBackend's implementation of collect distributed array submits a job group full of worker jobs (1 per partition) and waits for the job group to complete before reading the results of the worker jobs. For small analyses this is fine, but when a query has tens of thousands of partitions it can take time to schedule and complete all of the worker jobs and reading back those results on the driver can become a bottleneck. Below is one possible solution to this problem:

Expose log for job completions in a job group

The Query Driver should attempt to read worker job results while the stage is running, but to do this it needs the Batch API to provide an append-only log of completed jobs in a job group that the Query Driver can consume instead of issuing O(jobs) job status requests during each stage. It may be that this is already possible with the current database schema, but can at worst be achieved by creating an indexed column on jobs that contain the spot they completed in in the job group.

Completion of this feature would require:

  • Carefully evaluating the Batch data model to determine if there are any database changes necessary to construct an append-only log of job completions in a job group from the state of the database
  • If changes are needed, design and implement a batch front end API endpoint to query the log
  • (Separately) Add support for streaming the log in the Scala BatchClient and use it to read partition results before the job group completes.

Version

0.2.132

Relevant log output

No response

@daniel-goldstein daniel-goldstein added enhancement needs-triage A brand new issue that needs triaging. query batch and removed needs-triage A brand new issue that needs triaging. labels Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant