[qob] The Query Driver should collect partition results as they complete #14607

daniel-goldstein · 2024-07-09T18:19:32Z

What happened?

Currently, the ServiceBackend's implementation of collect distributed array submits a job group of worker jobs (one per partition) and waits for the whole job group to complete before reading the worker jobs' results. For small analyses this is fine, but when a query has tens of thousands of partitions, scheduling and completing all of the worker jobs can take time, and reading those results back on the driver can become a bottleneck. Below is one possible solution to this problem:

Expose log for job completions in a job group

The Query Driver should attempt to read worker job results while the stage is running. To do this, it needs the Batch API to provide an append-only log of completed jobs in a job group, which the Query Driver can consume instead of issuing O(jobs) job status requests during each stage. This may already be possible with the current database schema; at worst, it can be achieved by adding an indexed column on jobs that contains the position in which each job completed within its job group.
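To make the "indexed column" idea concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the Batch database. The table layout, the `completion_index` column, the index name, and the helpers `make_db`, `mark_complete`, and `read_log` are all hypothetical and do not reflect the actual Batch schema or API.

```python
import sqlite3


def make_db(n_jobs, job_group_id=1):
    """Create a toy jobs table (hypothetical schema) holding n_jobs running jobs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE jobs (
            job_group_id INTEGER NOT NULL,
            job_id INTEGER NOT NULL,
            state TEXT NOT NULL DEFAULT 'Running',
            completion_index INTEGER,  -- hypothetical: NULL until the job completes
            PRIMARY KEY (job_group_id, job_id)
        )
        """
    )
    # The index lets "completions after position N" be answered by a range scan.
    conn.execute(
        "CREATE INDEX jobs_completion_order ON jobs (job_group_id, completion_index)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO jobs (job_group_id, job_id) VALUES (?, ?)",
            [(job_group_id, j) for j in range(1, n_jobs + 1)],
        )
    return conn


def mark_complete(conn, job_group_id, job_id):
    """Record the position in which this job completed within its job group.

    A real implementation would have to make this assignment safe under
    concurrent completions; this sketch runs single-threaded.
    """
    with conn:
        (next_index,) = conn.execute(
            "SELECT COALESCE(MAX(completion_index), 0) + 1 "
            "FROM jobs WHERE job_group_id = ?",
            (job_group_id,),
        ).fetchone()
        # Only jobs that have not completed yet receive a position, so
        # existing log entries are never rewritten (append-only).
        conn.execute(
            "UPDATE jobs SET state = 'Complete', completion_index = ? "
            "WHERE job_group_id = ? AND job_id = ? AND completion_index IS NULL",
            (next_index, job_group_id, job_id),
        )


def read_log(conn, job_group_id, after=0, limit=100):
    """Return up to `limit` (completion_index, job_id) pairs after a cursor."""
    return conn.execute(
        "SELECT completion_index, job_id FROM jobs "
        "WHERE job_group_id = ? AND completion_index > ? "
        "ORDER BY completion_index LIMIT ?",
        (job_group_id, after, limit),
    ).fetchall()
```

A consumer that remembers the last `completion_index` it saw can resume from that cursor, so each request returns only new completions rather than the status of every job.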

Completion of this feature would require:

Carefully evaluating the Batch data model to determine whether any database changes are necessary to construct an append-only log of job completions in a job group from the state of the database.
If changes are needed, designing and implementing a Batch front-end API endpoint to query the log.
(Separately) Adding support for streaming the log in the Scala BatchClient and using it to read partition results before the job group completes.
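The client side of the last item could look roughly like the sketch below. It is written in Python for brevity rather than in the Scala BatchClient, and it is not an existing API: `fetch_page` stands in for a call to the proposed log endpoint, `read_result` stands in for reading one worker job's output, and the mapping of job IDs 1..n to partitions 0..n-1 is an assumption. Failure and cancellation handling are omitted.

```python
import time
from typing import Callable, Iterator, List, Tuple

LogEntry = Tuple[int, int]  # (completion_index, job_id); hypothetical wire format


def stream_completions(
    fetch_page: Callable[[int], List[LogEntry]],
    n_jobs: int,
    poll_interval: float = 1.0,
) -> Iterator[int]:
    """Yield job IDs in completion order until n_jobs completions are seen."""
    cursor = 0  # last completion_index consumed so far
    seen = 0
    while seen < n_jobs:
        page = fetch_page(cursor)
        if not page:
            # Nothing new yet: back off before asking again.
            time.sleep(poll_interval)
            continue
        for completion_index, job_id in page:
            cursor = completion_index
            seen += 1
            yield job_id


def collect_partitions(fetch_page, read_result, n_jobs, poll_interval=1.0):
    """Read each partition's result as soon as its job shows up in the log."""
    results = [None] * n_jobs
    for job_id in stream_completions(fetch_page, n_jobs, poll_interval):
        # Assumption: job IDs 1..n_jobs correspond to partitions 0..n_jobs-1.
        results[job_id - 1] = read_result(job_id)
    return results
```

With this shape the driver issues one request per page of completions instead of O(jobs) status requests, and result reads overlap with jobs that are still running.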

Version

0.2.132

Relevant log output

No response


daniel-goldstein added the enhancement, needs-triage, query, and batch labels and removed the needs-triage label on Jul 9, 2024
