Implement Work Partitioner V2 #17111
Conversation
For now I haven't designed priority here. Should I? We kind of have prioritization already, because we never query the underlying selector if we have pending zkapp commands/single specs on hand. But that's not full prioritization, because there's no ordering among the zkapp commands themselves. EDIT: I'm using a FIFO queue for the zkapp command pools, so in theory completion of early-arriving commands is prioritized, rather than prioritizing by completion percentage. This should be fine as long as the expected completion time for each zkapp command is roughly the same. If we instead prioritized by completion rate, e.g. sorting by the number of jobs left to complete, there would likely be starvation issues, although we would only starve across already-pending zkapp commands rather than all work in the pool, since any pending zkapp command is prioritized over any unpartitioned work. Here's the background thread. And I guess this needs input from @georgeee.
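To make the FIFO idea concrete, here is a minimal sketch, assuming a plain OCaml queue; the `job`, `next_job`, `enqueue_segments`, and `request_from_selector` names are hypothetical, not the actual code in this PR:

```ocaml
(* Hedged sketch: a FIFO pool of pending zkapp-command segments that is
   drained before we fall back to the underlying Work_selector. *)

type job = { command_id : int; segment : int }

type 'selector t =
  { pending : job Queue.t   (* FIFO: earliest-arrived command first *)
  ; selector : 'selector
  }

let create selector = { pending = Queue.create (); selector }

(* Enqueue segments of a newly partitioned zkapp command in arrival order. *)
let enqueue_segments t ~command_id ~num_segments =
  for segment = 0 to num_segments - 1 do
    Queue.push { command_id; segment } t.pending
  done

(* Issue a job: pending zkapp segments first, so early arrivals complete
   before we ask the selector for fresh (unpartitioned) work. *)
let next_job t ~request_from_selector =
  match Queue.take_opt t.pending with
  | Some job -> Some job
  | None -> request_from_selector t.selector
```

The point is that the only ordering decision is FIFO across commands; there is no sorting by remaining job count, which is what would introduce the starvation concern mentioned above.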
Thanks @georgeee for drafting this graph during our session :)
This is part of the Rework Snark Worker project, intended to make the Snark work coordinator issue finer-grained jobs, so that more parallelism is possible.
Although this PR is effectively a no-op, it will affect a later PR where we switch the RPC logic completely. If this is buggy, CI will fail on the later PR that actually does the switch.
Mechanism
Basics
The new layer is called a `Work_partitioner`. It sits functionally between a `Work_selector` and a `Snark_worker`. Its duty is:

Small jobs
A "small job" is smaller than the original job provided by a Work_selector, in 2 possibilities:
Design Issue
We don't want to break other parts of the system, most notably the GraphQL APIs, so we bolt a new layer on top of the `Work_selector`. Ideally, in a future refactor, it should be unified with the `Work_selector`.

This layer does not handle re-issuing a full job from the `Work_selector`'s perspective, because the `Work_selector` already has a mechanism for that. We only redistribute "small jobs".
It's assumed that we don't care about the order in which zkapp-command base snarks / merge snarks are performed.
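As a rough illustration of where the new layer sits, here is a hedged interface sketch; every name and signature below is an illustrative assumption, not the PR's actual API:

```ocaml
(* Hedged sketch of the layering: the partitioner wraps a selector and
   hands out/collects "small jobs" instead of full ones. *)

module type Work_selector_intf = sig
  type t
  type full_job
  val request_work : t -> full_job option
end

module type Work_partitioner_intf = sig
  type t
  type small_job      (* a piece of a full job; see "Small jobs" above *)
  type small_result

  (* Hand out a small job: reuse pieces of an already-partitioned full job
     if any are pending, otherwise pull a fresh full job from the selector
     and partition it. Re-issuing of *full* jobs stays in the selector. *)
  val request_partitioned_work : t -> small_job option

  (* Accept a completed piece; once all pieces of the originating full job
     are in, combine them and report the full result back to the selector. *)
  val submit_result : t -> small_result -> unit
end
```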
RPCs
Use of assertions when merging 2 Single works into a One_or_two
In the function `merge_to_one_result_exn` here, we are using assertions. The assumptions should hold unless there is a bug in the code. Still, this is worth some attention.

Several failwith when unwrapping errors in work partitioner
See here.
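For illustration only, the assertion-based merge described under "Use of assertions…" above could look roughly like the sketch below; the `single_result` and `one_or_two` types are simplified assumptions, not the actual `Single` and `One_or_two` definitions:

```ocaml
(* Hedged sketch: combining the two halves of a `Two pair, using assertions
   for invariants that can only break if there is a bug elsewhere. *)

type 'a one_or_two = [ `One of 'a | `Two of 'a * 'a ]

type single_result =
  { pair_id : int
  ; position : [ `First | `Second ]
  ; proof : string
  }

let merge_to_one_result_exn (a : single_result) (b : single_result) :
    string one_or_two =
  (* Both halves must come from the same original Two-job... *)
  assert (a.pair_id = b.pair_id);
  (* ...and must be distinct halves of it. *)
  assert (a.position <> b.position);
  match (a.position, b.position) with
  | `First, `Second -> `Two (a.proof, b.proof)
  | `Second, `First -> `Two (b.proof, a.proof)
  | _ -> assert false (* unreachable given the assertions above *)
```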