feat: optimize segment proof aggregation #410
Conversation
Force-pushed from 98c4750 to f01714a.
Force-pushed from f497f6e to daf0e59.
This PR limits the number of segment proofs that are proven in parallel, to limit memory consumption at the cost of a small penalty in async engine utilization. The chunk size can be tweaked if needed.
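For illustration only, a minimal sketch of that chunking idea in plain Rust, assuming a hypothetical prove_segment future and a hard-coded CHUNK_SIZE; the actual zero-bin types and proving operations differ:

use futures::future::join_all;

// Hypothetical stand-in for proving one segment; the real prover returns a
// segment proof, not a number.
async fn prove_segment(segment_input: u64) -> Result<u64, String> {
    Ok(segment_input + 1)
}

#[tokio::main]
async fn main() -> Result<(), String> {
    let segment_inputs: Vec<u64> = (0..100).collect();
    // Assumed tweakable chunk size, as described in the PR.
    const CHUNK_SIZE: usize = 10;

    let mut proofs = Vec::with_capacity(segment_inputs.len());
    // Prove at most CHUNK_SIZE segments concurrently: the next chunk only
    // starts once the current one has finished, which bounds peak memory.
    for chunk in segment_inputs.chunks(CHUNK_SIZE) {
        let chunk_proofs = join_all(chunk.iter().map(|&s| prove_segment(s))).await;
        for p in chunk_proofs {
            proofs.push(p?);
        }
    }
    println!("proved {} segments", proofs.len());
    Ok(())
}

Proving chunk by chunk is also where the small async-utilization penalty comes from: the next chunk waits for the slowest proof of the current one.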
Force-pushed from c34a6da to 1a44121.
I'm afraid this doesn't quite solve the problem.
The current "issue" causing large memory overhead with continuations is that we want to process a large batch of txns at once, to reduce redundant work in between (re-calculations of intermediate trie roots).
Note that, I admit, the names here not being changed is confusing. When we iterate over txs, we don't have individual txn values passed as inputs to the SegmentDataIterator, but a Vec of txns, of length possibly greater than 1 given the above.
The approach that saves the most CPU cycles on the mentioned redundant computations is to have a single payload representing all txns of a block (by payload here, I mean the GenerationInputs fed to the prover).
As such, the loop below can be simplified, at a high level, to:
let tx_proof_futs: FuturesUnordered<_> = {
    let data_iterator = SegmentDataIterator {
        partial_next_data: None,
        inputs: single_payload, // <-- all txns are contained here!
        max_cpu_len_log: Some(max_cpu_len_log),
    };

    Directive::map(IndexedStream::from(data_iterator), &seg_ops)
        .fold(&agg_ops)
        .run(runtime)
        .map(move |e| {
            e.map(|p| (idx, proof_gen::proof_types::TxnAggregatableProof::from(p)))
        })
};
As such, the problematic iterator here is data_iterator, which runs unboundedly, generating possibly hundreds of segments that stay in memory until they are consumed (because generating them is orders of magnitude faster than proving them).
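As a hedged sketch of one way such an unbounded producer can be reined in, a bounded tokio channel can act as backpressure between (fast) segment generation and (slow) proving; SegmentData and the sleep standing in for proving are hypothetical, not the actual paladin/zero-bin API:

use std::time::Duration;
use tokio::sync::mpsc;

// Hypothetical segment payload; the real SegmentDataIterator yields prover
// inputs for each segment.
#[derive(Debug)]
struct SegmentData(usize);

#[tokio::main]
async fn main() {
    // At most 8 generated-but-unproven segments can exist at once.
    let (tx, mut rx) = mpsc::channel::<SegmentData>(8);

    let generator = tokio::spawn(async move {
        for i in 0..100 {
            // `send` suspends the generator while the channel is full,
            // so segment generation cannot outrun proving unboundedly.
            if tx.send(SegmentData(i)).await.is_err() {
                break;
            }
        }
    });

    while let Some(segment) = rx.recv().await {
        // Stand-in for the (much slower) segment proof.
        tokio::time::sleep(Duration::from_millis(10)).await;
        let _ = segment;
    }

    generator.await.unwrap();
}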
At the moment, the original implementation was keeping all the …
Put it differently: think that we now have a single payload, while we still have the outer loop iterating over …
Ok, I had a wrong understanding of the problem. I'll look into how to limit … in a configurable way.
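A possible shape for making such a limit configurable, sketched with clap's derive API; the flag names and defaults below are hypothetical and not the actual zero-bin CLI options:

use clap::Parser;

// Hypothetical prover options; only illustrates exposing the limit as a flag.
#[derive(Parser, Debug)]
struct ProverOpts {
    /// Maximum number of segments generated/proven at once.
    #[arg(long, default_value_t = 10)]
    max_parallel_segments: usize,

    /// Log of the maximum CPU length per segment.
    #[arg(long, default_value_t = 20)]
    max_cpu_len_log: usize,
}

fn main() {
    let opts = ProverOpts::parse();
    println!("running with {opts:?}");
}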
* feat: reorganize cli params * fix: conflicts * fix: review * fix: review 2
Returned it to draft; CI tests seem pretty slow (maybe there is some other reason), but I need to test this PR more with the real prover tomorrow.
Force-pushed from 066d761 to 134cbe9.
Force-pushed from 542d5f3 to 8060288.
Force-pushed from eaaf614 to 5b92df1.
Force-pushed from db4367f to e45be0b.
…timize-segment-proof-aggregation
Force-pushed from 56911c7 to 8e59e1c.
The PR is tested. Currently we have fixed parameters for CPU length and a batch size of 10, with which premature segment chunk aggregation does not yield a big improvement in memory usage but takes a significant CPU performance hit. We are giving up on this approach.
Resolves #399
Resolves #403