feat: optimize segment proof aggregation #410

Closed

Conversation

atanmarko
Member

@atanmarko atanmarko commented Jul 19, 2024

Resolves #399
Resolves #403

@atanmarko atanmarko self-assigned this Jul 19, 2024
@github-actions github-actions bot added the crate: zero_bin Anything related to the zero-bin subcrates. label Jul 19, 2024
@atanmarko atanmarko force-pushed the feat/optimize-segment-proof-aggregation branch 3 times, most recently from 98c4750 to f01714a on July 19, 2024 14:23
@atanmarko atanmarko added the performance Performance improvement related changes label Jul 19, 2024
@atanmarko atanmarko added this to the Performance Tuning milestone Jul 19, 2024
@atanmarko atanmarko force-pushed the feat/optimize-segment-proof-aggregation branch from f497f6e to daf0e59 on July 19, 2024 14:41
@atanmarko
Member Author

This PR limits the number of segment proofs that are proven in parallel, to bound memory consumption at the cost of a small penalty in async engine utilization. The chunk size can be tweaked if needed.
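
For illustration, here is a minimal sketch of the chunking idea, assuming hypothetical names (prove_segment, prove_in_chunks, CHUNK_SIZE) and plain integers standing in for segment data and proofs; it is not the actual zero-bin code:

    use futures::{stream, StreamExt};

    // Hypothetical chunk size; in the PR this is a tunable parameter.
    const CHUNK_SIZE: usize = 4;

    // Stand-in for the real SegmentProof operation.
    async fn prove_segment(segment: u64) -> u64 {
        segment
    }

    // Prove segments in fixed-size chunks: at most CHUNK_SIZE proving futures
    // are in flight at once, which bounds peak memory.
    async fn prove_in_chunks(segments: Vec<u64>) -> Vec<u64> {
        let mut proofs = Vec::with_capacity(segments.len());
        for chunk in segments.chunks(CHUNK_SIZE) {
            // The next chunk only starts once this one is fully proven.
            let chunk_proofs: Vec<u64> = stream::iter(chunk.iter().copied())
                .map(prove_segment)
                .buffered(CHUNK_SIZE)
                .collect()
                .await;
            proofs.extend(chunk_proofs);
        }
        proofs
    }

    fn main() {
        let proofs = futures::executor::block_on(prove_in_chunks((0..10).collect()));
        println!("{proofs:?}");
    }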

@atanmarko atanmarko force-pushed the feat/optimize-segment-proof-aggregation branch from c34a6da to 1a44121 on July 19, 2024 15:32
@atanmarko atanmarko marked this pull request as ready for review July 19, 2024 16:00
Collaborator

@Nashtare Nashtare left a comment

I'm afraid this doesn't quite solve the problem.
The current "issue" causing the large memory overhead with continuations is that we want to process a large batch of txns at once, to reduce redundant work in between (re-calculations of intermediate trie roots).
Note that I admit the unchanged names here are confusing: when we iterate over txs, we don't pass individual txns as inputs to the SegmentDataIterator but a Vec of txns, of length possibly greater than 1 given the above.

The most CPU-cycle-efficient approach to reducing the mentioned redundant computations is to have a single payload representing all txns of a block (by payload here, I mean the GenerationInputs fed to the prover).

As such, the below loop can be simplified at a high level as:

            let tx_proof_futs: FuturesUnordered<_> = {
                let data_iterator = SegmentDataIterator {
                    partial_next_data: None,
                    inputs: single_payload, // <-- all txns are contained here!
                    max_cpu_len_log: Some(max_cpu_len_log),
                };

                Directive::map(IndexedStream::from(data_iterator), &seg_ops)
                    .fold(&agg_ops)
                    .run(runtime)
                    .map(move |e| {
                        e.map(|p| (idx, proof_gen::proof_types::TxnAggregatableProof::from(p)))
                    })
            };

The problematic iterator here is data_iterator, which runs unboundedly, generating possibly hundreds of segments that stay in memory until they are consumed (because generating them is orders of magnitude faster than proving them).
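
For context, a minimal sketch of the missing backpressure, using a toy bounded channel rather than the zero-bin/paladin API (BOUND and the integer "segments" are placeholders): the generator blocks once a fixed number of segments are waiting, instead of running arbitrarily far ahead of the much slower provers.

    use std::sync::mpsc::sync_channel;
    use std::thread;

    // Hypothetical bound on the number of generated-but-unproven segments.
    const BOUND: usize = 4;

    fn main() {
        let (tx, rx) = sync_channel::<u64>(BOUND);

        let generator = thread::spawn(move || {
            for segment in 0..100u64 {
                // `send` blocks while BOUND segments are already queued, so the
                // generator can never run unboundedly ahead of the consumer.
                tx.send(segment).unwrap();
            }
        });

        for segment in rx {
            // Stand-in for the (much slower) SegmentProof operation.
            let _proof = segment;
        }
        generator.join().unwrap();
    }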

@atanmarko
Member Author

atanmarko commented Jul 21, 2024

At the moment, generate_next_segment is executed on GENERATION_INPUTS_AGG_CHUNK_SIZE_INNER GenerationInputs. Four such jobs (so 16 GenerationInputs segments with SegmentProof proving) would run in parallel and then be aggregated into a chunk transaction proof.
Eventually, all the resulting chunk transaction proofs would be aggregated into the final transaction proof. The chunk sizes are there to limit memory consumption (we keep in memory the input GenerationSegmentData for GENERATION_INPUTS_AGG_CHUNK_SIZE GenerationInputs proving jobs). If that is not what we want, what is the alternative?

The original implementation kept all the GenerationSegmentData and executed all the SegmentProof jobs first, before switching to the proof aggregation (e.g. if a block has 200 GenerationInputs, then one data_iterator and SegmentProof for all segments in the block run in parallel), because paladin does not switch to the next operation in a chained-operation setup before it finishes the previous one.
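
For illustration, a minimal sketch of the two-level aggregation described above, with integers standing in for proofs and a made-up chunk-size value; prove_segment and aggregate are hypothetical stand-ins, not the real operations:

    // Hypothetical chunk size, chosen for illustration only.
    const GENERATION_INPUTS_AGG_CHUNK_SIZE_INNER: usize = 4;

    // Stand-in for proving one segment of a GenerationInputs batch.
    fn prove_segment(input: u64) -> u64 {
        input
    }

    // Stand-in for the proof aggregation operation.
    fn aggregate(a: u64, b: u64) -> u64 {
        a ^ b
    }

    fn main() {
        let generation_inputs: Vec<u64> = (0..16).collect();

        // Level 1: prove each chunk and aggregate it into a chunk proof, so only
        // one chunk's worth of segment data has to be held in memory at a time.
        let chunk_proofs: Vec<u64> = generation_inputs
            .chunks(GENERATION_INPUTS_AGG_CHUNK_SIZE_INNER)
            .map(|chunk| {
                chunk
                    .iter()
                    .map(|&input| prove_segment(input))
                    .reduce(aggregate)
                    .expect("chunk is non-empty")
            })
            .collect();

        // Level 2: aggregate the chunk proofs into the final transaction proof.
        let final_proof = chunk_proofs
            .into_iter()
            .reduce(aggregate)
            .expect("at least one chunk");
        println!("final proof: {final_proof}");
    }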

@Nashtare
Collaborator

if a block has 200 GenerationInputs, then one data_iterator and SegmentProof for all segments in the block run in parallel

Put differently: say we now have a single GenerationInputs representing 200 txns. The issue is the internal SegmentDataIterator. There is only one of them, but it is responsible for generating all segments, which cannot be processed as fast as they are being generated.

We still have the outer loop iterating over txns because we want the flexibility here. This is configured by the -b batch size argument when calling the proving mode. But theoretically, the best case is when the batch covers a whole block at once, effectively removing the outer loop.

@atanmarko
Member Author

Put differently: say we now have a single GenerationInputs representing 200 txns. The issue is the internal SegmentDataIterator. There is only one of them, but it is responsible for generating all segments, which cannot be processed as fast as they are being generated.

We still have the outer loop iterating over txns because we want the flexibility here. This is configured by the -b batch size argument when calling the proving mode. But theoretically, the best case is when the batch covers a whole block at once, effectively removing the outer loop.

Ok, I had a wrong understanding of the problem. I'll look into how to limit data_iterator in a configurable way.

@atanmarko atanmarko marked this pull request as draft July 22, 2024 08:40
@atanmarko atanmarko changed the title feat: optimize segment proof aggregation WIP: feat: optimize segment proof aggregation Jul 22, 2024
@github-actions github-actions bot added the crate: evm_arithmetization Anything related to the evm_arithmetization crate. label Jul 23, 2024
@atanmarko atanmarko changed the title WIP: feat: optimize segment proof aggregation feat: optimize segment proof aggregation Jul 23, 2024
@atanmarko atanmarko marked this pull request as ready for review July 23, 2024 14:53
atanmarko and others added 2 commits July 23, 2024 17:22
* feat: reorganize cli params

* fix: conflicts

* fix: review

* fix: review 2
@atanmarko atanmarko marked this pull request as draft July 23, 2024 15:56
@atanmarko
Member Author

Returned it to draft: CI tests seem pretty slow (maybe there is some other reason), and I need to test this PR more with the real prover tomorrow.

@atanmarko atanmarko force-pushed the feat/optimize-segment-proof-aggregation branch from 066d761 to 134cbe9 on July 24, 2024 10:58
@atanmarko atanmarko force-pushed the feat/optimize-segment-proof-aggregation branch from 542d5f3 to 8060288 on July 24, 2024 13:10
@github-actions github-actions bot added the crate: proof_gen Anything related to the proof_gen crate. label Jul 25, 2024
@atanmarko atanmarko force-pushed the feat/optimize-segment-proof-aggregation branch from eaaf614 to 5b92df1 on July 25, 2024 17:12
@atanmarko atanmarko marked this pull request as ready for review July 26, 2024 01:24
@atanmarko atanmarko force-pushed the feat/optimize-segment-proof-aggregation branch from db4367f to e45be0b on July 26, 2024 01:32
@atanmarko atanmarko requested a review from Nashtare July 26, 2024 09:18
@Nashtare Nashtare force-pushed the feat/optimize-segment-proof-aggregation branch from 56911c7 to 8e59e1c on August 10, 2024 16:24
@atanmarko
Member Author

The PR is tested. Currently we have fixed parameters for cpu len and a batch size of 10, where premature segment chunk aggregation does not bring a big improvement in memory usage but takes a significant CPU performance hit. We are giving up on this approach.
An alternative that would be better, but would probably take much more effort (if it is possible at all), would be to change Paladin so that it can run chained operations (map/fold) without waiting for the first operation to finish on all the workers.
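
For reference, a minimal sketch of the pipelining idea mentioned above, using plain threads and channels rather than Paladin's actual map/fold API: the fold stage consumes mapped items as soon as they arrive, instead of waiting for every map job to finish.

    use std::sync::mpsc::channel;
    use std::thread;

    fn main() {
        let (tx, rx) = channel::<u64>();

        // Map stage: stand-in for the SegmentProof workers.
        let map_stage = thread::spawn(move || {
            for segment in 0..8u64 {
                tx.send(segment * 2).unwrap();
            }
        });

        // Fold stage: runs concurrently with the map stage, folding each item
        // as soon as it arrives instead of waiting for the whole map to finish.
        let folded = rx.iter().fold(0u64, |acc, item| acc + item);

        map_stage.join().unwrap();
        println!("folded result: {folded}");
    }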

@atanmarko atanmarko closed this Aug 12, 2024