Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do multiplicity witgen in separate stage #2319

Merged
merged 6 commits into from
Jan 9, 2025
Merged

Conversation

georgwiese
Copy link
Collaborator

@georgwiese georgwiese commented Jan 8, 2025

While working on #2306, @Schaeff came across several bugs in multiplicity witness generation. These were undetected, because we ignored multiplicities in the mock prover, which will be fixed by #2310. With this PR, #2310 will be green.

The issue was that counting multiplicities inside Machine::process_plookup() fails if the caller actually discards the result. This happens in a few places, for example during our loop optimization in the "dynamic machine".

With this PR, we instead have a centralized MultiplicityColumnGenerator that counts multiplicities after the fact, by going over each lookup, evaluating the two selected tuples on all rows, and counting how often each element in the LHS appears in the RHS.

To measure the runtime of this, I ran:

export TEST=keccak
export POWDR_JIT_OPT_LEVEL=0

cargo run -r --bin powdr-rs compile riscv/tests/riscv_data/$TEST -o output --max-degree-log 18
cargo run -r --features plonky3,halo2 pil output/$TEST.asm -o output -f --field gl --linker-mode bus

I get the following profile on the server:

 == Witgen profile (2554126 events)
   32.4% (    2.6s): Secondary machine 0: main_binary (BlockMachine)
   23.1% (    1.9s): Main machine (Dynamic)
   12.7% (    1.0s): Secondary machine 4: main_regs (DoubleSortedWitnesses32)
   10.0% ( 809.9ms): FixedLookup
    7.7% ( 621.1ms): Secondary machine 5: main_shift (BlockMachine)
    5.6% ( 454.6ms): Secondary machine 2: main_poseidon_gl (BlockMachine)
    3.8% ( 312.3ms): multiplicity witgen
    3.8% ( 308.2ms): witgen (outer code)
    0.6% (  45.3ms): Secondary machine 1: main_memory (DoubleSortedWitnesses32)
    0.4% (  33.4ms): Secondary machine 6: main_split_gl (BlockMachine)
    0.0% (   8.0µs): Secondary machine 3: main_publics (WriteOnceMemory)
  ---------------------------
    ==> Total: 8.114630092s

So the cost is ~4%. I'm sure it can be optimized further but I would like to leave this to a future PR.

@georgwiese georgwiese changed the base branch from main to check-multiplicities-in-mock-prover January 8, 2025 19:54
@@ -418,7 +546,7 @@ impl<'a, T: FieldElement> FixedData<'a, T> {
.definitions
.iter()
.filter(|(_, (symbol, _))| matches!(symbol.kind, SymbolKind::Poly(_)))
.map(|(name, (symbol, _))| (name.clone(), symbol.into()))
.flat_map(|(_, (symbol, _))| symbol.array_elements())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug IMO, didn't expand array elements.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Symbol should not implement this Into, or at least change to TryInto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but this can be a different PR. I just checked and it is only used several times in Analyzed::remove_definitions.

@georgwiese georgwiese force-pushed the fix-multiplicity-witgen2 branch from a3f43b3 to 4b0e9d1 Compare January 9, 2025 02:48
@georgwiese georgwiese changed the base branch from check-multiplicities-in-mock-prover to main January 9, 2025 02:49
@georgwiese georgwiese changed the title Fix multiplicity witgen Do multiplicity witgen in separate stage Jan 9, 2025
self.fixed
.fixed_cols
.iter()
// TODO: Avoid clone
Copy link
Collaborator Author

@georgwiese georgwiese Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be too hard, OwnedTerminalValues could do everything it does with a reference, but that's a different PR.

@georgwiese georgwiese marked this pull request as ready for review January 9, 2025 03:22
}

pub fn row(&self, row: usize) -> RowValues<F> {
RowValues { values: self, row }
}

pub fn destroy(self) -> BTreeMap<PolyID, Vec<F>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

into_trace? into_inner?

@@ -418,7 +383,8 @@ impl<'a, T: FieldElement> FixedData<'a, T> {
.definitions
.iter()
.filter(|(_, (symbol, _))| matches!(symbol.kind, SymbolKind::Poly(_)))
.map(|(name, (symbol, _))| (name.clone(), symbol.into()))
.flat_map(|(_, (symbol, _))| symbol.array_elements())
// .map(|(name, (symbol, _))| (name.clone(), symbol.into()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover commented out code

let indices = lhs_tuples
.into_par_iter()
.map(|(_, tuple)| index[&tuple])
.collect::<Vec<_>>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to collect here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, because this is a parallel iterator, which is not a proper iterator.

Copy link
Collaborator

@Schaeff Schaeff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments but looks really good and gets rid of a lot of sub optimal code. I think you mentioned multiplicities being computed twice because they are sometimes needed early, why did that change?

@georgwiese georgwiese requested a review from Schaeff January 9, 2025 14:11
@georgwiese
Copy link
Collaborator Author

Thanks for the quick review! I applied your suggestions.

I think you mentioned multiplicities being computed twice because they are sometimes needed early, why did that change?

So I think rn, this doesn't happen, because the code so far only applies to phantom lookups, and we never use the multiplicity for anything else besides the lookup.

I think that will change once we use the bus natively. It would work something like this:

  • For each bus receive:
    • Build the index (same as the lookup RHS now)
    • Find all matching sends (there might be several). For each:
      • Collect all tuples (same as lookup LHS now)
      • Update multiplicities

But we use bus interactions also to express permutation arguments, for example, but in that case the multiplicity is what's now the RHS selector, which we do use otherwise sometimes (e.g. to find out if a block is an actual block or just fulling up rows). So I think that it could be that we end up computing the multiplicity twice.

@Schaeff Schaeff added this pull request to the merge queue Jan 9, 2025
Merged via the queue into main with commit e328eb9 Jan 9, 2025
16 checks passed
@Schaeff Schaeff deleted the fix-multiplicity-witgen2 branch January 9, 2025 15:27
leonardoalt pushed a commit that referenced this pull request Jan 11, 2025
While working on #2306, @Schaeff came across several bugs in
multiplicity witness generation. These were undetected, because we
ignored multiplicities in the mock prover, which will be fixed by #2310.
With this PR, #2310 will be green.

The issue was that counting multiplicities inside
`Machine::process_plookup()` fails if the caller actually discards the
result. This happens in a few places, for example during our loop
optimization in the "dynamic machine".

With this PR, we instead have a centralized
`MultiplicityColumnGenerator` that counts multiplicities after the fact,
by going over each lookup, evaluating the two selected tuples on all
rows, and counting how often each element in the LHS appears in the RHS.

To measure the runtime of this, I ran:
```sh
export TEST=keccak
export POWDR_JIT_OPT_LEVEL=0

cargo run -r --bin powdr-rs compile riscv/tests/riscv_data/$TEST -o output --max-degree-log 18
cargo run -r --features plonky3,halo2 pil output/$TEST.asm -o output -f --field gl --linker-mode bus
```

I get the following profile on the server:
```
 == Witgen profile (2554126 events)
   32.4% (    2.6s): Secondary machine 0: main_binary (BlockMachine)
   23.1% (    1.9s): Main machine (Dynamic)
   12.7% (    1.0s): Secondary machine 4: main_regs (DoubleSortedWitnesses32)
   10.0% ( 809.9ms): FixedLookup
    7.7% ( 621.1ms): Secondary machine 5: main_shift (BlockMachine)
    5.6% ( 454.6ms): Secondary machine 2: main_poseidon_gl (BlockMachine)
    3.8% ( 312.3ms): multiplicity witgen
    3.8% ( 308.2ms): witgen (outer code)
    0.6% (  45.3ms): Secondary machine 1: main_memory (DoubleSortedWitnesses32)
    0.4% (  33.4ms): Secondary machine 6: main_split_gl (BlockMachine)
    0.0% (   8.0µs): Secondary machine 3: main_publics (WriteOnceMemory)
  ---------------------------
    ==> Total: 8.114630092s
```

So the cost is ~4%. I'm sure it can be optimized further but I would
like to leave this to a future PR.
github-merge-queue bot pushed a commit that referenced this pull request Jan 20, 2025
This is not necessary anymore since #2319: We previously had a
multiplicity witgen pass only for range constraints via lookups. Now we
have it for all phantom lookups, which includes range constraints via
lookups.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants