This repository has been archived by the owner on May 3, 2024. It is now read-only.

Circuit implementation for compression #42

Open
Brechtpd opened this issue Nov 21, 2022 · 14 comments · May be fixed by #55

Comments

@Brechtpd

Brechtpd commented Nov 21, 2022

This is to reduce the amount of data we have to push on-chain for blocks. Sticking to the same compression scheme as optimism would be nice, and DEFLATE should be straightforward to implement as a circuit.
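For reference, a raw DEFLATE round-trip can be sketched with Python's `zlib`; the header/window settings here are illustrative assumptions, not the actual on-chain format:

```python
import zlib

def deflate(data: bytes) -> bytes:
    # wbits=-15 requests a raw DEFLATE stream (no zlib header or checksum)
    co = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
    return co.compress(data) + co.flush()

def inflate(blob: bytes) -> bytes:
    # second argument -15 matches the raw DEFLATE framing above
    return zlib.decompress(blob, -15)

txlist = b"example txlist bytes " * 32
assert inflate(deflate(txlist)) == txlist
```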

First step to implement this would be to just do huffman decoding using a static huffman tree, decoding an arbitrary length data stream. The huffman tree is stored in an advice column so other rows can do a lookup to decode the data using the tree.

@xiaodino

Is the DEFLATE compression circuit a new, standalone circuit like evm_circuit, tx_circuit, and keccak_circuit?

@johntaiko

Can we prove that the encoded txlist produces the same result (data, Huffman table) as the public input?
Because DEFLATE decoding has dynamic symbol lengths determined by the Huffman table, we need to check the data bit by bit.
So maybe we should assign the data (both original and encoded) one bit per row?

@Brechtpd
Author

Brechtpd commented Dec 14, 2022

Can we prove that the encoded txlist produces the same result (data, Huffman table) as the public input?

Not sure I understand what you mean by this? Do you mean decoding the data in the smart contract?

Because DEFLATE decoding has dynamic symbol lengths determined by the Huffman table, we need to check the data bit by bit.

A well-known trick for decoding Huffman codes is to use a fixed-width input lookup table. For example, say there are codewords 01 -> value_a and 0001 -> value_b. To decode these you would create a lookup table like this:

| code | length | value   |
|------|--------|---------|
| 0100 | 2      | value_a |
| 0101 | 2      | value_a |
| 0110 | 2      | value_a |
| 0111 | 2      | value_a |
| 0001 | 4      | value_b |
Then while decoding the data in the circuit, you always use 4-bit inputs to find the decoded value. However, when value_a is decoded you only jump ahead 2 bits in the input data stream (vs. 4 bits when decoding value_b). That's where the length field in the table comes in: for each decoded value you also know how many bits to advance.
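That lookup-table trick can be sketched in plain Python, using the two hypothetical codewords from the example above:

```python
# Hypothetical two-codeword table: "01" -> value_a, "0001" -> value_b.
CODES = {"01": "value_a", "0001": "value_b"}
MAX_LEN = max(len(c) for c in CODES)  # fixed lookup width (4 bits here)

def build_lookup(codes, max_len):
    """Expand each codeword to every max_len-bit string it prefixes."""
    table = {}
    for code, value in codes.items():
        pad = max_len - len(code)
        for i in range(2 ** pad):
            suffix = format(i, f"0{pad}b") if pad else ""
            table[code + suffix] = (len(code), value)
    return table

def decode(bits, codes=CODES, max_len=MAX_LEN):
    table = build_lookup(codes, max_len)
    pos, out = 0, []
    while pos < len(bits):
        # always read a fixed-width window, zero-padding the final one
        window = bits[pos:pos + max_len].ljust(max_len, "0")
        length, value = table[window]
        out.append(value)
        pos += length  # jump ahead only by the real codeword length
    return out

# "01" + "0001" + "01" decodes to value_a, value_b, value_a
assert decode("01000101") == ["value_a", "value_b", "value_a"]
```

In the circuit the same idea becomes a lookup argument against a table with exactly these (code, length, value) rows.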

So maybe we should assign the data (both original and encoded) one bit per row?

I think just decoding 1 codeword/row makes sense, no matter how many bits are decoded.

(Each row will accumulate the encoded bits and decoded bits in two accumulators so we can check that the input data stream processed by the circuit is indeed the correct one, and the output data stream can be exported to other circuits. But you don't have to worry about this for now.)

@johntaiko

If we use a fixed codeword table, the compression ratio may be poor.

@Brechtpd
Author

We should be able to load any Huffman table (even one provided in the input per data stream) into a lookup table like this, but of course that's a bit more complex than just using a fixed Huffman table for now. Let's go step by step without making things too complicated, so we get the core mechanisms implemented first; we may also want to add dictionary support... :)

@johntaiko

johntaiko commented Dec 14, 2022

We should be able to load any Huffman table (even one provided in the input per data stream) into a lookup table like this, but of course that's a bit more complex than just using a fixed Huffman table for now. Let's go step by step without making things too complicated, so we get the core mechanisms implemented first; we may also want to add dictionary support... :)

I got it

@johntaiko

johntaiko commented Jan 10, 2023

Hi @Brechtpd, I discussed this with @smtmfft, and we came up with a new approach for using the compression circuit.
The previous approach:

  • We get compressed_raw_public_inputs from the witness (which comes from taiko-node).
  • The compress-circuit decompresses compressed_raw_public_inputs into raw_public_inputs as the circuit output, constraining rlc(compressed_raw_public_inputs) == rlc(public_inputs:layer1) at the same time.
  • The pi-circuit uses the decompressed raw_public_inputs as its witness.

The new approach:
(attached diagram: "Untitled drawing")

  • We get raw_public_inputs from the witness.
  • The compress-circuit compresses raw_public_inputs into compressed_raw_public_inputs, and constrains rlc(compressed_raw_public_inputs) == rlc(public_inputs:layer1).
  • The pi-circuit doesn't need to change at all.

So the main difference between the two approaches is that the new approach compresses the pi in-circuit and removes one compression step in taiko-node.
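For reference, the rlc(...) used in these constraints is just a Horner-style accumulator over the field; the modulus and challenge below are illustrative placeholders, not the actual circuit values:

```python
MODULUS = (1 << 255) - 19  # placeholder prime, not the real circuit field

def rlc(data: bytes, r: int, p: int = MODULUS) -> int:
    """Random linear combination of a byte stream with challenge r."""
    acc = 0
    for byte in data:
        acc = (acc * r + byte) % p  # fold each byte in with challenge r
    return acc
```

Layer 1 and the circuit each compute this accumulator over the same compressed stream with the same challenge r, so equal accumulators imply (with overwhelming probability) equal streams.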

cc @xiaodino

@Brechtpd
Author

Not completely sure I understand what the difference is. You mean the witness data sent to the circuit from the node software is the uncompressed data instead of the compressed data? If so, yeah, that makes sense. Though the node will still have to support compression (for block submission) and decompression (for reading the transactions from the chain). But for the data sent to the prover, compressed and uncompressed seem equivalent, so whichever is easier to support (which I guess is the uncompressed data, because it doesn't need any changes in the node's witness generation).

As for the circuit, it doesn't seem to matter whether you see it as compressing or decompressing the data, because it's the exact same circuit for both; you probably just stress the compress/decompress distinction because of how you look at it?

In the diagram, there's an arrow from zk-evm -> layer1 with (witness:raw_public_inputs). raw_public_inputs is not the compressed data, but we're sending that to layer 1? It should be compressed_public_inputs, right?

Also, the pi-circuit doesn't need any changes no matter how you look at it, no? There are just some extra constraints in the circuit that make sure rlc(raw_public_inputs) == rlc(public_inputs:layer1), so some extra (de)compression constraints are added, but that's it? Depending on what the circuit gets from the node, it just compresses or decompresses that data, so both can be put into the circuit at witness generation time, while the actual circuit constraints remain unchanged.

@johntaiko

Not completely sure I understand what the difference is. You mean the witness data sent to the circuit from the node software is the uncompressed data instead of the compressed data?

Yes.

(which I guess is the uncompressed data because it doesn't need any changes in the node witness generation).

Yeah, I think this is the main benefit when using uncompressed data.

As for the circuit, it doesn't seem to matter whether you see it as compressing or decompressing the data, because it's the exact same circuit for both; you probably just stress the compress/decompress distinction because of how you look at it?

Totally agree.

In the diagram, there's an arrow from zk-evm -> layer1 with (witness:raw_public_inputs). raw_public_inputs is not the compressed data, but we're sending that to layer 1? It should be compressed_public_inputs, right?

Oh sorry, the diagram has some mistakes. The zkevm gets its witness from the layer2 node; the layer2 node proposes compressed_public_inputs to layer1 for verification.

(updated diagram attached: "Untitled drawing (1)")

Also the pi-circuit doesn't need any changes no matter how you look at it no?

I think so.

There are just some extra constraints in the circuit that make sure rlc(raw_public_inputs) == rlc(public_inputs:layer1)

Do you mean we should constrain rlc(compressed_public_inputs) == rlc(public_inputs:layer1)?
Because the pi data from layer1 is compressed.

@Brechtpd
Author

Do you mean we should constrain rlc(compressed_public_inputs) == rlc(public_inputs:layer1)?
Because the pi data from layer1 is compressed.

Whoops, sorry, my mistake. I meant to say something like rlc(raw_public_inputs) == rlc(decompress(public_inputs:layer1)). Just that the decompressed bitstream matches the current raw_public_inputs in the public input, nothing else interesting.
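Outside the circuit, that check corresponds to something like the following sketch, using zlib's DEFLATE as a stand-in for the real codec; the field modulus and challenge are placeholders:

```python
import zlib

P = (1 << 255) - 19  # placeholder prime, not the real circuit field

def rlc(data: bytes, r: int, p: int = P) -> int:
    acc = 0
    for byte in data:
        acc = (acc * r + byte) % p
    return acc

def check(raw_public_inputs: bytes, layer1_public_inputs: bytes, r: int) -> bool:
    # rlc(raw_public_inputs) == rlc(decompress(public_inputs:layer1))
    decompressed = zlib.decompress(layer1_public_inputs)
    return rlc(raw_public_inputs, r) == rlc(decompressed, r)

raw = b"raw public inputs " * 8
assert check(raw, zlib.compress(raw), r=123456789)
```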

@johntaiko

Whoops, sorry, my mistake. I meant to say something like rlc(raw_public_inputs) == rlc(decompress(public_inputs:layer1)). Just that the decompressed bitstream matches the current raw_public_inputs in the public input, nothing else interesting.

I don't know if I'm missing some context, but I think constraining rlc(raw_public_inputs) == rlc(decompress(public_inputs:layer1)) is not a good approach, because the pi from layer1 is rlc(compressed_public_inputs:layer1).

So if we constrain the equivalence between the two RLCs of decompressed data, then the RLC of the compressed data from layer1 is redundant.

@Brechtpd
Author

My way of writing that was, I guess, still unfortunate :). I'm just trying to say that the circuit constrains the data in the raw_public_inputs column to match the decompressed data from the compressed_public_inputs column. But no matter how that is done, we still need to make sure that the input data (which will be the compressed data) matches the data the smart contract expects as the public input.

@johntaiko

johntaiko commented Jan 12, 2023

I don't have permission to associate PR #55, so I'm just replying here. (@Brechtpd, could you add me to the zkevm group?)

There are still some unresolved TODOs marked in the PR, and I think they will be done soon.

But I have no idea how to split the compressed pi into a stream of 32-byte words for the RLC encoding in the circuit; I need help. @Brechtpd
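One straightforward way to split the compressed bytes into 32-byte words is sketched below; zero-padding the final word is an assumed convention here, not something settled in this thread:

```python
def to_words(data: bytes, word_size: int = 32) -> list[bytes]:
    """Split a byte stream into fixed-size words, zero-padding the last one."""
    return [data[i:i + word_size].ljust(word_size, b"\x00")
            for i in range(0, len(data), word_size)]

words = to_words(b"\x11" * 40)
assert len(words) == 2 and all(len(w) == 32 for w in words)
```

Whatever padding rule is chosen, the RLC (or the length) has to distinguish padding from real trailing zero bytes, so the byte length likely needs to be committed to as well.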

@johntaiko johntaiko linked a pull request Jan 12, 2023 that will close this issue
@Brechtpd Brechtpd linked a pull request Jan 12, 2023 that will close this issue