-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arithmetizing Keccak-p permutation #1559
Comments
Awesome write-up! Thank you! One context in which we may want to use this is to verify that a bunch of Keccak hashes were done correctly. This would involve having a dedicated proof for these Keccak hashes which can then be verified in the VM. For example, such a proof may involve computing 32 keccak hashes on 32 different inputs. I think the questions here are:
|
I think that would be the general idea. It would entail that we would have Keccak and RPX both and in parallel hashing the same stream of data and hence be able to use the RPX digest as public input. This means that in the AIR for Keccak we will also have to add RPX alongside it, from a cost point of view this is negligible relative to the pure cost of arithmetizing Keccak.
I did some rough cycle counting based on our current (partial) implementation of the recursive verifier for Miden VM. Since we will have a smaller number of constraints than in the VM, I am assuming that the cost to generate the randomness in the case of Keccak will be the same as that of our recursive verifier (but even if this is not true, this will still be a fraction of the cost of leaf-unhashing). Thus the main component that will be different between our recursive verifier for the VM and a recursive verifier for Keccak is leaf unhashing and computing the random linear combinations of the DEEP quotients. I am counting roughly Note that constraint evaluation in the case of the Keccak AIR will be significantly lower than constraint evaluation for the Miden VM case, and hence the final cost for the whole recursive verifier for the Keccak AIR should be well bellow |
How does this roughly break down between unhashing and liner combinations? Seems like linear combinations would dominate here, right? |
Exactly, the cost for unhashing is around |
One more radical (and longer-term) approach to bring the cycle-count down, is it extend the hasher chiplet to enable hashing and computing random linear combination at the same time. This would require the hasher chiplet to read memory while it performs hashing and is probably tricky to do in general, but if we can come up with a good way to do this, then from the stack standpoint, leaf unhashing and computing random linear combinations may be done with just a few instructions regardless of the leaf size. |
I agree, I think given some prior discussions related to this topic, the right "primitive" would be unhashing while at the same time computing an inner product against a vector in memory. This would have potentially other uses besides computing the DEEP queries, e.g., evaluating large polynomials at a random point. |
In regards to recursive proof verification, I think there are 2 general things we should investigate for how to offload computation to chiplets:
It maybe that we can replace our current specialized instructions for linear combinations and FRI folding with something even higher-level that may allow us to reduce recursive proof verification cycle count even beyond what we have now. |
Makes sense! I have a feeling that we can get a long way with a couple of instructions that compute inner products, both in base field and extension field, to optimize both points in your comment. |
We want to arithmetize (using an AIR) the Keccak-p permutation. Keccak-p is parametrized using two parameters$b$ and $n_r$ where $n_r$ is the number of rounds and $b$ is the size in bits of the input/output size of the permutation. For us, the choice $(b, n_r) = (1600, 24)$ is the most relevant as this is the choice used in SHA3. The Keccak-p permutation operates on a state $\mathsf{s}$ comprised of $1600$ bits and interpreted as a cube of of size $5 \times 5 \times 64$ and indexed with variables $(x, y, z)$ where the inner-most $64$ -bit vector has index $(x, y) = (0, 0)$ . If we denote by $\mathsf{a}(x, y, z)$ the value of the $(x, y, z)$ -bit of the cube then we can use the following correspondence $$\mathsf{s}[w\cdot(5y + x) + z] = \mathsf{a}[x, y, z]$$ $(x,y)$ , we call the vector of $64$ -bits indexed by $z$ a lane and we denote it by $\mathsf{a}[x, y]$ . We overload bit operations to also denote their vectorized versions operating on lanes.
For a fixed
We can now describe the Keccak-p permutation as the repeated application, for$n_r = 24$ rounds, of the following composition of functions $$\iota_r \circ \chi \circ \pi \circ \rho \circ \theta$$ $\iota_r$ depends on the round index.
where only
Description
The$\theta$ step function
where index arithmetics is done modulo the size of the corresponding index-space and addition of bits is done in the finite binary field of size$2$ i.e., xor.
Note that the above can be written more succinctly using vectorized notation as follows:
Basically, we have two operations on$64$ -bit vectors, i.e. lanes, $\mathsf{XOR}$ and $\mathsf{ROT}$ .
The$\rho$ step function:
The$\pi$ step function:
The$\chi$ step function:
The$\iota_r$ step function:
where the round constant$\mathsf{RC}[r]$ for round $r$ is a 64-bit vector.
Arithmetization strategies
Using Look-Up Tables
Since all of the arithmetic operations happen at the level of lanes, and since our field is a 64-bit prime field, we have to either work with the individual bits of each lane or work with limbs of size less than$64$ and use lookup tables (LUT). From a practical point of view, limbs of size $8$ are probably the best choice. We can probably use limbs of size $10$ but then things become more complicated and will require additional LUTs which might end-up hurting performance.$8$ and hence we can represent each lane using $8$ limbs and equivalently $8$ field elements.
Let's discuss the approach using LUTs. Suppose we choose limb size
In this setting:
A similar thing is done for$\mathsf{H}$ and the result of the rotation is computed by patching the split bits of $\mathsf{L}$ and $\mathsf{H}$ .
In total, we will need 1 LUT for$\mathsf{AND}$ , 1 LUT for $\mathsf{XOR}$ , 7 LUT for all possible byte splittings as used in $\mathsf{ROT}$ , and 1 LUT to range check bytes.
We now give an account of the number of cells needed in order to arithmetize one round of the Keccak-p permutation:
The total number of columns is thus$1508$ of which $900$ will be subject to lookups. This means, assuming that the max degree of the AIR is $3$ , that we will need $450$ helper columns for LogUp in the auxiliary trace. Hence, assuming a degree $2$ extension, we will need a total of $2408$ columns in the base field without accounting for the multiplicity columns for LogUp.
Working with individual bits
There is another approach described by Angus Gruen which doesn't make use of any LUT and instead works with individual bits at some of the intermediate steps of the permutation function.$5 \times 5\times 64$ and denoted by $A[x, y, z]$ , or as a cube of $u32$ -s of dimension $5 \times 5\times 2$ and denoted by $\textbf{A}[x, y, j]$ . Using this dual view, we can now first describe one round of the permutation and then the constraints to verify it.
More precisely, the state is either considered as a cube of bits of dimensions
The input state is stored as$\textbf{A}[x, y, j]$ for $j = 0, 1$
Checking$\theta$ step function:
Denote by$A^{'}[x, y, z]$ the output of the $\theta$ step function as an element in $5 \times 5\times 64$ . Then, using Lemma 2.1 in the aforementioned work, verifying this step is equivalent to the following:
Checking the$\rho$ step is trivial, and can be absorbed in the the check for the $\chi$ step, as we now have the $A^{'}[x, y, z]$ bit view representation.
Checking the$\pi$ step is also trivial and can be absorbed in the check for the $\chi$ step.
Checking the$\chi$ step function:
Checking the$\iota_r$ step:
We can now give an account of the number of cells needed to verify one round of the Keccak-p permutation function:
The total is 2406 cells (all base field) to verify the correctness of one of the 24 rounds comprising a call to the Keccak-p permutation.
Some thoughts
It seems that the second approach is better performance wise if one is thinking of using only Keccak-p and its derivatives. If one the other hand, other (classic) hash functions are to be supported, e.g., Blake, then the LUT-based approach might make sense as the cost associated with LUT-s could be amortized across arithmetizations.
The text was updated successfully, but these errors were encountered: