feat: triton-cli prove/verify, proof binary serialization #305
base: master
Conversation
Thanks for the contribution! And sorry for the slow follow-up. You seem to have caught @jan-ferdinand in the middle of one of his rare holidays, and he's the boss when it comes to this repo. Have you seen triton-tui? It's the debugger we use for Triton VM programs. We are not looking per se for a single tool that does everything (although that is an option), but it would be nice if the two tools had the same invocation syntax and interchangeable file formats, at least as far as programs, inputs, outputs, and nondeterminism go. If memory serves, … Have you considered adding a …
Hey @aszepieniec, thanks for the review. I based the CLI on the feedback in #299. I wanted to see if there were any ideas on how to implement inputs (private and public). Naively I'd probably implement it as two optional flags on the prove command, each accepting a comma-separated list of values. The next step from that would be passing files in some format.

This PR isn't time sensitive though; I've got a local copy of the CLI I'm using for debugging, so there's no rush here.
This is really shaping up! Thanks a lot for your effort with this.
I have left various questions and comments in-line. Additionally, I have the following general comments:
- It seems like a good idea to support all 3 types of private input (collectively known as `NonDeterminism`):
  - individual tokens, which can be used by instruction `divine` – this is already supported
  - initial RAM, which can be used by any instruction accessing memory
  - non-deterministic digests, which can be accessed by instruction `divine_sibling`¹
- The user experience might become better if the error-propagation crate `anyhow` was used. By changing the signature of `main()` to return an `anyhow::Result<()>`, errors can be propagated all the way to the program's entrypoint. This removes the need for many a `.unwrap()` or `.expect()`.

Footnotes

1. `divine_sibling` is soon to be replaced by `merkle_step`, but that will not change the behavior of `NonDeterminism`. ↩
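A minimal sketch of the error-propagation idea, written here with the standard library's `Box<dyn Error>` so it compiles without extra dependencies; `anyhow::Result<()>` drops in the same way with nicer ergonomics. The `parse_version` helper is hypothetical, just to have an error to propagate:

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type; with `anyhow`, a single `bail!` would replace all of this.
#[derive(Debug)]
struct VersionError(u8);

impl fmt::Display for VersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unsupported proof version: {}", self.0)
    }
}
impl Error for VersionError {}

fn parse_version(byte: u8) -> Result<u8, Box<dyn Error>> {
    if byte == 1 {
        Ok(byte)
    } else {
        Err(Box::new(VersionError(byte)))
    }
}

// Returning a Result from main() lets `?` replace `.unwrap()`/`.expect()`:
// any error propagates cleanly to the entrypoint and is printed there.
fn main() -> Result<(), Box<dyn Error>> {
    let version = parse_version(1)?;
    println!("proof version: {version}");
    Ok(())
}
```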
```rust
}
}

fn digest_to_str(d: Digest) -> String {
```
In the next version of `triton-vm`, the method `digest.to_hex()` will be available. This will guarantee that hex-ified digests are consistent everywhere. 😊
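Until then, a manual fallback might look like the sketch below. It assumes a digest consists of five 64-bit limbs (as in triton-vm's `Digest`); the limb ordering is an assumption and may differ from what `to_hex()` will produce:

```rust
// Hypothetical stand-in until digest.to_hex() lands: hex-encode each of the
// digest's five limbs as 16 zero-padded hex characters (80 characters total).
fn digest_to_hex(limbs: [u64; 5]) -> String {
    limbs.iter().map(|limb| format!("{limb:016x}")).collect()
}
```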
triton-cli/src/main.rs
```rust
    // write the proof in an arbitrary binary format
    //
    // version: u8
    //
    // security_level: u64
    // fri_expansion_factor: u64
    // num_trace_randomizers: u64
    // num_colinearity_checks: u64
    // num_combination_codeword_checks: u64
    //
    // digest_length: u8
    // public_input_length: u64
    // public_output_length: u64
    // proof_length: u64
    //
    // program_digest: u64[digest_length]
    //
    // public_inputs: u64[public_input_length]
    // public_outputs: u64[public_output_length]
    // proof: u64[proof_length]
    write_proof(data, out).expect("failed to write proof to file");
    println!("proof written to: {out}");
}
```
Instead of (de)serializing ad hoc, how about using libraries built for exactly that purpose? In the example below, I'm using `serde` and `bincode`, but other alternatives might exist.
```rust
use anyhow::{bail, Result};
use serde::{Deserialize, Serialize};
use std::fs;

#[derive(Debug, Clone, Eq, PartialEq, Serialize, Deserialize)]
struct SerializableProof {
    version: u8,
    stark: Stark,
    claim: Claim,
    proof: Proof,
}

fn write_proof(proof_path: &str, (stark, claim, proof): (Stark, Claim, Proof)) -> Result<()> {
    let serializable_proof = SerializableProof {
        version: 1,
        stark,
        claim,
        proof,
    };
    let encoded = bincode::serialize(&serializable_proof)?;
    fs::write(proof_path, &encoded)?;
    Ok(())
}

fn read_proof(proof_path: &str) -> Result<(Stark, Claim, Proof)> {
    let encoded = fs::read(proof_path)?;
    let SerializableProof {
        version,
        stark,
        claim,
        proof,
    } = bincode::deserialize(&encoded)?;
    if version != 1 {
        bail!("unsupported proof version: {version}"); // uses crate `anyhow`
    }
    Ok((stark, claim, proof))
}
```
This is probably not the final version of the code yet: in the suggestion, the proof's version is embedded in the struct `SerializableProof`. However, in order to actually support different proof versions, it should probably be parsed first, and subsequent parsing should then depend on the version.
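A sketch of that version-first approach. The layout is an assumption for illustration: the first byte is the version, the remainder is the version-specific payload, and `split_version` is a hypothetical helper:

```rust
// Split the raw proof file into (version, payload) before any
// version-dependent deserialization happens.
fn split_version(bytes: &[u8]) -> Result<(u8, &[u8]), String> {
    match bytes.split_first() {
        Some((&version, payload)) => Ok((version, payload)),
        None => Err("empty proof file".to_string()),
    }
}
```

The caller can then `match` on the version and hand the payload to the matching deserializer.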
> This is probably not the final version of the code yet: in the suggestion, the proof's version is embedded in the struct `SerializableProof`. However, in order to actually support different proof versions, it should probably be parsed first, and subsequent parsing should then depend on the version.
Yes, I really just wanted to reserve the first byte so in the future it's possible to implement versioning.

I didn't use a library because I'm not familiar with them. In the example above, will changes to the `Stark`, `Claim`, or `Proof` struct (or any nested structs) change the file structure? I'm not opposed to using libraries, but I think it's important that the file structure is well documented/stable so others can write parsers in other languages. Explicitly writing the file byte by byte is an easy way to do that.
I guess my question is, how do we make sure `version` is changed anytime the `Stark`/`Claim`/`Proof` struct is modified?
I ended up making a custom struct and using serde+bincode with that.
> I guess my question is, how do we make sure `version` is changed anytime the `Stark`/`Claim`/`Proof` struct is modified?
Yeah, that's a super valid question. My proposed “solution” falls short on at least two counts:

- since the `version` is a field on the struct itself, making de-serialization conditional on the version doesn't work
- it's not obvious (to me) how to ensure correct versioning if any of `Stark`, `Claim`, or `Proof`'s format changes

A custom (de)serializer certainly allows addressing both of those shortcomings. That said, it feels as if this is a problem other people have encountered before and have provided a general solution for. Maybe this solution is `serde` ∧/∨ `bincode`, but I'm not sure. I'll take another look. 😌
> Yes, I really just wanted to reserve the first byte so in the future it's possible to implement versioning.
That is a super good idea as far as I can see. 👍
Co-authored-by: Jan Ferdinand Sauer <[email protected]>
@jan-ferdinand great review, thank you
I was just about to open an issue asking about initial RAM. I'm not sure how to specify this for a program, do you have any links that could help me?
Also something I'm not familiar with. I've read the … I'm kinda thinking that these additional methods of specifying inputs should be added in separate PRs.
Looking into this rn.

Should you ever be looking for something similar but for a library, check out …
Happy to! 😊
Yeah, the docs on this are not super great, I have to admit. 🙇 There are various answers to your question.
Yeah. The node's indices are green, the parent/child relation is marked by the black lines. The tree's root has index 1.

Let's assume you want to walk the authentication path for node 10. In the picture, the authentication path in question is marked by red circles. To compute node 10's parent, which is node 5, you also need to know node 11. Hashing left sibling (node 10) and right sibling (node 11) gives the parent (node 5). In order to get node 5's parent, you also need node 4. Again, hashing left sibling (node 4) and right sibling (node 5) gives the parent (node 2).

Now, instruction … Of course, this produces the correct result only if the private input for sibling digests, also known as “non-deterministic digests”, is correctly populated.

Feel free to ask additional questions. 😌
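The walk just described can be sketched in Rust. The hash function below is a toy stand-in (the real VM uses its own, and `hash_pair`/`root_from_path` are hypothetical names); only the index arithmetic mirrors the explanation:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy hash standing in for the VM's real one; only the structure matters here.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

// Walk an authentication path from a node up to the root. With the indexing
// scheme above (root has index 1), an even index means "left child", so the
// divined sibling goes on the right; an odd index means the opposite.
// Halving the index moves to the parent (e.g. 10 -> 5 -> 2 -> 1).
fn root_from_path(mut index: u64, mut node: u64, siblings: &[u64]) -> u64 {
    for &sibling in siblings {
        node = if index % 2 == 0 {
            hash_pair(node, sibling)
        } else {
            hash_pair(sibling, node)
        };
        index /= 2;
    }
    node
}
```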
That's fine with me. 🙂👍
Great explanation of …

The explanation makes sense, though I'll need to play with it more to understand it fully. My only question is, what motivated you to implement this at the VM level instead of the program level? Or rather, how much overhead (if any) does it add to programs that don't use the instruction? Or is it necessary for making the STARK proof itself?
It's an optimization for recursive verification and not required to generate a STARK proof. An alternative to having the instruction would be something along the lines of:

```
; never run, never tested – likely to contain bugs
divine 5          ; get the sibling digest
dup 10            ; duplicate the node index
push 2
div_mod
pop 1             ; least-significant bit of node index is now on top of the stack
skiz
call swap_digests ; requires something like 15 instructions? I didn't count
```

When verifying a STARK, there are many Merkle tree verifications happening, each of which requires several executions of …

You are right in assuming that there is an overhead to having this instruction around. And unfortunately, this overhead is something that everyone using Triton VM is paying, independent of them using the instruction. It is possible to design ZK-VMs that mitigate this to a certain degree, but Triton VM has not adopted any of those mitigation strategies yet; our current focus is elsewhere. 😌
@chancehudson Good job!
triton-cli/src/main.rs
```rust
#[derive(Serialize, Deserialize, Debug, Clone)]
struct SerializedProof {
    version: u8,
    security_level: u64,
    fri_expansion_factor: u64,
    num_trace_randomizers: u64,
    num_colinearity_checks: u64,
    num_combination_codeword_checks: u64,
    program_digest: Vec<u64>,
    public_inputs: Vec<u64>,
    public_outputs: Vec<u64>,
    proof: Vec<u64>,
}
```
Getting this right seems to be more difficult (for me) than I had anticipated. I see more or less three main ways we can go about it:

1. Declare proof versioning out of scope for Triton CLI.
2. Detect out-of-sync versions and report them as an error.
3. Support as many proof versions as possible.

1 – Declare Versioning to be Out of Scope

The simplest “solution” from an implementation standpoint. If deserialization fails, the cause might be an out-of-date proof, hard disk failure, aliens, …. The user can check for an update of Triton CLI and try their luck again.

2 – Error on Out-of-Sync Proof

Requires

- storing the proof's version alongside the proof, and
- knowing the version of Triton VM.

3 – Support Different Proof Versions

Requires everything from (2) plus old versions of Triton VM as additional dependencies. Then, proof generation / verification can be delegated to the chosen version. This is certainly possible by renaming dependencies. The biggest downsides I see are compilation times and binary size – Triton VM is not a particularly lightweight library.
Detecting Versions

Of the Proof

For strategies 2 and 3, the proof's version must be accessible whether deserialization works or not. Here are suggestions that might work:

- Use the file's first few bytes for the proof's version, store the serialized proof in `bytes[some_index..]`.
- Declare a big `enum` with one struct variant per proof version. For example:

```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
enum VersionedSerializableProof {
    V0_13 { stark: triton_vm_0_13::Stark, claim: triton_vm_0_13::Claim, .. },
    V0_22 { stark: triton_vm_0_22::Stark, claim: triton_vm_0_22::Claim, .. },
    ..
}
```

I'm not sure this works, though: I don't think `serde` is able to distinguish between two different versions of the same type.
Of Triton CLI / VM

With strategy (2) “Error on out-of-sync proof”, the version of Triton VM in use must be known. The ways I see to achieve this are:

- Use the compile-time environment variable `CARGO_PKG_VERSION` via the `env!` macro to get a `&'static str` to the version specifier in the `Cargo.toml` of Triton CLI. This only works under the assumption that Triton CLI has the same version as Triton VM. Notably, this is not currently the case.
- Use a `build.rs` build script and some parsing of the `Cargo.toml` to populate a compile-time environment variable with the version of Triton VM in use, which can then be read via the `env!` macro.
- Manually update some hardcoded `const` whenever the version of Triton VM is bumped.

I must admit, none of these approaches seem particularly resilient.

What do you think? Do these thoughts and conclusions make sense? Am I overlooking something blatantly obvious?
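To illustrate the `CARGO_PKG_VERSION` idea from the first bullet, here is a sketch of a major-version check. The version string is passed in as a parameter so the snippet is self-contained; in the CLI it would come from `env!("CARGO_PKG_VERSION")`. Both function names are hypothetical:

```rust
// Extract the major component from a semver string like "0.36.1".
fn major_version(semver: &str) -> Option<u64> {
    semver.split('.').next()?.parse().ok()
}

// Reject proofs whose recorded major version differs from our own.
// Assumes Triton CLI's version tracks Triton VM's, as discussed above.
fn check_proof_version(own_semver: &str, proof_major: u64) -> Result<(), String> {
    let own_major = major_version(own_semver)
        .ok_or_else(|| "malformed version string".to_string())?;
    if proof_major != own_major {
        return Err(format!(
            "proof has major version {proof_major}, but this binary has {own_major}"
        ));
    }
    Ok(())
}
```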
In versioning we have 2 things that can change. The first is the structure of the binary file itself. This affects how we implement encoding/decoding, but does not necessarily affect how the proof is verified. The second is the version of triton-vm that generated the proof. This is much more difficult to handle because a proof might be (de)serialized just fine and then fail verification because e.g. the merkle structure was off by 1 or the hash function changed, etc.

The first case is firmly in scope for this package. Anytime we change the `SerializedProof` structure we should keep the old structure/implementation somewhere and `match` into the old structure based on the version indicator until we explicitly drop support for a version.

The second case seems mostly out of scope for this package. By that I mean, we should not try and support anything but the current version. The triton-cli version number should follow triton-vm, e.g. anytime a breaking version of triton-vm is released, a breaking triton-cli version should be released as well. I would lean into the versioning system here and use the `CARGO_PKG_VERSION` to throw an error if a proof is verified from the wrong major version.

The tricky part here is detecting when a breaking change is made to the STARK prover/verifier logic and strictly following versioning rules. I would commit a proof to this repo from the current version of triton-vm and have the CI attempt to verify this proof on every push. This will fail anytime a breaking change is made, which is an indicator to push a breaking version release (or try to undo the breaking change). Sort of a makeshift e2e test.

I think to mitigate some of the risk here we can add more variables into the serialized proof. Namely:

- name/identifier of the hash function used
- field/field extension used

This would add slightly more flexibility in detecting implementation details.
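A sketch of what such a header might look like. The struct name, field names, and example values are assumptions for illustration, not the PR's actual format:

```rust
// Hypothetical extension of the proof header: identify the hash function and
// field so a verifier can detect implementation mismatches before verifying.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ProofHeader {
    version: u8,
    hash_function: String, // e.g. "Tip5"
    field: String,         // e.g. "p = 2^64 - 2^32 + 1"
}
```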
I would handle this versioning stuff lazily though. Let's call these things the "proof file version" and the "prover version".

We start with proof file version 1; there are no other versions to support right now. We start with prover version `0.x.x`. We specify that if the prover version is not defined in the proof file, it is assumed to be `0.x.x`.

To pedantically support this I think we need to add the `CARGO_PKG_VERSION` check logic to the prover/verification and throw an error on prover version >`0.x.x`. It would be good to add a comment near this logic documenting this conversation/plan.
I applied this in 553a028.
I ended up putting the version in the proof file as it seemed more self-documenting.
BTW I don't know how Rust versioning works. I'm using the semver mental model.
triton-cli/src/main.rs
```rust
assert_eq!(
    DIGEST_LENGTH,
    serialized_proof.program_digest.len(),
    "digest length mismatch!"
);
let digest_elements = serialized_proof
    .program_digest
    .iter()
    .map(|v| BFieldElement::from(*v))
    .collect::<Vec<BFieldElement>>();
let digest = Digest::try_from(&digest_elements[0..DIGEST_LENGTH])?;
```
You don't have to do the length check yourself if you pass the entire `digest_elements` into the `try_from`:
Suggested change:

```rust
let digest_elements = serialized_proof
    .program_digest
    .iter()
    .map(|v| BFieldElement::from(*v))
    .collect::<Vec<_>>();
let digest = Digest::try_from(digest_elements)?;
```
In fact, you can remove a lot of `.iter().map(…)`-ing by changing the types used in the `SerializedProof`:
```rust
#[derive(Serialize, Deserialize, Debug, Clone)]
struct SerializedProof {
    ..
    program_digest: Digest,
    public_inputs: Vec<BFieldElement>,
    public_outputs: Vec<BFieldElement>,
    proof: Vec<BFieldElement>,
}
```
All of these types (de)serialize just fine. 🙂
I applied the first change, passing the full vector to `try_from`. I thought about changing the types in `SerializedProof`, but this makes the file structure vulnerable to change if `Digest` or `BFieldElement` changes. That's why I'm unwrapping/re-wrapping into primitive types.
Hey @jan-ferdinand, thinking about this more, it would be very useful if the triton-vm crate provided a standard way to serialize a proof. What is your opinion of pulling the serialization logic into triton-vm itself?

I think this makes sense, yes.
This PR adds a new crate `triton-cli`. This crate exposes a CLI with two commands:

- `prove`: read asm from file, prove it, and write the resulting proof to file
- `verify`: read a proof from file, verify it, and print details about it

Proofs are serialized in a simple binary format. This format is described in the `main.rs` for the CLI crate.

From inside the `triton-cli` directory run `cargo install --path .` to add `triton-cli` to your path.

Things to watch out for:

Closes #238