-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fraud proof: Add Host function and update timestamp extrinsic derivation #2046
Conversation
Are you sure this is within our security model? Because I'm fairly certain we have a fundamental security assumption that there is at least one honest operator online at any time. And challenge period should be aligned with consensus chain in such a way that all challenges point to non-pruned state. |
Our assumption is for one honest operator exists in the network but that does not mean they are offline for a period of time. I'm mostly talking about the case when they are offline and comeback to submit a fraud proof during which consensus chain would have moved much forward than that domain itself.
Well, there is no guarantee that every consensus block will have a domain bundle that leads a new domain block. There can definitely be some gaps. Though I remember there was some plan to not have such a gap but that is not currently done Iirc |
That is really odd sounding to me. If they "exist", but "offline", they might as well be considered as non-existing 🤷♂️
Is challenge period not defined in consensus blocks? If not I think it is a problem and should be changed to be measured in consensus blocks, which I think addresses your comment. |
The challenge period of ER is measured in domain block, to my understanding there are several reasons:
As for the reason why we still need storage proof, I think it is because the non-archival node will prune the historical state, without storage proof only the consensus node with all the historical state of the challenge period can verify and include the fraud proof in their block (the challenge period can be days or weeks in the production environment), with storage proof any node have the block header can do that. |
The expectation is that there is an online honest operator that watches his domain at all times. When that is true, they will notice right away when some incorrect ER is submitted and have plenty of time.
This depends on our assumptions. It sounds like you assume an unbounded number of domains that are always producing bundles. If the subsequent assumption is that consensus chain can handle inclusion of all of those bundles, then it can handle all the fraud proofs as well.
Do we expect something like that? Follow-up question: can we have a different "consensus block" challenge period for different domains based on their requirements? (like L2s on Ethereum do, so some domain is ok with 10 min and some like Arbitrum want a week) We would need to keep storage proof in that case I think
With an unbounded number of domains and high demand this statement will be true even if domain blocks are used for challenge period. |
Yeah, it is true in theory and if nothing goes wrong a fraud proof should be submitted within a few seconds, but in practice the challenge period is sometime set to days or weeks since the hackers are incentivized to make things go wrong.
It is more about the inclusion of fraud proof, with consensus block we also need to take the network congestion into account for the challenge period, with domain block the challenge period is more about the time it take to notice the fraud and generate the fraud proof.
It is just an edge case, but consider a domain that takes 10 consensus blocks to execute its domain block, its challenge period is reduced by 9 consensus blocks compared to those domains that take 1 consensus blocks to execute, while the challenge period is really about the time to notice and generate fraud proof. For the question, Yes, different "consensus block" challenge period is feasible.
Not really, when there is high demand the bundle/ER may not be included in the consensus block at all, so the challenge period doesn't move forward, it is kind of a lazy approach. With consensus block, the challenge period of all the domains is moving forward as long as a consensus block is produced. |
73c2e39
to
945e9ab
Compare
Marked TODOs where needed to be fixed in the next commit
… the host function
945e9ab
to
cea9717
Compare
crates/pallet-domains/src/lib.rs
Outdated
let consensus_block_hash = bad_receipt.consensus_block_hash; | ||
let verification_info = get_invalid_domain_extrinsic_root_info( | ||
H256::from_slice(consensus_block_hash.as_ref()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the parent hash of bad_receipt.consensus_block_hash
actually, PTAL at https://github.com/subspace/subspace/pull/1983/files#diff-53cbad878b65df79ef928e1937df046b3b0b38b8d135d7a067178918f68d0ac3R184-R188
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, that is not correct. We need to use the consensus_clock_hash
from which this ER is derived from so that Fraud proof verifier can also derive same things. For example, timestamp, block_randomness etc..
If you talking about the discussion we had on domain_runtime_code
runtime api returning upgraded runtime instead of correct runtime, that is out of scope for this PR and needs to be fixed in a different PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't check verification logic in detail, but the approach makes sense to me.
What I'd probably like to see is instead of get_invalid_domain_extrinsic_root_info
and many more to come, have just one with both input and output as enums, so we can extend it without changing host function API. I'm not sure how good idea it is, it seems awkward that we will be sending enum as an arguent one way and (potentially) get unexpected enum back, but it seems to be more flexible as we need to extend the supported variants. Thoughts?
/// Domain fraud proof related runtime interface | ||
#[runtime_interface] | ||
pub trait FraudProofRuntimeInterface { | ||
fn get_invalid_domain_extrinsic_root_info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be description for such host functions. Also the name is odd, is it really about invalid domain extrinsic root or is it about any, it just happens to be used for invalid extrinsics by the runtime?
Actually, I like the idea of making this host function more generic that supports all fraud proof veriifcation and easier to add new types. Will update this |
consensus_block_hash: H256, | ||
domain_id: DomainId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these are the only arguments that will be necessary for any kind of fraud proof. I'm thinking whether they should be moved under enum or deserve to be here.
/// Request type to fetch required verification information for fraud proof through Host function. | ||
#[derive(Debug, Decode, Encode, TypeInfo, PartialEq, Eq, Clone)] | ||
pub enum FraudProofVerificationInfoRequest { | ||
InvalidDomainExtrinsicRoot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add docs for all enum variants on newly introduced data structures. And the name is odd, are you sure this is a request for invalid domain extrinsic root? Or do we request valid domain extrinsic root for the purposes of verifying an invalid one.
And for these data structures in sp-*
it would be preferable to specify SCALE codec indices explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the name is odd, are you sure this is a request for invalid domain extrinsic root? Or do we request valid domain extrinsic root for the purposes of verifying an invalid one.
I do not have a better name for this unfortunately.
We request the information used to verify the invalid domain extrinsic fraud proof through host function. The variant signifies the which fraud proof type we need to send the verification info back. The root derivation still happens on the runtime and this host function just uses the consensus state to return the verification information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think the variant and its comment should indicate what information you're requesting from the client, not what fraud proof verification will use it. You may have multiple fraud proofs requesting the same data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think the variant and its comment should indicate what information you're requesting from the client, not what fraud proof verification will use it. You may have multiple fraud proofs requesting the same data.
Updated the comment specific to the variant. I dont think any other fraud proof type would require this info apart from this Fraud proof type. Do you disagree ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not the point, it was just a possibility. It is confusing if you don't call things what they are. From last update and comment it seems that maybe it should have already been two variants: for randomness and timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be two host functions while this could simply with one call that returns the both and are used at the same time. Sure I agree that it makes the host calls more generic but returning these info bundled together would be nice though I may not be thinking much further ahead and I dont have strong opinion on this. If you think this makes it easier to understand and follow the code path, I willing to change though even though I wouldn't like it much :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can easily imagine randomness to be useful for some other fraud proofs for example. And going forward for other fraud proofs there might be overlapping pieces of information necessary. The approach where enum variants are saying what they request seems to scale more gracefully than variants that are purpose built for one fraud proof, even though it might be a bit more convenient to handle.
@NingLin-P @ParthDesai do you have an opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Folks, I would really appreciate if we could move forward and make a decision on this so that we can actually start building the rest of the fraud proofs using the host function. I dont mind taking the route @nazar-pc suggested at this point but we really needs to get feedback and move on. @NingLin-P @ParthDesai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer @nazar-pc 's approach since there are overlapping data (i.e. domain runtime code, bundle) required by multiple fraud proof.
Besides we will need the host function to not only fetch data but also actually perform some of the verification steps (where domain runtime API is involved) do we also make them as a variant here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides we will need the host function to not only fetch data but also actually perform some of the verification steps (where domain runtime API is involved) do we also make them as a variant here?
I think so, everything that runtime can't do will go as request to host and back as response, we just need to ensure we are pricing it correctly.
/// Request type to fetch required verification information for fraud proof through Host function. | ||
#[derive(Debug, Decode, Encode, TypeInfo, PartialEq, Eq, Clone)] | ||
pub enum FraudProofVerificationInfoRequest { | ||
InvalidDomainExtrinsicRoot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the name is odd, are you sure this is a request for invalid domain extrinsic root? Or do we request valid domain extrinsic root for the purposes of verifying an invalid one.
I do not have a better name for this unfortunately.
We request the information used to verify the invalid domain extrinsic fraud proof through host function. The variant signifies the which fraud proof type we need to send the verification info back. The root derivation still happens on the runtime and this host function just uses the consensus state to return the verification information.
&mut self, | ||
consensus_block_hash: H256, | ||
domain_id: DomainId, | ||
fraud_proof_verification_req: FraudProofVerificationInfoRequest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #2046 (comment)
These are minimal required info I believe is common for all fraud proof variants. Anything specific should be within the variant
d697e8e
to
d0be2a5
Compare
use std::sync::Arc; | ||
|
||
/// Trait to query and verify Domains Fraud proof. | ||
pub trait FraudProofHostFunctions: Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This host function expects the verifier node to contain all the necessary data for verifying the fraud proof, it won't be held if we still allow customizing the --state-pruning
and --block-pruning
options for the non-archival consensus node, I remember @nazar-pc had mentioned the archiving process also have the similar requirement, we should make that requirement more explicit or plz leave a TODO if it is not going handles in this PR.
NOTE: ---block-pruning
is necessary since we need the block body to get the bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For obvious reasons we will not allow it, it is not really supported on Gemini 3f already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How was it approached? I still see the --state-pruning
and --block-pruning
options in the gemini-3f-2023-oct-06
snapshot. Do we use a constant value for these options and ignore what the user pass in or make all consensus node to be archival nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if you attempt to use it, things will break in a bad way. #1727 will address it by replacing Substrate CLI with our own CLI that will have more meaningful arguments (--farmer
and --operator
instead of --validator
) and arguments that are not supported/not applicable will be removed.
Block pruning right now is already dependent on archiving, so it will not prune recent blocks even if argument is set to do so. State pruning should be left as archival though, but I believe it should be pruned automatically when block is pruned (if not, I didn't check, then we need to make it so).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but we may need the pruning dependent on the challenge period at least for nodes that turn on domain, in case archiving has a shorter requirement than the challenge period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. Practically though it is not a concern yet, blocks are pruned so much later that it will not cause issues for now.
d0be2a5
to
8a20edc
Compare
The PR should be updated to have individual host functions to fetch required info through host function. As for the naming, if you disagree with any, please also provide potential alternatives as I do not have enough brain power to come up with any new :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an opportunity to generalize response error handling in the future, but otherwse approach makes sense to me.
This PR adds a new
sp-domains-fraud-proof
crate that adds a few host functions we need to verify the Fraud proofs. The host functions introduced in this PR are general enough and veriifcation logic is still part of the runtime and can be upgraded through a runtime upgrade.It turns out we cannot remove the storage proofs from the Fraud proof since the Consensus node, which verifies the fraud proof, will be just a Full node at the best. So it is not gauranted that required state such astimestamp
andblock_randomness
will exist. This is because while consensus node continues to produce blocks and prune older blocks, domain may be halted for some reason. There is a attack vector when if there is no honest operator in the network to submit the fraud proofs while the consensus state is available and they come back up after the state is pruned, verifier relaying on the state to fetchtimestamp
andblock_randomness
will fail and there by unable to verify the fraud proof. So we still need the storage proofs andpallet-domain
will store the required consensus state root through host function and use the same during the verification.Note:
sp-domains
still holds the verification logic of other fraud proof and that should ideally move to newsp-domains-fraud-proof
crate. Since that is a pure refactor, I will handle in a separate refactoring PR. This makes the changes in this PR easy to understand. We also can removesc-consensus-fraud-proof
crate in the refactoring PR as that would be useless going forward.Code contributor checklist: