Fraud proof: Add Host function and update timestamp extrinsic derivation #2046

vedhavyas · 2023-10-03T20:13:00Z

This PR adds a new sp-domains-fraud-proof crate that adds a few host functions we need to verify the Fraud proofs. The host functions introduced in this PR are general enough and veriifcation logic is still part of the runtime and can be upgraded through a runtime upgrade.

It turns out we cannot remove the storage proofs from the Fraud proof since the Consensus node, which verifies the fraud proof, will be just a Full node at the best. So it is not gauranted that required state such as timestamp and block_randomness will exist. This is because while consensus node continues to produce blocks and prune older blocks, domain may be halted for some reason. There is a attack vector when if there is no honest operator in the network to submit the fraud proofs while the consensus state is available and they come back up after the state is pruned, verifier relaying on the state to fetch timestamp and block_randomness will fail and there by unable to verify the fraud proof. So we still need the storage proofs and pallet-domain will store the required consensus state root through host function and use the same during the verification.

Note: sp-domains still holds the verification logic of other fraud proof and that should ideally move to new sp-domains-fraud-proof crate. Since that is a pure refactor, I will handle in a separate refactoring PR. This makes the changes in this PR easy to understand. We also can remove sc-consensus-fraud-proof crate in the refactoring PR as that would be useless going forward.

Code contributor checklist:

I have read, understood and followed contributing guide

nazar-pc · 2023-10-04T07:04:13Z

There is a attack vector when if there is no honest operator in the network to submit the fraud proofs while the consensus state is available

Are you sure this is within our security model? Because I'm fairly certain we have a fundamental security assumption that there is at least one honest operator online at any time. And challenge period should be aligned with consensus chain in such a way that all challenges point to non-pruned state.

vedhavyas · 2023-10-04T07:28:13Z

Our assumption is for one honest operator exists in the network but that does not mean they are offline for a period of time. I'm mostly talking about the case when they are offline and comeback to submit a fraud proof during which consensus chain would have moved much forward than that domain itself.
Well maybe that would not be the case at all but this is certainly possible. That is one reason why we still store a list consensus block hashes as part of the pallet-domains state.

And challenge period should be aligned with consensus chain in such a way that all challenges point to non-pruned state.

Well, there is no guarantee that every consensus block will have a domain bundle that leads a new domain block. There can definitely be some gaps. Though I remember there was some plan to not have such a gap but that is not currently done Iirc

nazar-pc · 2023-10-04T07:32:45Z

Our assumption is for one honest operator exists in the network but that does not mean they are offline for a period of time. I'm mostly talking about the case when they are offline and comeback to submit a fraud proof during which consensus chain would have moved much forward than that domain itself.

That is really odd sounding to me. If they "exist", but "offline", they might as well be considered as non-existing 🤷‍♂️

Well, there is no guarantee that every consensus block will have a domain bundle that leads a new domain block. There can definitely be some gaps. Though I remember there was some plan to not have such a gap but that is not currently done Iirc

Is challenge period not defined in consensus blocks? If not I think it is a problem and should be changed to be measured in consensus blocks, which I think addresses your comment.

NingLin-P · 2023-10-05T08:09:25Z

Is challenge period not defined in consensus blocks? If not I think it is a problem and should be changed to be measured in consensus blocks, which I think addressed your comment.

The challenge period of ER is measured in domain block, to my understanding there are several reasons:

When using consensus block, for fraudulent ER submitted in block #N, the fraud proof must be included before block #N + challenge_period, all the domain instances are racing to submit their fraud proof within this window while there are only limited number of fraud proof can be included in challenge_period number of consensus block, some proof may miss it and break the security assumption. When using domain block, the domain block only produces if there is a bundle included in the consensus block, and fraud proof has a higher priority to be included than bundle, so it is okay for fraud proof/bundle to miss (not be included in) some consensus blocks.
Different domains may have different workload, which means they may have different bundle/block intervals, using consensus block forces all domains to use the same challenge period. For extreme domains, it may take longer time than the fixed challenge period to execute their block and hence produce their ER, so malicious operators can craft a fraudulent ER (without executing the block) and get it pass the challenge period before other operators can submit fraud proof (which need their local ER to be produced first)
Using consensus block also means the challenge period needs to be moving forward for every domain when a consensus block is initialized, which may cause a huge amount of weight and exceed the max block weight limit since the number of domains is unbounded. With domain block, the challenge period moves forward when a new ER is submitted (on the submit_bundle call) so weight is bounded and accountable

As for the reason why we still need storage proof, I think it is because the non-archival node will prune the historical state, without storage proof only the consensus node with all the historical state of the challenge period can verify and include the fraud proof in their block (the challenge period can be days or weeks in the production environment), with storage proof any node have the block header can do that.

dariolina · 2023-10-05T08:34:56Z

The expectation is that there is an online honest operator that watches his domain at all times. When that is true, they will notice right away when some incorrect ER is submitted and have plenty of time.

all the domain instances are racing to submit their fraud proof within this window while there are only limited number of fraud proof can be included in challenge_period number of consensus block

This depends on our assumptions. It sounds like you assume an unbounded number of domains that are always producing bundles. If the subsequent assumption is that consensus chain can handle inclusion of all of those bundles, then it can handle all the fraud proofs as well.

For extreme domains, it may take longer time than the fixed challenge period to execute their block and hence produce their ER

Do we expect something like that? Follow-up question: can we have a different "consensus block" challenge period for different domains based on their requirements? (like L2s on Ethereum do, so some domain is ok with 10 min and some like Arbitrum want a week) We would need to keep storage proof in that case I think

Using consensus block also means the challenge period needs to be moving forward for every domain when a consensus block is initialized

With an unbounded number of domains and high demand this statement will be true even if domain blocks are used for challenge period.

NingLin-P · 2023-10-05T09:25:43Z

The expectation is that there is an online honest operator that watches his domain at all times. When that is true, they will notice right away when some incorrect ER is submitted and have plenty of time.

Yeah, it is true in theory and if nothing goes wrong a fraud proof should be submitted within a few seconds, but in practice the challenge period is sometime set to days or weeks since the hackers are incentivized to make things go wrong.

It sounds like you assume an unbounded number of domains that are always producing bundles. If the subsequent assumption is that consensus chain can handle inclusion of all of those bundles, then it can handle all the fraud proofs as well.

It is more about the inclusion of fraud proof, with consensus block we also need to take the network congestion into account for the challenge period, with domain block the challenge period is more about the time it take to notice the fraud and generate the fraud proof.

Do we expect something like that? Follow-up question: can we have a different "consensus block" challenge period for different domains based on their requirements?

It is just an edge case, but consider a domain that takes 10 consensus blocks to execute its domain block, its challenge period is reduced by 9 consensus blocks compared to those domains that take 1 consensus blocks to execute, while the challenge period is really about the time to notice and generate fraud proof. For the question, Yes, different "consensus block" challenge period is feasible.

With an unbounded number of domains and high demand this statement will be true even if domain blocks are used for challenge period.

Not really, when there is high demand the bundle/ER may not be included in the consensus block at all, so the challenge period doesn't move forward, it is kind of a lazy approach. With consensus block, the challenge period of all the domains is moving forward as long as a consensus block is produced.

Marked TODOs where needed to be fixed in the next commit

… the host function

crates/pallet-domains/src/lib.rs

NingLin-P · 2023-10-09T14:25:37Z

crates/pallet-domains/src/lib.rs

+                let consensus_block_hash = bad_receipt.consensus_block_hash;
+                let verification_info = get_invalid_domain_extrinsic_root_info(
+                    H256::from_slice(consensus_block_hash.as_ref()),


We should use the parent hash of bad_receipt.consensus_block_hash actually, PTAL at https://github.com/subspace/subspace/pull/1983/files#diff-53cbad878b65df79ef928e1937df046b3b0b38b8d135d7a067178918f68d0ac3R184-R188

Nope, that is not correct. We need to use the consensus_clock_hash from which this ER is derived from so that Fraud proof verifier can also derive same things. For example, timestamp, block_randomness etc..

If you talking about the discussion we had on domain_runtime_code runtime api returning upgraded runtime instead of correct runtime, that is out of scope for this PR and needs to be fixed in a different PR

nazar-pc

Didn't check verification logic in detail, but the approach makes sense to me.

What I'd probably like to see is instead of get_invalid_domain_extrinsic_root_info and many more to come, have just one with both input and output as enums, so we can extend it without changing host function API. I'm not sure how good idea it is, it seems awkward that we will be sending enum as an arguent one way and (potentially) get unexpected enum back, but it seems to be more flexible as we need to extend the supported variants. Thoughts?

crates/pallet-domains/src/tests.rs

nazar-pc · 2023-10-10T09:40:16Z

crates/sp-domains-fraud-proof/src/runtime_interface.rs

+/// Domain fraud proof related runtime interface
+#[runtime_interface]
+pub trait FraudProofRuntimeInterface {
+    fn get_invalid_domain_extrinsic_root_info(


There should be description for such host functions. Also the name is odd, is it really about invalid domain extrinsic root or is it about any, it just happens to be used for invalid extrinsics by the runtime?

vedhavyas · 2023-10-10T09:54:54Z

What I'd probably like to see is instead of get_invalid_domain_extrinsic_root_info and many more to come, have just one with both input and output as enums, so we can extend it without changing host function API. I'm not sure how good idea it is, it seems awkward that we will be sending enum as an arguent one way and (potentially) get unexpected enum back, but it seems to be more flexible as we need to extend the supported variants. Thoughts?

Actually, I like the idea of making this host function more generic that supports all fraud proof veriifcation and easier to add new types. Will update this

nazar-pc · 2023-10-10T10:30:39Z

crates/sp-domains-fraud-proof/src/runtime_interface.rs

        consensus_block_hash: H256,
        domain_id: DomainId,


Are you sure these are the only arguments that will be necessary for any kind of fraud proof. I'm thinking whether they should be moved under enum or deserve to be here.

nazar-pc · 2023-10-10T10:32:35Z

crates/sp-domains-fraud-proof/src/lib.rs

+/// Request type to fetch required verification information for fraud proof through Host function.
+#[derive(Debug, Decode, Encode, TypeInfo, PartialEq, Eq, Clone)]
+pub enum FraudProofVerificationInfoRequest {
+    InvalidDomainExtrinsicRoot,


Please add docs for all enum variants on newly introduced data structures. And the name is odd, are you sure this is a request for invalid domain extrinsic root? Or do we request valid domain extrinsic root for the purposes of verifying an invalid one.

And for these data structures in sp-* it would be preferable to specify SCALE codec indices explicitly.

And the name is odd, are you sure this is a request for invalid domain extrinsic root? Or do we request valid domain extrinsic root for the purposes of verifying an invalid one.

I do not have a better name for this unfortunately.
We request the information used to verify the invalid domain extrinsic fraud proof through host function. The variant signifies the which fraud proof type we need to send the verification info back. The root derivation still happens on the runtime and this host function just uses the consensus state to return the verification information.

Hm, I think the variant and its comment should indicate what information you're requesting from the client, not what fraud proof verification will use it. You may have multiple fraud proofs requesting the same data.

Hm, I think the variant and its comment should indicate what information you're requesting from the client, not what fraud proof verification will use it. You may have multiple fraud proofs requesting the same data.

Updated the comment specific to the variant. I dont think any other fraud proof type would require this info apart from this Fraud proof type. Do you disagree ?

That is not the point, it was just a possibility. It is confusing if you don't call things what they are. From last update and comment it seems that maybe it should have already been two variants: for randomness and timestamp.

That would be two host functions while this could simply with one call that returns the both and are used at the same time. Sure I agree that it makes the host calls more generic but returning these info bundled together would be nice though I may not be thinking much further ahead and I dont have strong opinion on this. If you think this makes it easier to understand and follow the code path, I willing to change though even though I wouldn't like it much :)

I can easily imagine randomness to be useful for some other fraud proofs for example. And going forward for other fraud proofs there might be overlapping pieces of information necessary. The approach where enum variants are saying what they request seems to scale more gracefully than variants that are purpose built for one fraud proof, even though it might be a bit more convenient to handle.

@NingLin-P @ParthDesai do you have an opinion on this?

Folks, I would really appreciate if we could move forward and make a decision on this so that we can actually start building the rest of the fraud proofs using the host function. I dont mind taking the route @nazar-pc suggested at this point but we really needs to get feedback and move on. @NingLin-P @ParthDesai

I would also prefer @nazar-pc 's approach since there are overlapping data (i.e. domain runtime code, bundle) required by multiple fraud proof.

Besides we will need the host function to not only fetch data but also actually perform some of the verification steps (where domain runtime API is involved) do we also make them as a variant here?

Besides we will need the host function to not only fetch data but also actually perform some of the verification steps (where domain runtime API is involved) do we also make them as a variant here?

I think so, everything that runtime can't do will go as request to host and back as response, we just need to ensure we are pricing it correctly.

vedhavyas · 2023-10-10T10:36:53Z

crates/sp-domains-fraud-proof/src/lib.rs

+/// Request type to fetch required verification information for fraud proof through Host function.
+#[derive(Debug, Decode, Encode, TypeInfo, PartialEq, Eq, Clone)]
+pub enum FraudProofVerificationInfoRequest {
+    InvalidDomainExtrinsicRoot,


And the name is odd, are you sure this is a request for invalid domain extrinsic root? Or do we request valid domain extrinsic root for the purposes of verifying an invalid one.

I do not have a better name for this unfortunately.
We request the information used to verify the invalid domain extrinsic fraud proof through host function. The variant signifies the which fraud proof type we need to send the verification info back. The root derivation still happens on the runtime and this host function just uses the consensus state to return the verification information.

vedhavyas · 2023-10-10T10:40:01Z

crates/sp-domains-fraud-proof/src/runtime_interface.rs

+        &mut self,
+        consensus_block_hash: H256,
+        domain_id: DomainId,
+        fraud_proof_verification_req: FraudProofVerificationInfoRequest,


From #2046 (comment)

These are minimal required info I believe is common for all fraud proof variants. Anything specific should be within the variant

NingLin-P · 2023-10-10T11:34:01Z

crates/sp-domains-fraud-proof/src/host_functions.rs

+use std::sync::Arc;
+
+/// Trait to query and verify Domains Fraud proof.
+pub trait FraudProofHostFunctions: Send + Sync {


This host function expects the verifier node to contain all the necessary data for verifying the fraud proof, it won't be held if we still allow customizing the --state-pruning and --block-pruning options for the non-archival consensus node, I remember @nazar-pc had mentioned the archiving process also have the similar requirement, we should make that requirement more explicit or plz leave a TODO if it is not going handles in this PR.

NOTE: ---block-pruning is necessary since we need the block body to get the bundle.

For obvious reasons we will not allow it, it is not really supported on Gemini 3f already.

How was it approached? I still see the --state-pruning and --block-pruning options in the gemini-3f-2023-oct-06 snapshot. Do we use a constant value for these options and ignore what the user pass in or make all consensus node to be archival nodes?

Well, if you attempt to use it, things will break in a bad way. #1727 will address it by replacing Substrate CLI with our own CLI that will have more meaningful arguments (--farmer and --operator instead of --validator) and arguments that are not supported/not applicable will be removed.

Block pruning right now is already dependent on archiving, so it will not prune recent blocks even if argument is set to do so. State pruning should be left as archival though, but I believe it should be pruned automatically when block is pruned (if not, I didn't check, then we need to make it so).

I see, but we may need the pruning dependent on the challenge period at least for nodes that turn on domain, in case archiving has a shorter requirement than the challenge period.

Absolutely. Practically though it is not a concern yet, blocks are pruned so much later that it will not cause issues for now.

vedhavyas · 2023-10-10T16:29:40Z

The PR should be updated to have individual host functions to fetch required info through host function.

As for the naming, if you disagree with any, please also provide potential alternatives as I do not have enough brain power to come up with any new :)

nazar-pc

I think there is an opportunity to generalize response error handling in the future, but otherwse approach makes sense to me.

vedhavyas requested review from NingLin-P, nazar-pc and rg3l3dr as code owners October 3, 2023 20:13

vedhavyas force-pushed the fraud_proof/host_function branch from 73c2e39 to 945e9ab Compare October 9, 2023 13:57

vedhavyas added 3 commits October 9, 2023 15:59

define fraud proofs primitives crate and host functions

a1a83a9

remove domain state root inherent extrinsic.

68f1cf8

Marked TODOs where needed to be fixed in the next commit

verify invalid domain extrinsic root using the verification info from…

cea9717

… the host function

vedhavyas force-pushed the fraud_proof/host_function branch from 945e9ab to cea9717 Compare October 9, 2023 14:36

NingLin-P reviewed Oct 9, 2023

View reviewed changes

nazar-pc reviewed Oct 10, 2023

View reviewed changes

vedhavyas requested review from nazar-pc and NingLin-P October 10, 2023 10:24

nazar-pc reviewed Oct 10, 2023

View reviewed changes

vedhavyas commented Oct 10, 2023

View reviewed changes

vedhavyas force-pushed the fraud_proof/host_function branch from d697e8e to d0be2a5 Compare October 10, 2023 10:43

NingLin-P reviewed Oct 10, 2023

View reviewed changes

apply review suggestions

8a20edc

vedhavyas force-pushed the fraud_proof/host_function branch from d0be2a5 to 8a20edc Compare October 10, 2023 16:27

vedhavyas requested a review from nazar-pc October 10, 2023 16:29

nazar-pc approved these changes Oct 10, 2023

View reviewed changes

vedhavyas enabled auto-merge October 10, 2023 16:33

vedhavyas added this pull request to the merge queue Oct 10, 2023

Merged via the queue into main with commit f567ca7 Oct 10, 2023
10 checks passed

vedhavyas deleted the fraud_proof/host_function branch October 10, 2023 18:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fraud proof: Add Host function and update timestamp extrinsic derivation #2046

Fraud proof: Add Host function and update timestamp extrinsic derivation #2046

vedhavyas commented Oct 3, 2023 •

edited

Loading

nazar-pc commented Oct 4, 2023

vedhavyas commented Oct 4, 2023

nazar-pc commented Oct 4, 2023 •

edited

Loading

NingLin-P commented Oct 5, 2023

dariolina commented Oct 5, 2023 •

edited

Loading

NingLin-P commented Oct 5, 2023

NingLin-P Oct 9, 2023

vedhavyas Oct 9, 2023

nazar-pc left a comment

nazar-pc Oct 10, 2023

vedhavyas commented Oct 10, 2023

nazar-pc Oct 10, 2023

nazar-pc Oct 10, 2023

vedhavyas Oct 10, 2023

nazar-pc Oct 10, 2023

vedhavyas Oct 10, 2023 •

edited

Loading

nazar-pc Oct 10, 2023

vedhavyas Oct 10, 2023

nazar-pc Oct 10, 2023

vedhavyas Oct 10, 2023

NingLin-P Oct 10, 2023

nazar-pc Oct 10, 2023

vedhavyas Oct 10, 2023

vedhavyas Oct 10, 2023

NingLin-P Oct 10, 2023

nazar-pc Oct 10, 2023

NingLin-P Oct 10, 2023

nazar-pc Oct 10, 2023 •

edited

Loading

NingLin-P Oct 10, 2023

nazar-pc Oct 10, 2023

vedhavyas commented Oct 10, 2023

nazar-pc left a comment

Fraud proof: Add Host function and update timestamp extrinsic derivation #2046

Fraud proof: Add Host function and update timestamp extrinsic derivation #2046

Conversation

vedhavyas commented Oct 3, 2023 • edited Loading

Code contributor checklist:

nazar-pc commented Oct 4, 2023

vedhavyas commented Oct 4, 2023

nazar-pc commented Oct 4, 2023 • edited Loading

NingLin-P commented Oct 5, 2023

dariolina commented Oct 5, 2023 • edited Loading

NingLin-P commented Oct 5, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nazar-pc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vedhavyas commented Oct 10, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vedhavyas Oct 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nazar-pc Oct 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vedhavyas commented Oct 10, 2023

nazar-pc left a comment

Choose a reason for hiding this comment

vedhavyas commented Oct 3, 2023 •

edited

Loading

nazar-pc commented Oct 4, 2023 •

edited

Loading

dariolina commented Oct 5, 2023 •

edited

Loading

vedhavyas Oct 10, 2023 •

edited

Loading

nazar-pc Oct 10, 2023 •

edited

Loading