
Add eth_getRequiredBlockState method #455

Open · wants to merge 4 commits into base: main
Conversation

@perama-v (Contributor) commented Aug 8, 2023

Description

Introduces a method eth_getRequiredBlockState that returns all state required to execute a single historical block.

Changes made

  • New eth_getRequiredBlockState method that returns a RequiredBlockState data type as bytes.
  • Specification of RequiredBlockState
    - Includes sections for motivation, SSZ data format, algorithms, and security.
  • Test cases for two separate blocks

Overview

An overview is provided below. Please see the more detailed sections in the PR.

Re-execution of historical transactions is important for accounting. A block is a program that was applied to the Ethereum
state at a particular moment in time. In retrospect, one can see which states were involved in that program.

These states can be inventoried and accompanied by a Merkle proof for that moment in time. Aggregating all
the states that the block required yields data that is both necessary and sufficient to re-execute that block.
This data is termed RequiredBlockState, and a specification is included in this PR.

Motivation

An archive node can be trustlessly distributed in a peer to peer network. Nodes that choose to implement eth_getRequiredBlockState provide a mechanism to "export a distributable archive node".

A side benefit is the potential for node providers to save bandwidth costs (2-4 orders of magnitude) when serving debug_traceTransaction functionality to users. The provider serves eth_getRequiredBlockState, and the user re-executes the block locally. This may also help bootstrap a distributed content delivery network.

Data format

SSZ encoded structure with snappy compression, as is seen in the Ethereum consensus specification.

The data format has been tested and comes to approximately 167 KB/Mgas. If state for every historical block were created in this format, the total size is estimated at ~30 TB. Individuals would store a subset of this quantity. See more analysis here: https://github.com/perama-v/archors.
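As a back-of-envelope sanity check on the ~30 TB figure, the following uses the 167 KB/Mgas density from this PR; the average gas per block and chain height are rough assumptions of mine, not numbers from the spec.

```python
# Storage estimate for a full set of RequiredBlockState objects.
KB_PER_MGAS = 167            # measured density reported in this PR
AVG_BLOCK_MGAS = 10          # assumed rough average gas used per historical block
NUM_BLOCKS = 18_000_000      # approximate chain height, mid-2023 (assumption)

total_kb = KB_PER_MGAS * AVG_BLOCK_MGAS * NUM_BLOCKS
total_tb = total_kb / 1e9    # 1 TB = 1e9 KB in decimal units
print(f"{total_tb:.0f} TB")  # on the order of 30 TB
```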

Algorithms

Descriptions for the creation and use of the data are included.

  • Creation: call a node using existing JSON-RPC methods, aggregate the responses, and encode.
  • Use: decode, verify, and then trace the block locally.
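The two steps above can be sketched as follows. This is a minimal illustration, not the spec's implementation: the RPC calls are stubbed, zlib stands in for snappy (which is not in the Python standard library), and JSON stands in for SSZ; all helper names are my own.

```python
import json
import zlib  # stand-in for snappy compression used by the spec

def fetch_block(number):
    # Stub for eth_getBlockByNumber; lists accounts the block touched.
    return {"number": number, "accessed": ["0xaa", "0xbb", "0xaa"]}

def fetch_proofs(number, addresses):
    # Stub for one eth_getProof call per accessed account.
    return {addr: [f"node-{addr}"] for addr in addresses}

def create_required_block_state(number):
    """Creation: call existing JSON-RPC methods, aggregate, encode."""
    block = fetch_block(number)
    accessed = sorted(set(block["accessed"]))       # deduplicate accessed accounts
    proofs = fetch_proofs(number - 1, accessed)     # proofs anchor to the parent state root
    payload = json.dumps({"block": number, "proofs": proofs}).encode()
    return zlib.compress(payload)

def use_required_block_state(blob):
    """Use: decode, verify, then trace the block locally (verification elided)."""
    return json.loads(zlib.decompress(blob))

state = use_required_block_state(create_required_block_state(17_000_000))
```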

Test cases

The specification has been implemented as a library and CLI application that can call a node and construct RequiredBlockState for blocks. This was used to generate the test cases in this PR.

Test case generator: https://github.com/perama-v/archors/tree/main/bin/stator.

The archors library also contains examples showing the re-execution of a historical block using revm, the block and the RequiredBlockState.

Security - trustlessness

The cornerstone of this method is the assumption that users (recipients of RequiredBlockState) can verify the canonicality of blockhashes. This may be achieved in two ways:

  • Cryptographic accumulator (for execution headers), as seen in Portal network
  • Non-archive node. The addition of RequiredBlockState enables a non-archive node to selectively be an archive node for arbitrary specific blocks.

Security - node

An execution client is not required to implement eth_getRequiredBlockState to participate in the Ethereum protocol.

A node that does implement the method may choose to support the method for a subset of blocks (e.g., a non-archive node may support the method for the same range of blocks that it supports debug_traceBlock for).

An archive node that implements the method and supports all blocks must have access to the merkle trie at arbitrary heights. This is equivalent to supporting eth_getProof at arbitrary heights.
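For reference, "supporting eth_getProof at arbitrary heights" amounts to serving requests shaped like the one below (per EIP-1186: account address, storage keys, block number). The address and block number here are placeholders.

```python
import json

def get_proof_request(address, storage_keys, block_number, request_id=1):
    """Build a JSON-RPC request body for eth_getProof at a given height."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getProof",
        # params: account, list of storage slots, block number as a hex quantity
        "params": [address, storage_keys, hex(block_number)],
    }

req = get_proof_request("0x" + "00" * 20, [], 15_000_000)
body = json.dumps(req)
```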

@perama-v perama-v marked this pull request as draft August 8, 2023 12:25
@perama-v perama-v marked this pull request as ready for review August 8, 2023 12:39
@carver left a comment:

This sounds really cool, and it would be awesome to run traces at arbitrary history without running a local archive node!

Comment on lines +334 to +341
However, there is no guarantee of being able to compute a new block state root for this post-execution
state, for example with the aim of checking it against the state root in the block header of that block
and thereby auditing the state changes that were applied.

This is because the state changes may involve an arbitrary number of state deletions. State
deletions may change the structure of the merkle trie in a way that requires knowledge of
internal nodes that are not present in the proofs obtained by `eth_getProof` JSON-RPC method.
Hence, while the complete post-block trie can sometimes be created, it is not guaranteed.

Yup, I was worried about this.

Being able to consistently verify the post-state hash doesn't really feel optional to me. I can't trust the result of a local block execution that doesn't also prove that it got the same result as the network did. In the context of the Portal Network, nodes must validate data and only gossip data that we can prove locally.

EVM execution engines should be able to handle the case of missing trie nodes, and return the information needed to collect the missing data. They must handle this case if they want to run against a partial trie database, as when running Beam Sync.

I am only familiar enough with py-evm to give the example:
https://github.com/ethereum/py-evm/blob/d751dc8c9c8199a16043a483b19c9f4d7a592202/eth/db/account.py#L605-L661

The MissingTrieNode exceptions are the relevant moments when the EVM realizes that it's missing some intermediate nodes that are required to prove the final state root, if you're only running with the state proof as defined in the current spec.

As a prototype, I suppose you could literally run py-evm with the unverified proofs, then retrieve the missing trie nodes over devp2p one by one (there usually aren't too many, from what I've seen).
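The recovery loop described above can be sketched like this. It is a hedged illustration, not py-evm's actual API: the `MissingTrieNode` class and the fetch function here are stand-ins I defined for the example.

```python
class MissingTrieNode(Exception):
    """Raised when execution needs a trie node that is not in the local db."""
    def __init__(self, node_hash):
        self.node_hash = node_hash

def execute_block(db, needed=("n1", "n2")):
    # Execution halts at the first absent node (two nodes needed in this toy).
    for node_hash in needed:
        if node_hash not in db:
            raise MissingTrieNode(node_hash)
    return "post-state-root"

def fetch_node_over_devp2p(node_hash):
    # Stub for a network fetch of a single trie node by hash.
    return f"body-of-{node_hash}"

def run_with_recovery(db):
    """Execute, catch the missing-node condition, fetch, retry."""
    while True:
        try:
            return execute_block(db)
        except MissingTrieNode as exc:  # usually only a handful of iterations
            db[exc.node_hash] = fetch_node_over_devp2p(exc.node_hash)

root = run_with_recovery({})
```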

@perama-v (author) commented:

.. then retrieve the missing trie nodes over devp2p one by one

For the missing trie node(s), I don't see a clear mechanism to obtain that node. For removed nodes that require knowledge of a sibling node, how could that sibling node be obtained? As it is mid-traversal, the terminal keys of the affected sibling are not trivially known. So eth_getProof(block_number, key_involving_sibling_node) cannot be used. A new method get_trie_node_at_block_height(block_number, trie_node_hash) could be written for a node, but this is nontrivial. More context here:

@perama-v (author) commented:

.. nodes must validate data and only gossip data that we can prove locally.

This is preserved (gossiped data is validated). The gossiped data is the merkle proofs of the state, so this is validated, and the block is also validated.

So the source of error here is a bug in the EVM implementation.

```mermaid
flowchart TD
    Block[Block, secured by header: Program to run] -- gossiped --> Pre
    State[State, secured by merkle proofs: Input to program] -- gossiped --> Pre
    Pre[Program with inputs] -- Load into EVM environment --> EVM[EVM executes]
    EVM -- bug here --> Post[Post block state]
    Post -.-> PostRoot[Post-block state root useful for these bugs]
    EVM -- bug here --> Trace[debug_traceBlock output]
```

I agree EVM bugs are possible and having the post-state root would be nice.

However:

  • EVM implementations are often shared across different client types, reducing bug risk.
  • Even having the post-block state does not guarantee that debug_traceBlock output is correct. It contains many
    details that are not covered by a post-block state root, so one still needs to check that the EVM doesn't have bugs.
  • Test suites can compare debug_traceBlock output against the same result from an archive node (or a different EVM implementation hooked into the Portal network). This protects against EVM errors that result in bad state, and errors that result in bad traces.


I've proposed an idea here and believe our proposal on ZK proofs of the last state can indeed help address the challenge mentioned. ZK proofs, specifically Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge (ZK-SNARKs), have the potential to provide proofs of complex computations and statements: sogolmalek/EIP-x#6

@perama-v (author) commented:

... our proposal on ZK proofs of the last state can indeed help address the challenge mentioned

Let me see if I understand your proposal. The data that a peer receives could consist of the:

  • (existing) The block pre-state (state required for the block, secured by merkle root in prior block and a header accumulator)
  • (existing) The block (to allow the user to replay the block for any purpose)
  • (new) A proof of block execution (ZK-EVM) consisting of a ZK-SNARK proof for the set of block post-state values (values that were accessed by the block).

The user re-executes the block and arrives at post-block state values. Those are compared to the values in the ZK proof. If they match, the chance that the EVM and the ZK-EVM both have the same bug is low, and the state transition is likely sound.

So the ZK proof is equivalent to replaying the block using a different EVM implementation. That is, the same as getting the post-block state from a Rust implementation and comparing to the state produced by a Go implementation.

In this respect, the presence of the ZK proof does not seem to introduce additional safety. Perhaps I overlooked a component?


Thanks for your detailed arguments. I'm just trying to make sure I understand the points well, and I hope I haven't gone far wrong with the following; I'd love to learn more here.

Our approach centers on the principle of minimizing data reliance while ensuring the accuracy and reliability of state transitions. While your understanding of our proposal is mostly accurate, a few points are worth considering when evaluating the introduced ZK proofs:
The process of re-executing the block on a different EVM implementation (Rust vs. Go, as you mentioned) can be cumbersome and resource-intensive.
Another point is attack vectors and determinism: while replaying the block using different EVM implementations is conceptually similar, it introduces the potential for inconsistencies between implementations due to subtle differences in execution logic or determinism. Furthermore, I believe ZK proofs offer far better efficiency and scaling. Re-executing blocks using different EVM implementations might be feasible at small scale, but it becomes much more challenging as the network scales and the number of transactions increases. ZK proofs, on the other hand, can be generated and verified more efficiently, making them more suitable for large-scale systems.

As you've mentioned, the problem with key deletions is that sometimes sibling nodes in the trie are required to reconstruct the trie structure for verification purposes. These sibling nodes might not be available through the eth_getProof data, leading to incomplete trie reconstructions.

I think it can help to generate a ZK proof of the last state: a cryptographic proof that certifies the correctness of the entire state transition process, including deletions and modifications. This proof can be designed to include information about the sibling nodes that are required for verification. In essence, the ZK proof encapsulates the entire state transition and, by design, must account for all the necessary data, including sibling nodes, to be valid.

Also, by obtaining and validating a ZK proof of the last state, we can ensure that all data required for verifying the state transition, including missing sibling nodes, is included in the proof. This provides a complete and comprehensive validation mechanism that mitigates the challenge posed by missing nodes.

@perama-v (author) commented Aug 29, 2023:

Some additional context is that eth_getRequiredBlockState is designed to provide the minimum information required to re-execute an old block. The goal is to be able to download individual blocks and re-execute them in order to inspect every EVM step in that block. That is, the goal is to run the EVM (to gain historical insight).

With a ZK EVM, one can demonstrate to a peer that the block post-state is valid with respect to a block. This means that they do not have to re-execute the EVM themselves. This is beneficial for a light client that wants to trustlessly keep up to date with the latest Ethereum state. That is, the goal is to not run the EVM (to save resources).

@s1na (Contributor) commented Sep 6, 2023:

For any key that has an inclusion proof in n - 1, and an exclusion proof in block n, retain the proof for that key.
Store these additional exclusion proofs in the RequiredBlockState data structure.

I think the idea is a good direction, but after discussing with @gballet I have the feeling that sending the exclusion proofs themselves is not enough for the user to perform the deletion locally, because the sibling of the node being deleted could be a subtrie. So the server should use this approach (or another) to give the full prestate required for executing the block; this means figuring out which extra nodes need to be returned for such a deletion.

Geth has the prestateTracer, which I noticed suffers from the same issue after reading up on this ticket. I'd be keen to fix it. Never mind: the prestateTracer doesn't actually return intermediary nodes, only accounts.
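The sibling problem can be shown with a toy model (deliberately not a real Merkle-Patricia trie): when a two-child branch loses one child, the branch collapses, and the surviving sibling's contents, not just its hash, are needed to build the replacement node. The hash function and node encoding below are illustrative stand-ins.

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()  # stand-in for keccak-256

def collapse_branch(surviving_sibling_node):
    """Delete one child of a two-child branch; return the new subtree root."""
    if surviving_sibling_node is None:
        # Only the sibling's *hash* was in our proofs: we cannot rebuild.
        raise KeyError("missing sibling node body; cannot collapse branch")
    # The collapsed node embeds the sibling's body under an extended path,
    # so the body itself must be available.
    return h(b"extension:" + surviving_sibling_node)

try:
    collapse_branch(None)          # hash-only knowledge is not enough
except KeyError:
    pass
new_root = collapse_branch(b"leaf-node-body")  # with the body, we can rebuild
```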

@perama-v (author) commented:

At present this spec contains sufficient data to execute the block, but not update the proofs to get the post-block root.

For context, the main issue is that sometimes the trie depth is changed, which impacts cousin-nodes. More on this here: https://github.com/perama-v/archors/blob/main/crates/multiproof/README.md

To enable that, I have been prototyping a mechanism that allows this using only available JSON-RPC methods (eth_getProof), without hacking an execution client. This method calls eth_getProof on the subsequent block, gets these edge-case nodes, and tacks them onto RequiredBlockState as "oracle data". The data allows the trie update to complete and the post-block root to be computed, verifying that the oracle data was in fact correct.

I learned that @s1na has taken a different approach, which is to modify Geth to record all trie nodes touched during a block execution (including post-root computation). That method is sufficient to get trie nodes required to compute the post-block root (but requires modifying the execution client). It is a nice approach.
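The oracle-data flow above can be sketched as follows. This is a hedged illustration of the idea only: the function names are mine, and the root computation is a placeholder for a real state-root calculation.

```python
def apply_deletions(trie_nodes, oracle_nodes, needed_hashes):
    """During trie updates, fill gaps from the pre-fetched oracle set."""
    for node_hash in needed_hashes:
        if node_hash not in trie_nodes:
            # "Look into the future": take the node fetched via eth_getProof
            # on the subsequent block at creation time.
            trie_nodes[node_hash] = oracle_nodes[node_hash]
    return trie_nodes

def verify_post_root(trie_nodes, expected_root):
    # Stand-in: a real implementation recomputes the post-block state root,
    # which retroactively proves the oracle data was correct.
    computed = "root(" + ",".join(sorted(trie_nodes)) + ")"
    return computed == expected_root

nodes = apply_deletions({"a": b"node-a"}, {"b": b"node-b"}, ["a", "b"])
ok = verify_post_root(nodes, "root(a,b)")
```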

@perama-v (author) commented:

Essentially the oracle approach says: Look, some changes to the surrounding trie will happen, and they may be complex/extensive. One can "look in to the future" at this exact point in the traversal and just get the nodes for the affected part (leafy-end) of the traversal. The details of the surrounding trie don't need to be computed and aren't important because we can check that what we got from the future was in fact correct.

Bit of a hacky approach. I can see that this might be difficult to implement if not using eth_getProof.

@perama-v (author) commented:

After discussion with @s1na I have moved the trie nodes from all account and storage proofs to a single bag-of-nodes. This is sufficient for navigation (root hash plus path). This makes it easier for implementers because the proof structure does not need to be known to create the data structure.
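The bag-of-nodes layout can be sketched like this: all proof nodes keyed by their hash in one map, so navigation needs only a root hash plus a path, and nodes shared between account and storage proofs are stored once. Here sha256 stands in for keccak-256 (not in the Python standard library), and the byte strings are placeholder node bodies.

```python
import hashlib

def node_hash(node):
    return hashlib.sha256(node).digest()  # stand-in for keccak-256

def build_bag(proofs):
    """Flatten ordered proofs into one hash-keyed map of trie nodes."""
    bag = {}
    for proof in proofs:              # proof structure is discarded on purpose
        for node in proof:
            bag[node_hash(node)] = node   # duplicates collapse to one entry
    return bag

account_proof = [b"root", b"branch-1", b"leaf-a"]
storage_proof = [b"root", b"branch-1", b"leaf-b"]   # shares two nodes
bag = build_bag([account_proof, storage_proof])     # 6 nodes in, 4 stored
```

Lookup then proceeds from the root: hash the known root, fetch it from the bag, and follow child hashes down the path, so the original proof grouping never needs to be reconstructed.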


A block hash accessed by the "BLOCKHASH" opcode.
```python
class RecentBlockHash(Container):
    ...
```
A contributor commented:

This can be made verifiable by including the whole header here.

@perama-v (author) commented Oct 30, 2023:

Could you expand on what you mean here?

The assumption here is that a user has a mechanism to verify the canonicality of blockhashes.

More notes here: https://github.com/perama-v/archors#blockhash-opcode

One design goal is to keep RequiredBlockState as small as possible. This is because a static collection of all RequiredBlockState's would allow for a distributed archive node. The current estimate puts the size at 30TB.

A contributor commented:

So my point addresses the following fault case you describe in your notes:

A block hash is wrong: the portal node can audit all block hashes against its master accumulator prior to tracing.

If we want to solve it at the RPC level: instead of sending only block number and block hash, send the whole block header. Then the verifier can hash the header (which contains the number) and be sure about the hash.

I understand the size concern. However, if the clients have to audit an external source anyway, it might be worth it. Otherwise, if in most use cases the client has easy access to the latest 256 blocks, then we can leave this info out of eth_getRequiredBlockState.
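The suggestion of shipping the whole header can be sketched as below. This is an illustration only: sha256 of a JSON encoding stands in for Ethereum's actual keccak-256 of the RLP-encoded header, and the header fields are placeholders.

```python
import hashlib
import json

def header_hash(header):
    encoded = json.dumps(header, sort_keys=True).encode()  # stand-in for RLP
    return hashlib.sha256(encoded).hexdigest()             # stand-in for keccak-256

def verify_blockhash(header, claimed_hash):
    # The header contains the block number, so a matching hash binds
    # number -> hash without trusting the sender.
    return header_hash(header) == claimed_hash

header = {"number": 17_000_000, "parentHash": "0x00"}
assert verify_blockhash(header, header_hash(header))
```

Note this only proves the hash was computed correctly from the given header; as discussed below, it does not by itself establish that the header is canonical.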

@perama-v (author) commented:

A "wrong" hash refers to a hash that is not canonical. So, the wrong hash could be a correctly computed hash of a noncanonical block. E.g. a fabricated block or an uncle block. Having the whole block included doesn't get one closer to verifying canonicality.
