# Add `eth_getRequiredBlockState` method (#455)
## `RequiredBlockState` specification

Specification of a data format that contains state required to trace a single Ethereum block.

This is the format of the data returned by the `eth_getRequiredBlockState` JSON-RPC method.
## Table of Contents

- [`RequiredBlockState` specification](#requiredblockstate-specification)
- [Table of Contents](#table-of-contents)
- [Abstract](#abstract)
- [Motivation](#motivation)
- [Overview](#overview)
  - [General Structure](#general-structure)
  - [Notation](#notation)
  - [Endianness](#endianness)
- [Constants](#constants)
  - [Variable-size type parameters](#variable-size-type-parameters)
- [Definitions](#definitions)
  - [`RequiredBlockState`](#requiredblockstate)
  - [`CompactEip1186Proof`](#compacteip1186proof)
  - [`Contract`](#contract)
  - [`TrieNode`](#trienode)
  - [`RecentBlockHash`](#recentblockhash)
  - [`CompactStorageProof`](#compactstorageproof)
- [Algorithms](#algorithms)
  - [`construct_required_block_state`](#construct_required_block_state)
  - [`get_state_accesses`](#get_state_accesses)
  - [`get_proofs`](#get_proofs)
  - [`get_block_hashes`](#get_block_hashes)
  - [`use_required_block_state`](#use_required_block_state)
  - [`verify_required_block_state`](#verify_required_block_state)
  - [`trace_block_locally`](#trace_block_locally)
  - [`compression_procedure`](#compression_procedure)
- [Security](#security)
  - [Future protocol changes](#future-protocol-changes)
  - [Canonicality](#canonicality)
  - [Post-block state root](#post-block-state-root)
## Abstract

An Ethereum block returned by `eth_getBlockByNumber` can be considered a program that executes
a state transition. The input to that program is the state immediately prior to that block.
Only a small part of that state is required to run the program (re-execute the block).
The state values can be accompanied by merkle proofs to prevent tampering.

The specification of that state (values and proofs as `RequiredBlockState`) facilitates
data transfer between two parties. The transfer represents the minimum amount of data
required for the holder of an Ethereum block to re-execute that block.

Re-execution is required for basic accounting (examination of the history of the global
shared ledger). Trustless accounting of single Ethereum blocks allows for lightweight
distributed block exploration.
## Motivation

State is rooted in the header. A merkle multiproof of all state required for all
transactions in one block is sufficient to trace any historical block.

In addition to the proof, BLOCKHASH opcode reads are also included.

Together, anyone with the ability to verify that a historical block header is canonical
can trustlessly trace a block without possession of an archive node.

The format of the data is deterministic, so that two peers creating the same
data will produce identical structures.

The primary motivation is that the data may be distributed in a peer-to-peer content delivery network.
This would represent the state for a sharded archive node, where users may host subsets of the
data useful to them.

A secondary benefit is that traditional node providers could offer users the ability to
re-execute a block, rather than provide the result of re-execution. Transfer
of `RequiredBlockState` is approximately 167 KB/Mgas (~2.5 MB per block). Transfer of
a `debug_traceBlock` result is on the order of hundreds of megabytes per block with memory
disabled, and with memory enabled can be tens of gigabytes. Local re-execution with an EVM
implementation of choice can produce the identical re-execution (including memory or custom
tracers), and the output can be processed and discarded on the fly.
## Overview

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted
as described in RFC 2119 and RFC 8174.

### General Structure

The `RequiredBlockState` consists of account state values as Merkle proofs, contract bytecode
and recent block hashes.

### Notation

Code snippets appearing in `this style` are to be interpreted as Python 3 pseudocode. The
style of the document is intended to be readable by those familiar with the
Ethereum [consensus](https://github.com/ethereum/consensus-specs)
and [Simple Serialize (SSZ)](https://github.com/ethereum/consensus-specs/blob/dev/ssz/simple-serialize.md)
specifications.

Where a list/vector is said to be sorted, it indicates that the elements are ordered
lexicographically when in hexadecimal representation (e.g., `[0x12, 0x3e, 0xe3]`) prior
to conversion to SSZ format. For elements that are containers, the ordering is determined by
the first element in the container.
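As a concrete illustration of the sorting rule (not part of the spec), the helper below orders byte-string elements lexicographically, which matches ordering by hexadecimal representation; for containers, the same comparison would be applied to the bytes of the first field.

```python
# Illustrative helper, not part of the spec: lexicographic ordering of
# byte-string elements. Python compares bytes lexicographically, which is the
# same order as comparing their hexadecimal representations.
def sort_lexicographically(elements: list[bytes]) -> list[bytes]:
    return sorted(elements)

assert sort_lexicographically([b"\xe3", b"\x12", b"\x3e"]) == [b"\x12", b"\x3e", b"\xe3"]
```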
### Endianness

Big endian form is used as most data relates to the Ethereum execution context.
## Constants

### Variable-size type parameters

Helper values for SSZ operations. SSZ variable-size elements require a maximum length field.

Most values are chosen to be approximately the smallest possible value.

| Name | Value | Description |
| - | - | - |
| MAX_BLOCKHASH_READS_PER_BLOCK | uint16(256) | The BLOCKHASH opcode may read up to 256 recent blocks |
| MAX_BYTES_PER_NODE | uint16(32768) | - |
| MAX_BYTES_PER_CONTRACT | uint16(32768) | - |
| MAX_CONTRACTS_PER_BLOCK | uint16(2048) | - |
| MAX_NODES_PER_BLOCK | uint16(32768) | - |
| MAX_ACCOUNT_PROOFS_PER_BLOCK | uint16(8192) | - |
| MAX_STORAGE_PROOFS_PER_ACCOUNT | uint16(8192) | - |
## Definitions

### `RequiredBlockState`

The entire `RequiredBlockState` data format is represented by the following (SSZ-encoded and
snappy-compressed) container.

All trie nodes (account and storage) are aggregated for deduplication and simplicity.
They are located in the `trie_nodes` member.
A "compact" proof consists only of the root hash for that trie and the information required
for computing the trie path. The trie nodes can then be traversed by locating the first
node using the root hash and starting the traversal.

The proof data represents values in the historical chain immediately prior to the execution of
the block (sometimes referred to as "prestate"). That is, `RequiredBlockState` for block `n`
contains proofs rooted in the state root of block `n - 1`.
```python
class RequiredBlockState(Container):
    # sorted (by address)
    compact_eip1186_proofs: List[CompactEip1186Proof, MAX_ACCOUNT_PROOFS_PER_BLOCK]
    # sorted
    contracts: List[Contract, MAX_CONTRACTS_PER_BLOCK]
    # sorted
    trie_nodes: List[TrieNode, MAX_NODES_PER_BLOCK]
    # sorted (by block number)
    block_hashes: List[RecentBlockHash, MAX_BLOCKHASH_READS_PER_BLOCK]
```

The `RequiredBlockState` is compressed using snappy encoding (see the Algorithms section). The
`eth_getRequiredBlockState` JSON-RPC method returns the SSZ-encoded container with snappy encoding.
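To illustrate how a consumer might start the traversal described above, the sketch below (not part of the spec) indexes `trie_nodes` by keccak hash and follows a child reference through a branch node. Extension and leaf prefix decoding is elided, and the `rlp` and `eth-hash` packages are assumed.

```python
# Illustrative sketch only: locate proof nodes inside `trie_nodes` by hash,
# starting from a trie root. Full Merkle Patricia traversal (extension and
# leaf prefix handling) is elided.
import rlp                         # assumes the `rlp` package
from eth_hash.auto import keccak   # assumes the `eth-hash` package

def index_nodes(trie_nodes: list[bytes]) -> dict[bytes, bytes]:
    # Nodes are referenced by the keccak-256 hash of their RLP encoding.
    return {keccak(node): node for node in trie_nodes}

def child_hash(branch_node: bytes, nibble: int) -> bytes:
    # A 17-item branch node stores, per nibble, the reference to the child
    # node (its hash, or the node itself when it is small enough to inline).
    items = rlp.decode(branch_node)
    assert len(items) == 17, "only branch nodes are handled in this sketch"
    return items[nibble]

# Traversal begins at the node whose hash equals the relevant root
# (the state root for account proofs, `storage_hash` for storage proofs).
```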
### `CompactEip1186Proof`

Represents the proof data whose root is the state root in the block header of the preceding block.

The account proof is obtained by calculating the account hash and traversing nodes in the
`RequiredBlockState` container.

```python
class CompactEip1186Proof(Container):
    address: Vector[uint8, 20]
    balance: List[uint8, 32]
    code_hash: Vector[uint8, 32]
    nonce: List[uint8, 8]
    storage_hash: Vector[uint8, 32]
    # sorted
    storage_proofs: List[CompactStorageProof, MAX_STORAGE_PROOFS_PER_ACCOUNT]
```
### `Contract`

An alias for contract bytecode.

```python
Contract = List[uint8, MAX_BYTES_PER_CONTRACT]
```
### `TrieNode`

An alias for a node in a merkle patricia proof.

A Merkle Patricia Trie proof consists of a list of witness nodes, one for each trie node on the
path. Each node contains different data elements depending on its type (blank, branch, extension
or leaf). When serialized, each witness node is represented as an RLP-encoded list of its
component elements.

```python
TrieNode = List[uint8, MAX_BYTES_PER_NODE]
```
### `RecentBlockHash`

A block hash accessed by the "BLOCKHASH" opcode.

```python
class RecentBlockHash(Container):
    block_number: List[uint8, 8]
    block_hash: Vector[uint8, 32]
```
### `CompactStorageProof`

The storage proof is obtained by calculating the hash of the storage key and traversing nodes in
the `RequiredBlockState` container.

```python
class CompactStorageProof(Container):
    key: Vector[uint8, 32]
    value: List[uint8, 8]
```
## Algorithms

This section contains descriptions of procedures relevant to `RequiredBlockState`, including its
production (`construct_required_block_state`) and use (`use_required_block_state`).
### `construct_required_block_state`

For a given block, `RequiredBlockState` can be constructed using existing JSON-RPC methods
with the following steps (a high-level sketch follows the list):

1. `get_state_accesses` algorithm
2. `get_proofs`
3. `get_block_hashes`
4. Create the `RequiredBlockState` SSZ container
5. Use `compression_procedure` to compress the `RequiredBlockState`
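The sketch below strings the steps together. The helper functions mirror the algorithm names defined in this section; their signatures are illustrative only, not normative.

```python
# Illustrative pipeline, assuming helpers named after the algorithms in this
# section; signatures are not part of the spec.
def construct_required_block_state(block_number: int) -> bytes:
    accesses = get_state_accesses(block_number)                             # step 1
    proofs, contracts, trie_nodes = get_proofs(accesses, block_number - 1)  # step 2
    block_hashes = get_block_hashes(block_number)                           # step 3
    state = RequiredBlockState(                                             # step 4
        compact_eip1186_proofs=proofs,
        contracts=contracts,
        trie_nodes=trie_nodes,
        block_hashes=block_hashes,
    )
    return compression_procedure(state)                                     # step 5
```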
### `get_state_accesses`

Call `debug_traceBlock` with the prestate tracer and record key/value pairs where
they are first encountered in the block.

```
curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc": "2.0", "method": "debug_traceBlock", "params": ["finalized", {"tracer": "prestateTracer"}], "id":1}' http://127.0.0.1:8545 | jq
```

This will return state objects consisting of a key (account address) and a value (state, which
may include contract bytecode and storage key/value pairs). See two objects for reference:
```json
{
  "0x58803db3cc22e8b1562c332494da49cacd94c6ab": {
    "balance": "0x13befe42b38a40",
    "nonce": 54
  },
  "0xae7ab96520de3a18e5e111b5eaab095312d7fe84": {
    "balance": "0x4558214a60e751c3a",
    "code": "0x608060/* Snip (entire contract bytecode) */410029",
    "nonce": 1,
    "storage": {
      "0x1b6078aebb015f6e4f96e70b5cfaec7393b4f2cdf5b66fb81b586e48bf1f4a26": "0x0000000000000000000000000000000000000000000000000000000000000000",
      "0x4172f0f7d2289153072b0a6ca36959e0cbe2efc3afe50fc81636caa96338137b": "0x000000000000000000000000b8ffc3cd6e7cf5a098a1c92f48009765b24088dc",
      "0x644132c4ddd5bb6f0655d5fe2870dcec7870e6be4758890f366b83441f9fdece": "0x0000000000000000000000000000000000000000000000000000000000000001",
      "0xd625496217aa6a3453eecb9c3489dc5a53e6c67b444329ea2b2cbc9ff547639b": "0x3ca7c3e38968823ccb4c78ea688df41356f182ae1d159e4ee608d30d68cef320"
    }
  },
  ...
}
```
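A parsing sketch is given below. It assumes a generic `rpc(method, params)` helper and uses `debug_traceBlockByNumber`; the exact shape of the tracer output varies by client, so treat it as illustrative rather than normative.

```python
# Illustrative sketch, not normative: collect prestate accesses for a block.
# Assumes an `rpc(method, params)` helper that performs the JSON-RPC call and
# returns the decoded `result`. Merging of storage slots first seen in later
# transactions of the same block is elided for brevity.
def get_state_accesses(block_number: int) -> dict[str, dict]:
    traces = rpc("debug_traceBlockByNumber",
                 [hex(block_number), {"tracer": "prestateTracer"}])
    accessed: dict[str, dict] = {}
    for tx_trace in traces:
        for address, prestate in tx_trace["result"].items():
            # Keep the value from the first encounter in the block.
            accessed.setdefault(address, prestate)
    return accessed
```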
### `get_proofs`

Call the `eth_getProof` JSON-RPC method for each state key (address) returned by the
`get_state_accesses` algorithm, including storage keys where appropriate.

The block number used is the block prior to the block of interest (the proofs describe the state
after that prior block, which is the prestate of the block of interest).

For all account proofs, aggregate and sort the proof nodes and represent each proof as a list of
indices into those nodes. Repeat for all storage proofs.
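A collection sketch follows, assuming the same `rpc` helper as above. It only shows node aggregation and deduplication, not the index encoding.

```python
# Illustrative sketch: gather and deduplicate all account and storage proof
# nodes for the accessed state, rooted in the parent block's state.
def collect_proof_nodes(accesses: dict[str, dict], parent_block: int) -> list[bytes]:
    nodes: set[bytes] = set()
    for address, prestate in accesses.items():
        storage_keys = sorted(prestate.get("storage", {}).keys())
        proof = rpc("eth_getProof", [address, storage_keys, hex(parent_block)])
        for node_hex in proof["accountProof"]:
            nodes.add(bytes.fromhex(node_hex[2:]))
        for storage_proof in proof["storageProof"]:
            for node_hex in storage_proof["proof"]:
                nodes.add(bytes.fromhex(node_hex[2:]))
    return sorted(nodes)  # sorted lexicographically, as required above
```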
### `get_block_hashes`

Call `debug_traceBlock` with the default tracer and record any use of the "BLOCKHASH" opcode.
Record the block number (top of stack in the "BLOCKHASH" step) and the block hash (top
of stack in the subsequent step).
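A sketch of the extraction, assuming the `rpc` helper and a geth-style struct log layout (illustrative only):

```python
# Illustrative sketch: record BLOCKHASH reads from a default (struct-logger)
# block trace. Assumes struct logs with `op` and `stack` fields, where the
# top of the stack is the last element.
def get_block_hashes(block_number: int) -> dict[int, str]:
    hashes: dict[int, str] = {}
    traces = rpc("debug_traceBlockByNumber", [hex(block_number), {}])
    for tx_trace in traces:
        logs = tx_trace["result"]["structLogs"]
        for i, step in enumerate(logs):
            if step["op"] == "BLOCKHASH" and i + 1 < len(logs):
                number_read = int(step["stack"][-1], 16)  # top of stack before the op
                hash_read = logs[i + 1]["stack"][-1]      # top of stack after the op
                hashes[number_read] = hash_read
    return hashes
```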
### `use_required_block_state`

1. Obtain `RequiredBlockState`, for example by calling `eth_getRequiredBlockState`
2. Use `compression_procedure` to decompress the `RequiredBlockState`
3. `verify_required_block_state`
4. `trace_block_locally`
### `verify_required_block_state`

Check that the block hashes are canonical, for example against a trusted node or an accumulator
of canonical block hashes. Check the merkle proofs in the required block state.
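A verification sketch follows. The `is_canonical` and `walk_proof` helpers are assumptions for illustration: the former checks a (number, hash) pair against a trusted source, the latter traverses `trie_nodes` from a root to a leaf, checking that every referenced node hashes to the reference held by its parent.

```python
# Illustrative verification sketch, not normative.
from eth_hash.auto import keccak  # assumes the `eth-hash` package

def verify_required_block_state(state, parent_state_root: bytes) -> None:
    # 1. Block hashes must be canonical.
    for entry in state.block_hashes:
        assert is_canonical(entry.block_number, entry.block_hash)
    # 2. Merkle proofs must connect each account/storage value to its root.
    nodes_by_hash = {keccak(bytes(n)): bytes(n) for n in state.trie_nodes}
    for proof in state.compact_eip1186_proofs:
        account_path = keccak(bytes(proof.address))
        walk_proof(parent_state_root, account_path, nodes_by_hash)
        for storage in proof.storage_proofs:
            storage_path = keccak(bytes(storage.key))
            walk_proof(bytes(proof.storage_hash), storage_path, nodes_by_hash)
```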
### `trace_block_locally`

Obtain a block (`eth_getBlockByNumber` JSON-RPC method) with transaction bodies. Load an EVM
implementation with the `RequiredBlockState` and the block, execute the transactions in the
block, and observe the trace.
### `compression_procedure`

The `RequiredBlockState` returned by the `eth_getRequiredBlockState` JSON-RPC method is
compressed. Snappy compression is used ([https://github.com/google/snappy](https://github.com/google/snappy)).

The encoding and decoding procedures are the same as those used in the Ethereum consensus
specifications ([ssz-snappy encoding strategy](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#ssz-snappy-encoding-strategy)).

For encoding (compression), data is first SSZ-encoded and then snappy-encoded.
For decoding (decompression), data is first snappy-decoded and then SSZ-decoded.
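As a sketch (assuming the `python-snappy` and py-ssz `ssz` packages; not normative, and whether raw or framed snappy applies follows the consensus-specs link above):

```python
# Illustrative encode/decode sketch for the compression procedure.
import snappy  # assumes the `python-snappy` package
import ssz     # assumes the `ssz` (py-ssz) package

def encode_required_block_state(state) -> bytes:
    # SSZ-encode first, then snappy-compress.
    return snappy.compress(ssz.encode(state))

def decode_required_block_state(data: bytes, sedes):
    # Snappy-decompress first, then SSZ-decode against the container schema.
    return ssz.decode(snappy.decompress(data), sedes)
```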
## Security

### Future protocol changes

Merkle patricia proofs may be replaced by verkle proofs after some hard fork.
This would not invalidate `RequiredBlockState` data prior to that fork.
The new proof format could be added to this specification for data after that fork.
### Canonicality

A recipient of `RequiredBlockState` must check that the block hashes are part of the real
Ethereum chain history. Failure to verify (`verify_required_block_state`) can result in invalid
re-execution (`trace_block_locally`).
### Post-block state root

A user that has access to canonical block hashes and a sound EVM implementation has strong
guarantees about the integrity of the block re-execution (`trace_block_locally`).

However, there is no guarantee of being able to compute the state root of the post-execution
state, for example with the aim of checking it against the state root in the block header of
that block and thereby auditing the state changes that were applied.

This is because the state changes may involve an arbitrary number of state deletions. State
deletions may change the structure of the merkle trie in a way that requires knowledge of
internal nodes that are not present in the proofs obtained by the `eth_getProof` JSON-RPC method.
Hence, while the complete post-block trie can sometimes be created, it is not guaranteed.
Comment on lines +326 to +333:

Yup, I was worried about this. Being able to consistently verify the post-state hash doesn't really feel optional to me. I can't trust the result of a local block execution that doesn't also prove that it got the same result as the network did. In the context of the Portal Network, nodes must validate data and only gossip data that we can prove locally. EVM execution engines should be able to handle the case of missing trie nodes, returning the information needed to collect the data. They must handle that case if they want to run against a partial trie database, like if they're running Beam Sync. I am only familiar enough with py-evm to give an example. As a prototype, I suppose you could literally run py-evm with the unverified proofs, then retrieve the missing trie nodes over devp2p one by one (there usually aren't too many, from what I've seen).

For the missing trie node(s), I don't see a clear mechanism to obtain that node. For removed nodes that require knowledge of a sibling node, how could that sibling node be obtained? As it is mid-traversal, the terminal keys of the affected sibling are not trivially known.

This is preserved (gossiped data is validated). The gossiped data is the merkle proofs of the state, so this is validated, and the block is also validated. So the source of error here is a bug in the EVM implementation.

```mermaid
flowchart TD
    Block[Block, secured by header: Program to run] -- gossiped --> Pre
    State[State, secured by merkle proofs: Input to program] -- gossiped --> Pre
    Pre[Program with inputs] -- Load into EVM environment --> EVM[EVM executes]
    EVM -- bug here --> Post[Post block state]
    Post -.-> PostRoot[Post-block state root useful for these bugs]
    EVM -- bug here --> Trace[debug_traceBlock output]
```

I agree EVM bugs are possible and having the post-state root would be nice. However:

I've proposed an idea here and believe our proposal on ZK proofs of the last state can indeed help address the challenge mentioned. ZK proofs, specifically Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge (ZK-SNARKs), have the potential to provide proofs of complex computations and statements: sogolmalek/EIP-x#6

Let me see if I understand your proposal. The data that a peer receives could consist of the:
The user re-executes the block and arrives at post-block state values. Those are compared to the values in the ZK proof. If they are the same, the chance that the EVM and the ZK-EVM both have a bug is low, and the state transition is likely sound. So the ZK proof is equivalent to replaying the block using a different EVM implementation; that is, the same as getting the post-block state from a Rust implementation and comparing it to the state produced by a Go implementation. In this respect, the presence of the ZK proof does not seem to introduce additional safety. Perhaps I overlooked a component?

Thanks for your detailed arguments. I'm just trying to make sure I understand the points well and hope I didn't go far wrong with my following arguments; love to learn more here. Our approach centers around the principle of minimizing data reliance while ensuring the accuracy and reliability of state transitions. While your understanding of our proposal is mostly accurate, there are a few points that could be considered when evaluating the introduced ZK proofs. As you've mentioned, the problem with key deletions is that sometimes sibling nodes in the trie are required to reconstruct the trie structure for verification purposes. These sibling nodes might not be available through the eth_getProof data, leading to incomplete trie reconstructions. I think a ZK proof of the last state, which involves generating a cryptographic proof that certifies the correctness of the entire state transition process (including deletions and modifications), can help. This proof can be designed to include information about sibling nodes that are required for verification. In essence, the ZK proof encapsulates the entire state transition and, by design, must account for all the necessary data, including sibling nodes, to be valid. Also, by obtaining and validating a ZK proof of the last state, we can ensure that all required data for verifying the state transition, including missing sibling nodes, is included in the proof. This provides a complete and comprehensive validation mechanism that mitigates the challenge posed by missing nodes.

Some additional context: with a ZK-EVM, one can demonstrate to a peer that the block post-state is valid with respect to the block. This means that they do not have to re-execute the EVM themselves. This is beneficial for a light client that wants to trustlessly keep up to date with the latest Ethereum state. That is, the goal is to not run the EVM (to save resources).

I think the idea is a good direction, but after discussing with @gballet I have the feeling that sending the exclusion proofs themselves is not enough for the user to perform the deletion locally, because the sibling of the node being deleted could be a subtrie. So the server should use this approach (or another) to give the full prestate that is required for execution of the block, which means figuring out which extra nodes need to be returned for such a deletion.

At present this spec contains sufficient data to execute the block, but not to update the proofs to get the post-block root. For context, the main issue is that sometimes the trie depth changes, which impacts cousin nodes. More on this here: https://github.com/perama-v/archors/blob/main/crates/multiproof/README.md
To enable that, I have been prototyping a mechanism that allows this using only available JSON-RPC methods (eth_getProof) without hacking an execution client. This method calls eth_getProof on the subsequent block, gets these edge-case nodes, and tacks them into RequiredBlockState as "oracle data". The data allows the trie update to complete and the post-block root to be computed, verifying that the oracle data was in fact correct. I learned that @s1na has taken a different approach, which is to modify Geth to record all trie nodes touched during a block execution (including post-root computation). That method is sufficient to get the trie nodes required to compute the post-block root (but requires modifying the execution client). It is a nice approach.

Essentially the oracle approach says: some changes to the surrounding trie will happen, and they may be complex/extensive. One can "look into the future" at this exact point in the traversal and just get the nodes for the affected part (leafy end) of the traversal. The details of the surrounding trie don't need to be computed and aren't important, because we can check that what we got from the future was in fact correct. It is a bit of a hacky approach, and I can see that it might be difficult to implement if not using eth_getProof.
>> {"jsonrpc":"2.0","id":1,"method":"eth_getRequiredBlockState","params":["0x5f5e100"]} | ||
<< {"jsonrpc":"2.0","id":1,"result":null} |
This can be made verifiable by including the whole header here.

Could you expand on what you mean here? The assumption here is that a user has a mechanism to verify the canonicality of blockhashes. More notes here: https://github.com/perama-v/archors#blockhash-opcode
One design goal is to keep RequiredBlockState as small as possible. This is because a static collection of all RequiredBlockStates would allow for a distributed archive node. The current estimate puts the size at 30TB.

So my point addresses the following fault case you describe in your notes:
If we want to solve it at the RPC level: instead of sending only block number and block hash, send the whole block header. Then the verifier can hash the header (which contains the number) and be sure about the hash. I understand the size concern. However, if the clients have to audit an external source anyway, it might be worth it. Otherwise, if in most use cases the client has easy access to the latest 256 blocks, then we can leave this info out of `eth_getRequiredBlockState`.

A "wrong" hash refers to a hash that is not canonical. So the wrong hash could be a correctly computed hash of a noncanonical block, e.g. a fabricated block or an uncle block. Having the whole block included doesn't get one closer to verifying canonicality.