An Ethereum node consists of two layers:
- An execution client (such as Reth)
- A consensus client (such as Lighthouse)
The execution client deals with transactions (validation, gossip), block execution and state transitions, and storage.
The consensus client deals with blocks (proposal, gossip, attesting to block validity), and tracking state and performance of validators.
Reth starts by loading configuration data that customizes the execution client.
Many sections are available for customization, including:
Stages
: Configures syncing of blockchain data, maintaining state, updating the database, etc.
Peers
: Configures management of network connections, such as limiting the number of peers, the time between reconnection attempts, and peer reputation scoring.
Sessions
: Configures individual network sessions between peers and handles request timeouts and buffer sizes per peer.
Pruning
: Configures data storage and enables specific segments to be pruned independently of the others. This includes receipts, storage and account histories, when to prune, etc.
Upon initialization, Reth checks for the existence of its database; if it is not found, a new database of predefined tables is created.
The database in Reth consists of two layers:
- Database Layer: MDBX
- Abstract Layer: Sits on top of the database
MDBX is an extremely fast, compact, and transactional key-value database that guarantees data integrity and performance through its adherence to ACID properties.
The abstract layer standardizes database interactions by defining an interface to interact with the database and a schema for storing data. The schema can be thought of as a collection of tables with keys and values, where both keys and values may be complex data structures that are encoded and decoded.
There are many tables because each table focuses on one type of data, such as transactions, headers, receipts, etc.
This design, coupled with the properties of MDBX, provides Reth with flexibility and high performance when handling data.
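To make the two layers concrete, here is a minimal sketch of a typed table sitting on top of a raw key-value store. It is not Reth's code: `KeyValueStore` is an in-memory stand-in for MDBX, `Table` and the helper functions are hypothetical names, the table names are illustrative, and JSON stands in for whatever encoding Reth actually applies to values.

```python
import json
from typing import Any, Callable, Dict, Iterable, Optional


class KeyValueStore:
    """In-memory stand-in for the database layer (MDBX in Reth)."""

    def __init__(self, table_names: Iterable[str]) -> None:
        # Mirrors initialization: every predefined table exists from the start.
        self._tables: Dict[str, Dict[bytes, bytes]] = {name: {} for name in table_names}

    def put(self, table: str, key: bytes, value: bytes) -> None:
        self._tables[table][key] = value

    def get(self, table: str, key: bytes) -> Optional[bytes]:
        return self._tables[table].get(key)


class Table:
    """Abstract layer: one named table with encoded keys and values."""

    def __init__(
        self,
        store: KeyValueStore,
        name: str,
        encode_key: Callable[[Any], bytes],
        encode_value: Callable[[Any], bytes],
        decode_value: Callable[[bytes], Any],
    ) -> None:
        self.store = store
        self.name = name
        self.encode_key = encode_key
        self.encode_value = encode_value
        self.decode_value = decode_value

    def put(self, key: Any, value: Any) -> None:
        self.store.put(self.name, self.encode_key(key), self.encode_value(value))

    def get(self, key: Any) -> Any:
        raw = self.store.get(self.name, self.encode_key(key))
        return None if raw is None else self.decode_value(raw)


def encode_block_number(number: int) -> bytes:
    # Big-endian keeps byte order equal to numeric order in a sorted store.
    return number.to_bytes(8, "big")


def encode_json(value: Any) -> bytes:
    # Assumption: JSON is used here only for readability.
    return json.dumps(value, sort_keys=True).encode()


def decode_json(raw: bytes) -> Any:
    return json.loads(raw.decode())


store = KeyValueStore(["Headers", "Transactions", "Receipts"])
headers = Table(store, "Headers", encode_block_number, encode_json, decode_json)
headers.put(1, {"parent_hash": "0xabc", "gas_used": 21000})
```

Callers only ever see block numbers and header dictionaries; the store only ever sees bytes. That separation is what lets the schema change without touching the storage engine.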
Reth utilizes DevP2P to create its routing table of peers and manage connections. The process consists of node discovery followed by establishing individual peer connections.
The node discovery process consists of locating and maintaining active nodes within its routing table.
The protocol used is Discv4, built on UDP, which utilizes a distributed hash table (DHT), based on the Kademlia algorithm, to maintain peers based on their "distance" to the node. The distance is not geographical; it is derived from the bitwise XOR of two node IDs, and peers are grouped into "buckets" according to that distance.
Each node has its own DHT with a set number of buckets rather than each node containing a global list of all nodes.
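The distance metric is easier to see with small numbers. The sketch below uses plain integers as node IDs and hypothetical function names; Discv4 itself applies the same XOR to the Keccak-256 hashes of the nodes' public keys, which is left out here to keep the example in the standard library.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of two IDs, read as a number."""
    return a ^ b


def bucket_index(local_id: int, remote_id: int) -> int:
    """Bucket for a peer: the position of the highest bit in which the IDs differ."""
    distance = xor_distance(local_id, remote_id)
    if distance == 0:
        raise ValueError("a node does not store itself in its own table")
    return distance.bit_length() - 1


def closest(target: int, peers: List[int], k: int) -> List[int]:
    """The k peers with the smallest XOR distance to the target."""
    return sorted(peers, key=lambda peer: xor_distance(peer, target))[:k]


local = 0b10100000
print(bucket_index(local, 0b10100001))  # differs only in the lowest bit
print(bucket_index(local, 0b00100000))  # differs in the highest of 8 bits
```

Peers that share a long ID prefix with the local node land in low-numbered buckets, while roughly half of all possible IDs fall into the highest bucket. This is why a fixed number of buckets is enough to cover the whole ID space.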
- Reth contains a list of known bootstrap nodes used to begin node discovery
- Reth sends a PING message to a node and waits for a PONG message response
- The PONG response is verified to come from the PING recipient
- Post verification, Reth sends a FIND_NODE message with an ID to populate its routing table of peers
- The Kademlia algorithm is used to find the closest peers and a list of peers is sent back
- Reth will update its routing table and repeat step 2 via the new peers until the table is populated
- Periodically, Reth will PING its peers to prune/update its routing table
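The discovery loop above can be sketched as a small simulation. Everything here is hypothetical scaffolding: `FakeNetwork` replaces UDP with a dictionary lookup, node IDs are small integers, and PONG verification, buckets, and the periodic re-PING are omitted.

```python
from collections import deque
from typing import Deque, Dict, List, Set


class FakeNetwork:
    """Hypothetical in-memory network: node ID -> IDs that node knows about."""

    def __init__(self, neighbours: Dict[int, List[int]]) -> None:
        self.neighbours = neighbours

    def ping(self, node_id: int) -> bool:
        # A PONG comes back only if the node is online.
        return node_id in self.neighbours

    def find_node(self, node_id: int, target: int, k: int) -> List[int]:
        # The queried node answers with its k known peers closest to the target.
        known = self.neighbours[node_id]
        return sorted(known, key=lambda peer: peer ^ target)[:k]


def discover(local_id: int, bootstrap: List[int], network: FakeNetwork,
             max_peers: int, k: int = 3) -> List[int]:
    routing_table: List[int] = []
    pending: Deque[int] = deque(bootstrap)       # start from the bootstrap nodes
    seen: Set[int] = set(bootstrap) | {local_id}
    while pending and len(routing_table) < max_peers:
        candidate = pending.popleft()
        if not network.ping(candidate):          # no PONG: drop the candidate
            continue
        routing_table.append(candidate)
        # FIND_NODE: ask the live peer for nodes close to our own ID.
        for peer in network.find_node(candidate, local_id, k):
            if peer not in seen:
                seen.add(peer)
                pending.append(peer)             # repeat PING with the new peers
    return routing_table


network = FakeNetwork({1: [2, 3, 9], 2: [4], 3: [1], 4: []})  # node 9 is offline
peers = discover(local_id=0, bootstrap=[1], network=network, max_peers=10)
```

Starting from a single bootstrap node, the table grows to every reachable live node, and the offline node never enters it because it does not answer the PING.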
Discv4 is currently in use until all execution clients support Discv5. Discv5 may be enabled through the use of a flag, in which case it runs alongside Discv4.
After peers have added each other to their routing tables a connection may be established.
- Reth will initiate a TCP connection to its peer
- RLPx is used to create a secure session for communication
- A HELLO message is sent to negotiate which sub-protocols (and versions) may be used for communication
- Peers select common protocols with their highest compatible versions
- Each sub-protocol establishes a connection in its own way
Eth
: Status message (network id, blockhash, version...)
LES
: Status message (network id, blockhash, version...)
Whisper
: Handshake (version, capabilities...)
Swarm
: Handshake (version, capabilities...)
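The capability selection that follows the HELLO exchange can be sketched as a set intersection. The function name and the version numbers below are illustrative assumptions, not values taken from Reth.

```python
from typing import Dict, Set


def negotiate(local: Dict[str, Set[int]], remote: Dict[str, Set[int]]) -> Dict[str, int]:
    """For each sub-protocol both peers announce, pick the highest shared version."""
    shared: Dict[str, int] = {}
    for name in sorted(local.keys() & remote.keys()):
        common_versions = local[name] & remote[name]
        if common_versions:                  # same protocol but no common version: skip
            shared[name] = max(common_versions)
    return shared


ours = {"eth": {66, 67, 68}, "snap": {1}}
theirs = {"eth": {67, 68, 69}, "les": {4}, "snap": {1}}
agreed = negotiate(ours, theirs)
```

A protocol that only one side announces (`les` here) is simply ignored, and a version that only one side supports (`eth` 69) never wins even though it is the highest on offer.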
The sub-protocols are:
Eth
: Ethereum protocol used for tx/block propagation, blockchain and state synchronization
Light Ethereum Sub-protocol (LES)
: Used by light clients to verify state
Whisper
: P2P encrypted messaging used for privacy and anonymity
Swarm
: Distributed storage / content sharing
Snap
: TODO
Witness (WIT)
: TODO
NOTE: Reth may not support all of these and some of these may be deprecated.
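Synchronization in Reth can be split into two sections: the initial sync, which catches up to the head of the chain, and the ongoing work that keeps the node up to date afterwards.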
When Reth comes online it must catch up to the head of the blockchain. It accomplishes this by requesting data from randomly selected peers.
The synchronization process utilizes a pipeline mechanism, which processes stages sequentially. If a stage is executed successfully, the subsequent stage is executed. Otherwise, any changes are unwound.
The stages include, but are not limited to:
Headers
: To avoid a "long-range attack", headers are requested in batches from the tip of the chain in descending order to the latest block in its database. Each header is validated and then stored.
Body
: Using the downloaded headers, Reth determines which block bodies to download. It pre-validates them by checking the ommers hash and transaction root against the block body, then adds each transaction from the block to its database.
SenderRecovery
: The transaction signer is recovered and stored for each transaction.
Execution
: The transactions from each block are executed, and state changes (such as updates to balances, bytecode, etc.) are applied.
After completing the initial synchronization with the blockchain, Reth performs two tasks to stay updated with the latest state of the network:
- Reth exposes an RPC for users and peers to submit transactions
- When a new transaction is received, Reth performs basic validation such as checking the nonce, gas limit, signature verification, etc.
- If the transaction passes validation, it is added to its mempool
- The validated transaction is then broadcast to its peers
- Reth also provides another RPC for consensus clients to retrieve necessary data for block creation
- When a consensus client requests transactions for a new block, Reth selects transactions from its mempool based on a specific ruleset, often prioritizing those with the highest gas price
- Along with the selected transactions, Reth supplies additional data required for block creation, including the block header, state root, transaction root, and receipt root
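The two tasks above, admitting transactions and then choosing some for a block, can be sketched with a toy mempool. The class and field names are hypothetical, signature verification and peer broadcast are omitted, and the selection rule is the simplest possible one (highest gas price first, within a block gas limit). A real pool must also keep each sender's transactions in nonce order, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tx:
    sender: str
    nonce: int
    gas_limit: int
    gas_price: int


class Mempool:
    def __init__(self, block_gas_limit: int) -> None:
        self.block_gas_limit = block_gas_limit
        self.next_nonce: Dict[str, int] = {}   # sender -> nonce expected next
        self.pending: List[Tx] = []

    def add(self, tx: Tx) -> bool:
        """Basic validation; returns True if the transaction was accepted."""
        if tx.gas_limit > self.block_gas_limit:
            return False                        # could never fit in a block
        if tx.nonce != self.next_nonce.get(tx.sender, 0):
            return False                        # replayed or out-of-order nonce
        self.next_nonce[tx.sender] = tx.nonce + 1
        self.pending.append(tx)
        return True

    def select_for_block(self) -> List[Tx]:
        """Greedy selection: highest gas price first, while gas remains."""
        chosen: List[Tx] = []
        gas_used = 0
        for tx in sorted(self.pending, key=lambda t: t.gas_price, reverse=True):
            if gas_used + tx.gas_limit <= self.block_gas_limit:
                chosen.append(tx)
                gas_used += tx.gas_limit
        return chosen


pool = Mempool(block_gas_limit=100)
accepted = [
    pool.add(Tx("a", 0, 60, 5)),
    pool.add(Tx("b", 0, 50, 9)),
    pool.add(Tx("c", 0, 40, 7)),
    pool.add(Tx("d", 3, 10, 1)),    # wrong nonce
    pool.add(Tx("e", 0, 200, 1)),   # exceeds the block gas limit
]
block_txs = pool.select_for_block()
```

With a gas limit of 100, the transactions from `b` and `c` fill 90 units and the lower-priced transaction from `a` is left in the pool for a later block.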
There are two scenarios in which Reth receives a block from the consensus client:
- Reth's block has reached consensus
- The network has produced a new block
In both cases the consensus client sends the block back to Reth for execution and storage.
- Reth validates the block against the protocol rules
- Reth prepares state for transaction execution in Revm
- Revm executes transactions sequentially while applying state changes
- Revm returns the outcome of its execution to Reth
- Reth finalizes the state changes and updates its database
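The execution flow above can be sketched with a heavily simplified state model. Assumptions: state is a mapping of account to balance, transactions are plain transfers, validation is a single parent-hash check, and `execute_block` is a hypothetical stand-in for the hand-off to Revm, which in reality executes EVM bytecode.

```python
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, int]  # (sender, recipient, amount)


class InvalidBlock(Exception):
    pass


def execute_block(state: Dict[str, int], head_hash: str,
                  parent_hash: str, transfers: List[Transfer]) -> Dict[str, int]:
    # 1. Validate the block against (a stand-in for) the protocol rules.
    if parent_hash != head_hash:
        raise InvalidBlock("block does not extend the current head")
    # 2. Prepare a working copy so a failure leaves the stored state untouched.
    working = dict(state)
    # 3. Execute transactions one after another; later ones see earlier changes.
    for sender, recipient, amount in transfers:
        if working.get(sender, 0) < amount:
            raise InvalidBlock(f"{sender} cannot cover {amount}")
        working[sender] -= amount
        working[recipient] = working.get(recipient, 0) + amount
    # 4. Return the outcome; the caller decides when to persist it.
    return working


stored = {"alice": 10}
new_state = execute_block(
    stored, head_hash="0xabc", parent_hash="0xabc",
    transfers=[("alice", "bob", 4), ("bob", "carol", 1)],
)
```

The second transfer only succeeds because it runs after the first, which is why the transactions are applied sequentially. The caller would write `new_state` to the database only once the whole block has executed.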
Paradigm defines Execution Extensions (ExExs) in their blog as:
Post-execution hooks for building real-time, high performance and zero-operations off-chain infrastructure on top of Reth.
As Reth's state changes, notifications are emitted which the ExEx may use to derive its state and functionality.
Notification examples include:
- Chain operations: Notifications about whether a commit, revert or reorg has occurred
- Blocks: Information about their transactions, receipts and state changes
After processing a notification, the ExEx must send an event back to Reth to indicate that it has finished processing and that it is safe to prune the associated data.
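The notification and acknowledgement cycle can be sketched as a loop. The Python names below are hypothetical and only loosely modelled on the idea of commit, revert and reorg notifications. A real ExEx is written in Rust against Reth's own types, and blocks carry far more than a number and a label.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Block = Tuple[int, str]  # (block number, fork label); stand-in for a real block


@dataclass
class Notification:
    kind: str                                        # "commit", "revert" or "reorg"
    old: List[Block] = field(default_factory=list)   # blocks leaving the chain
    new: List[Block] = field(default_factory=list)   # blocks joining the chain


def run_exex(notifications: List[Notification],
             events: List[Tuple[str, int]]) -> Set[Block]:
    """Derive an index of canonical blocks and acknowledge processed heights."""
    indexed: Set[Block] = set()
    for notification in notifications:
        indexed.difference_update(notification.old)  # undo reverted blocks
        indexed.update(notification.new)             # apply committed blocks
        if notification.new:
            # Tell the node which height has been fully processed, so the
            # data below it is safe to prune.
            height = max(number for number, _ in notification.new)
            events.append(("finished_height", height))
    return indexed


events: List[Tuple[str, int]] = []
index = run_exex(
    [
        Notification("commit", new=[(1, "a"), (2, "a"), (3, "a")]),
        Notification("reorg", old=[(3, "a")], new=[(3, "b"), (4, "b")]),
        Notification("revert", old=[(4, "b")]),
    ],
    events,
)
```

The derived index follows the chain through the reorg and the revert, and an acknowledgement goes back only when new blocks were committed.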
The benefits of using ExExs include:
- Immediate processing of blockchain data
- Scaling infrastructure without altering the core functionality of Reth