feat: Kona Rollup Node Architecture Doc #264
base: main
Changes from 5 commits
614b4f7
e7d22df
c7990b8
bf38433
f20fb86
e1e1a55
@@ -0,0 +1,291 @@ | ||||||
# Kona Rollup Node Architecture | ||||||
|
||||||
# Summary | ||||||
|
||||||
The Kona Rollup Node (aka `kona-node`) is a Rust implementation | ||||||
of the OP Stack rollup node that functions as a Consensus Layer | ||||||
client for OP Stack chains. | ||||||
|
||||||
It builds upon the architectural principles of the Go-based `op-node`
while introducing improvements in performance, memory safety,
and concurrency through Rust's language features. The `kona-node` will
be responsible for the same core functions as `op-node`: building,
relaying, and verifying the canonical chain of blocks, working in
conjunction with an execution layer client such as `op-geth` or `op-reth`.
|
||||||
This design leverages existing work in the Rust ecosystem, particularly | ||||||
the Kona monorepo for OP Stack types, components, and services, while | ||||||
introducing a modular architecture that allows for component-level | ||||||
customization and optimization. The implementation will prioritize | ||||||
compatibility with the OP Stack protocol while exploring Rust-specific | ||||||
optimizations for improved performance and resource utilization. | ||||||
|
||||||
# Problem Statement + Context | ||||||
|
||||||
The current OP Stack ecosystem relies on a single implementation of the | ||||||
rollup node (`op-node`) written in Go. While functional, this creates | ||||||
several challenges: | ||||||
|
||||||
1. **Single Implementation Risk**: Relying on a single implementation | ||||||
introduces systemic risk if critical bugs or vulnerabilities are discovered. | ||||||
Testing the `op-node` is limited to unit tests, action tests, and integration | ||||||
tests. The individuals implementing features are usually the same ones writing | ||||||
tests. | ||||||
|
||||||
2. **Proof Coverage**: "Stage 1.4" or the `kona-proof` implementation uses its own | ||||||
derivation pipeline implementation. Unlike the `op-node` derivation pipeline, | ||||||
Kona's derivation pipeline only has test coverage through the Optimism monorepo's | ||||||
**proof** "action tests". | ||||||
|
||||||
3. **Performance Limitations**: go-ethereum (aka "geth") was built as a single,
monolithic piece of client software. It wasn't built to be extensible for rollups. On the other
hand, reth was built as an SDK with modularity as a first-class citizen. As such,
`op-geth` and the `op-node` are limited in performance by design.
|
||||||
4. **Language-Specific Constraints**: Go's garbage collection and concurrency | ||||||
model may introduce performance bottlenecks in specific high-throughput scenarios. | ||||||
Rust's ownership model and lack of garbage collection can provide more predictable | ||||||
performance characteristics, especially under load. | ||||||
|
||||||
5. **Ecosystem Growth**: A growing number of Rust crates implement
OP Stack specifications. Some of these are used in `op-reth` as well as `op-revm`.
Relying on these implementations carries increasing security risk unless
coverage is broadened through an alternate Rust rollup node implementation.
|
||||||
The OP Stack protocol itself has been designed with client diversity in mind. | ||||||
Clear specifications allow for multiple implementations in different languages. | ||||||
The Rollup Node specification provides a comprehensive guide for implementing a | ||||||
compatible client, making a Rust implementation both feasible and valuable. | ||||||
|
||||||
# Proposed Solution | ||||||
|
||||||
The `kona-node` will be a modular Rust implementation of the rollup node that | ||||||
adheres to the OP Stack protocol specifications while leveraging Rust's strengths | ||||||
in performance, safety, and concurrency. The implementation will: | ||||||
|
||||||
- **Maintain Protocol Compatibility** | ||||||
- **Leverage Rust Ecosystem Crates** | ||||||
- **Optimize for Performance** | ||||||
- **Enhance Modularity** | ||||||
- **Improve Testability** | ||||||
|
||||||
## Architecture | ||||||
|
||||||
### Building Blocks from First Principles | ||||||
|
||||||
The Rollup Node is a core component of the OP Stack. | ||||||
It is responsible for constructing the canonical safe L2 chain. | ||||||
In order to produce the chain, the Rollup Node listens to two sources | ||||||
of information. | ||||||
|
||||||
1. **Data Availability Layer**: As new L1 blocks are produced, the receipts | ||||||
are parsed for new L2 chain updates. L2 inputs (L2 transaction batches + deposits) | ||||||
are then derived from the data availability (DA) layer. | ||||||
|
||||||
2. **L2 Sequencer**: The L2 sequencer produces unsafe L2 blocks and sends them over | ||||||
p2p gossip to other rollup nodes. | ||||||
|
||||||
What this looks like currently is two sources feeding into derivation, | ||||||
somehow producing the L2 Chain. | ||||||
|
||||||
``` | ||||||
┌────────────┐ | ||||||
│L2 Sequencer│ | ||||||
│ ├────────────────────┐ | ||||||
│ Gossip │ │ ┌────────────┐ | ||||||
└────────────┘ │ │ │ | ||||||
├──► │ ??? │──► < L2 Chain > | ||||||
┌────────────┐ ┌────────────┐ │ │ │ | ||||||
│ DA │ │ │ │ └────────────┘ | ||||||
│ ├──► │ Derivation ├──┘ | ||||||
│ Watcher │ │ │ | ||||||
└────────────┘ └────────────┘ | ||||||
``` | ||||||
|
||||||
From these sources, the rollup node imports "unsafe" blocks from the L2 sequencer | ||||||
and safe blocks from the L2 derivation pipeline. Both unsafe and safe blocks | ||||||
are imported into the L2 execution layer via the Engine API. | ||||||
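
The distinction between the two import paths can be sketched in a few lines of Rust. This is a minimal model under stated assumptions: `BlockSource` and `EngineHeads` are hypothetical names, blocks are reduced to their numbers, and the real node would drive the execution layer through Engine API calls rather than update two integers.

```rust
/// Where a block came from, which determines the label it is imported with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockSource {
    /// Received over p2p gossip from the L2 sequencer ("unsafe").
    SequencerGossip,
    /// Derived from data posted to the DA layer ("safe").
    Derivation,
}

/// Hypothetical, heavily simplified view of the heads the engine tracks.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct EngineHeads {
    pub unsafe_head: u64,
    pub safe_head: u64,
}

impl EngineHeads {
    /// Import a block by number. Gossiped blocks only advance the unsafe head.
    /// Derived blocks advance the safe head and pull the unsafe head along if it lags.
    pub fn import(&mut self, number: u64, source: BlockSource) {
        match source {
            BlockSource::SequencerGossip => {
                self.unsafe_head = self.unsafe_head.max(number);
            }
            BlockSource::Derivation => {
                self.safe_head = self.safe_head.max(number);
                self.unsafe_head = self.unsafe_head.max(self.safe_head);
            }
        }
    }
}
```

In this model the safe head never exceeds the unsafe head, which mirrors the relationship between the two labels described above.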
|
||||||
``` | ||||||
┌────────────┐ | ||||||
│L2 Sequencer│ | ||||||
│ ├────────────────────┐ | ||||||
│ Gossip │ │ ┌────────────┐ | ||||||
└────────────┘ │ │ │ | ||||||
├──► │ Engine API │──► < L2 Chain > | ||||||
┌────────────┐ ┌────────────┐ │ │ │ | ||||||
│ DA │ │ │ │ └────────────┘ | ||||||
│ ├──► │ Derivation ├──┘ | ||||||
│ Watcher │ │ │ | ||||||
└────────────┘ └────────────┘ | ||||||
``` | ||||||
|
||||||
The [Holocene Hardfork][holocene] introduced steady block derivation. | ||||||
This change allows a payload to be replaced with a deposits-only block | ||||||
to allow the L2 chain to progress even if an invalid payload is derived. | ||||||
|
||||||
Steady block derivation has significant architectural impacts for the | ||||||
rollup node. With Holocene, the derivation component needs to be updated | ||||||
if a payload is replaced with a deposits-only block. This requires the | ||||||
Engine API component to send a message to the derivation component, | ||||||
introducing an edge in the modular component graph. | ||||||
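
A rough sketch of this feedback edge, assuming a channel from the engine component back to derivation. All names here (`Payload`, `Tx`, `EngineToDerivation`, `process_derived`) are hypothetical, and the `is_valid` flag stands in for execution-layer validation, which this sketch does not model.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical transaction model: only tracks whether it is a deposit.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tx {
    pub is_deposit: bool,
}

/// Hypothetical payload model: a block number and its transactions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Payload {
    pub number: u64,
    pub txs: Vec<Tx>,
}

/// Message sent from the engine back to derivation (the new edge in the graph).
#[derive(Debug, PartialEq, Eq)]
pub enum EngineToDerivation {
    /// The payload at this block number was replaced with a deposits-only block.
    ReplacedWithDepositsOnly { number: u64 },
}

/// Process a derived payload. A valid payload passes through unchanged.
/// An invalid one is stripped down to its deposits, and derivation is notified.
pub fn process_derived(
    payload: Payload,
    is_valid: bool,
    to_derivation: &Sender<EngineToDerivation>,
) -> Payload {
    if is_valid {
        return payload;
    }
    let number = payload.number;
    let txs = payload.txs.into_iter().filter(|tx| tx.is_deposit).collect();
    // Send errors are ignored: the sketch does not model a shut-down derivation actor.
    let _ = to_derivation.send(EngineToDerivation::ReplacedWithDepositsOnly { number });
    Payload { number, txs }
}
```

The point of the sketch is the direction of the message: the engine side originates it, and the derivation side must be listening for it.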
|
||||||
[holocene]: https://specs.optimism.io/protocol/holocene/derivation.html | ||||||
|
||||||
``` | ||||||
┌────────────┐ | ||||||
│L2 Sequencer│ | ||||||
│ ├───┐ | ||||||
│ Gossip │ │ ┌────────────┐ ┌────────────┐ | ||||||
└────────────┘ │ │ │ │ │ | ||||||
├──►│ Derivation │──►│ Engine API │──► < L2 Chain > | ||||||
┌────────────┐ │ │ │ │ │ | ||||||
│ DA │ │ └────────────┘ └──────┬─────┘ | ||||||
│ ├───┘ ▲ │ | ||||||
│ Watcher │ └────────────────┘ | ||||||
└────────────┘ | ||||||
``` | ||||||
|
||||||
### A Modular Architecture | ||||||
|
||||||
The components described above are all critical to the rollup node. | ||||||
Whether these components are part of the architecture is not up for discussion. | ||||||
What this document addresses is how the components communicate. Namely, do
components share memory, or do they use an actor-based architecture with message passing?
The remainder of this section provides the reasoning for choosing the latter.
|
||||||
In an actor-based system, each component is constructed as an "Actor". There's | ||||||
a Derivation Actor, DA Watcher Actor, P2P Actor (L2 sequencer gossip), | ||||||
and Engine Actor. | ||||||
|
||||||
Instead of having some top-level object "own" components, actors are spawned
as threads, and communication between actors happens through channels with
messages. In this architecture, a top-level layer "orchestrates" the various
actors by wiring up communication between them. The primary benefit
of using an actor-based system is that tasks can actually be parallelized. This is in contrast
to kona's derivation pipeline, which uses an ownership model where each stage
in the derivation pipeline owns the previous stage. Because of this ownership,
derivation pipeline work is single-threaded and blocking. Pipeline stages cannot
make progress in parallel.

Review comment: Are these channels synchronous? If not, what's the queueing policy, and how are events/message receipts handled?

Reply: Good question. The answer is it really depends on the actors and the relationships between those actors. Since the channels don't have to be uniform, I haven't specified anything here. Most likely, it'll change as we go along.
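
The actor wiring described above might look roughly like the following, using only the standard library. This is a sketch under assumptions: the message types (`L1Block`, `DerivedPayload`) and the `spawn_*` helpers are hypothetical, "derivation" is a placeholder multiplication, and unbounded `std::sync::mpsc::channel` is chosen purely for brevity. The document does not fix a channel type, and a bounded `sync_channel` or an async runtime's channels would be equally plausible.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical message types, reduced to a single number each.
pub struct L1Block(pub u64);
pub struct DerivedPayload(pub u64);

/// DA watcher actor: emits L1 blocks, then exits. Dropping its sender closes the channel.
fn spawn_da_watcher(blocks: Vec<u64>, out: Sender<L1Block>) -> JoinHandle<()> {
    thread::spawn(move || {
        for n in blocks {
            if out.send(L1Block(n)).is_err() {
                break;
            }
        }
    })
}

/// Derivation actor: turns each L1 block into a payload (a placeholder transform here).
fn spawn_derivation(inbox: Receiver<L1Block>, out: Sender<DerivedPayload>) -> JoinHandle<()> {
    thread::spawn(move || {
        for L1Block(n) in inbox {
            if out.send(DerivedPayload(n * 10)).is_err() {
                break;
            }
        }
    })
}

/// Engine actor: collects the payloads it receives and returns them when its inbox closes.
fn spawn_engine(inbox: Receiver<DerivedPayload>) -> JoinHandle<Vec<u64>> {
    thread::spawn(move || inbox.into_iter().map(|DerivedPayload(n)| n).collect())
}

/// Orchestrator: wires channels and spawns actors. It does not relay messages itself.
pub fn run_node(l1_blocks: Vec<u64>) -> Vec<u64> {
    let (l1_tx, l1_rx) = channel();
    let (payload_tx, payload_rx) = channel();
    let watcher = spawn_da_watcher(l1_blocks, l1_tx);
    let derivation = spawn_derivation(l1_rx, payload_tx);
    let engine = spawn_engine(payload_rx);
    watcher.join().expect("watcher panicked");
    derivation.join().expect("derivation panicked");
    engine.join().expect("engine panicked")
}
```

Each actor owns only its inbox and the senders it needs, so the watcher can keep producing while derivation and the engine work on earlier messages.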
|
||||||
If the orchestration layer is called "Rollup Node", a full modular architecture | ||||||
for the Kona Rollup Node looks like the following. | ||||||
|
||||||
``` | ||||||
┌────────────────────────────────────────────────────────────────────┐ | ||||||
│ │ | ||||||
┌───┤ Rollup Node Service │ | ||||||
│ │ │ | ||||||
│ └──────────────────────────┬────────────────┬─────────────────┬──────┘ | ||||||
│ │ │ │ | ||||||
│ ┌────────────┐ │ │ │ | ||||||
│ │L2 Sequencer│ │ │ │ | ||||||
├─► │ ├───┐ ▼ ▼ ▼ | ||||||
│ │ Gossip │ │ ┌────────────┐ ┌────────────┐ ┌───────────┐ | ||||||
│ └────────────┘ │ │ │ │ │ │ │ | ||||||
│ ├──►│ Derivation │──►│ Engine API │ │ Rpc Actor │ | ||||||
│ ┌────────────┐ │ │ │ │ │ │ │ | ||||||
│ │ DA │ │ └────────────┘ └──────┬─────┘ └───────────┘ | ||||||
└─► │ ├───┘ ▲ │ | ||||||
│ Watcher │ └────────────────┘ | ||||||
└────────────┘ | ||||||
``` | ||||||
|
||||||
Notice that the "Rpc Actor" doesn't have communication lines drawn to other actors.
This is because the JSON-RPC server registers rpc modules which
handle various rpc requests. In effect, each actor registers its rpc module
with the Rpc Actor at construction. The rpc modules in turn query or perform
actions on other actors via message passing. Effectively, the Rpc Actor
exchanges messages with all other actors.
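
One way an rpc module could talk to another actor is a request message that carries its own reply channel. The sketch below assumes that pattern. `EngineQuery`, `EngineRpcModule`, and `spawn_engine` are hypothetical names, and the engine's state is a fixed number rather than real chain state.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical query the engine actor answers on behalf of an rpc module.
pub enum EngineQuery {
    /// Ask for the current unsafe head. The answer goes back on `reply`.
    UnsafeHead { reply: Sender<u64> },
}

/// Spawn an engine actor with a fixed head and return the sender rpc modules use.
pub fn spawn_engine(unsafe_head: u64) -> Sender<EngineQuery> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        // Runs until every query sender has been dropped.
        for query in rx {
            match query {
                EngineQuery::UnsafeHead { reply } => {
                    let _ = reply.send(unsafe_head);
                }
            }
        }
    });
    tx
}

/// Hypothetical rpc module registered with the Rpc Actor. It holds only a sender.
pub struct EngineRpcModule {
    engine: Sender<EngineQuery>,
}

impl EngineRpcModule {
    pub fn new(engine: Sender<EngineQuery>) -> Self {
        Self { engine }
    }

    /// Handle an rpc request by messaging the engine and waiting for its reply.
    /// Returns `None` if the engine actor is no longer running.
    pub fn unsafe_head(&self) -> Option<u64> {
        let (reply, response) = channel();
        self.engine.send(EngineQuery::UnsafeHead { reply }).ok()?;
        response.recv().ok()
    }
}
```

Because the module only stores a `Sender`, the Rpc Actor never needs shared access to the engine's internal state.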
|
||||||
### Sequencer vs Validator Mode | ||||||
|
||||||
Up to this point, Kona's rollup node architecture has been considered | ||||||
from the perspective of a _validator_ node. That is, a rollup node | ||||||
that only derives the L2 chain, but doesn't _build_ (aka "sequence")
the L2 chain. In order to extend support for sequencing the L2 chain,
the rollup node needs to accept transactions, build L2 blocks, and
broadcast these blocks as gossip over the p2p network. | ||||||
|
||||||
Part of the architecture to consider is being able to plug in | ||||||
"block building" through a simple API. The `op-node` separates | ||||||
the sequencer logic from the rest of the node driver, allowing | ||||||
it to toggle sequencing on and off via a simple CLI flag. Kona | ||||||
can take this one step further. Using a minimal API, the | ||||||
`kona-node` should allow sequencing to be toggled on and off, | ||||||
but also let users easily slot in their own block building and | ||||||
sequencing logic using the given API. | ||||||
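
To make the idea of a minimal block-building API concrete, here is one possible shape for it. Everything in this sketch is an assumption for illustration: the `BlockBuilder` trait, the `Tx` and `Block` types, and both builders are hypothetical and not taken from `kona-node` or `op-node`.

```rust
/// Hypothetical transaction and block types, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tx {
    pub id: u64,
    pub fee: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Block {
    pub number: u64,
    pub txs: Vec<Tx>,
}

/// Minimal block-building API a user could implement to slot in custom logic.
pub trait BlockBuilder {
    fn build(&mut self, number: u64, pending: Vec<Tx>) -> Block;
}

/// Default builder: include transactions in arrival order.
pub struct FifoBuilder;

impl BlockBuilder for FifoBuilder {
    fn build(&mut self, number: u64, pending: Vec<Tx>) -> Block {
        Block { number, txs: pending }
    }
}

/// Example custom builder: order transactions by fee, highest first.
pub struct FeePriorityBuilder;

impl BlockBuilder for FeePriorityBuilder {
    fn build(&mut self, number: u64, mut pending: Vec<Tx>) -> Block {
        pending.sort_by(|a, b| b.fee.cmp(&a.fee));
        Block { number, txs: pending }
    }
}

/// Sequencing is "off" when no builder is configured (validator mode).
pub struct Sequencer {
    builder: Option<Box<dyn BlockBuilder>>,
}

impl Sequencer {
    pub fn validator() -> Self {
        Self { builder: None }
    }

    pub fn with_builder(builder: Box<dyn BlockBuilder>) -> Self {
        Self { builder: Some(builder) }
    }

    /// Build the next block, or return `None` when sequencing is toggled off.
    pub fn next_block(&mut self, number: u64, pending: Vec<Tx>) -> Option<Block> {
        self.builder.as_mut().map(|b| b.build(number, pending))
    }
}
```

With this shape, toggling sequencing and swapping block-building logic are the same operation: choosing which `BlockBuilder`, if any, the sequencer holds.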
Review comment: That's a really interesting concept. Sounds like this would be pretty interesting for sequencers that would want to have custom MEV algorithms or those that want to do some advanced profit estimation based on transaction inclusion. Are these the use cases you thought of? Or did you have something different in mind?

Reply: At some point, we should definitely seek to remove the …
||||||
|
||||||
Consider sequencing for the `kona-node`. A new "Sequencer" | ||||||
actor would be introduced, extending the architecture as follows. | ||||||
|
||||||
``` | ||||||
┌────────────────────────────────────────────────────────────────────────────────────┐ | ||||||
│ │ | ||||||
┌───┤ Rollup Node Service │ | ||||||
│ │ │ | ||||||
│ └──────────────────────────┬────────────────┬────────────────┬────────────────┬──────┘ | ||||||
│ │ │ │ │ | ||||||
│ ┌────────────┐ │ │ │ │ | ||||||
│ │ DA │ │ │ │ │ | ||||||
├─► │ ├───┐ ▼ ▼ ▼ ▼ | ||||||
│ │ Watcher │ │ ┌────────────┐ ┌────────────┐ ┌───────────┐ ┌───────────┐ | ||||||
│ └────────────┘ │ │ │ │ │ │ │ │ │ | ||||||
│ ├──►│ Derivation │──►│ Engine API │ │ Sequencer │ │ Rpc Actor │ | ||||||
│ ┌────────────┐ │ │ │ │ │ │ │ │ │ | ||||||
│ │L2 Sequencer│ │ └────────────┘ └──────┬─────┘ └─┬───┬─────┘ └───────────┘ | ||||||
└─► │ ├───┘ ▲ │ ▲ │ │ | ||||||
│ Gossip │ └────────────────┘ └───────┘ │ | ||||||
└────────────┘ │ | ||||||
▲ │ | ||||||
└──────────────────────────────────────────────────────┤ | ||||||
│ | ||||||
▼ | ||||||
┌───────────┐ | ||||||
│ │ | ||||||
│ Conductor │ | ||||||
│ │ | ||||||
└───────────┘ | ||||||
``` | ||||||
|
||||||
Similarly to the `op-node`, the `kona-node` should make use of the | ||||||
Conductor abstraction (the `op-conductor` is the implementation). | ||||||
The conductor API allows the sequencer actor to commit unsafe | ||||||
payloads to the L2 chain. It also acts as an auxiliary service to | ||||||
the Rollup Node, performing leader election and chain reconciliation. | ||||||
That said, consensus and `op-conductor` guarantees are peripheral
to this document. The key idea here is for `kona-node` to use the
same `op-conductor` abstraction.
|
||||||
# Alternatives Considered | ||||||
|
||||||
Instead of allowing actors to pass messages directly to each other, one
alternative considered was to route all messages through the orchestration service.
|
||||||
Concretely, this would look like a large-variant enum that holds all | ||||||
message types. Actors would then only send and receive messages from the | ||||||
"Rollup Node Service" orchestrator. This more closely resembles the `op-node` | ||||||
architecture where "derivers" are registered via a "registry". Various | ||||||
event types are emitted and broadcast to rollup node components.
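
For comparison, the centrally routed alternative could be sketched as below. The variants of `NodeEvent`, the `ActorId` enum, and the `Orchestrator` type are hypothetical. The sketch only shows the shape of the design: one enum for every message and one routing table that every message passes through.

```rust
use std::sync::mpsc::Sender;

/// One large enum holding every message type in the node (hypothetical variants).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NodeEvent {
    L1Block(u64),
    DerivedPayload(u64),
    UnsafeBlock(u64),
}

/// Names the actors the orchestrator can deliver to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActorId {
    Derivation,
    Engine,
}

/// The orchestrator owns a sender to every inbox and decides where each event goes.
pub struct Orchestrator {
    derivation: Sender<NodeEvent>,
    engine: Sender<NodeEvent>,
}

impl Orchestrator {
    pub fn new(derivation: Sender<NodeEvent>, engine: Sender<NodeEvent>) -> Self {
        Self { derivation, engine }
    }

    /// Central routing table. A wrong arm here misroutes a message for the whole node.
    pub fn route(event: &NodeEvent) -> ActorId {
        match event {
            NodeEvent::L1Block(_) => ActorId::Derivation,
            NodeEvent::DerivedPayload(_) | NodeEvent::UnsafeBlock(_) => ActorId::Engine,
        }
    }

    /// Forward an event. Returns false if the target actor has shut down.
    pub fn dispatch(&self, event: NodeEvent) -> bool {
        let target = match Self::route(&event) {
            ActorId::Derivation => &self.derivation,
            ActorId::Engine => &self.engine,
        };
        target.send(event).is_ok()
    }
}
```

Every message takes an extra hop through `dispatch`, and every actor receives the full `NodeEvent` type even though it handles only a few variants.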
|
||||||
While this works effectively for the `op-node`, it introduces significant
overhead and risk for Kona's Rollup Node. Since the `kona-node` is
parallelized, mishandled messages, or even spontaneous flakes where messages are dropped,
can result in an unrecoverable deadlock. By establishing messaging channels
directly between actors, there's less "surface area" for message passing
to be improperly configured.

Review comment: Suggested change. I think..?

Review comment: I think insisting on the parallelization is a great idea. I was wondering if the optimism spec was built to handle internal component parallelism or if this is something we should be careful of in the implementation.

Reply: It'll be something we need to be very careful with. The good thing is we have so many methods of stress testing this architecture through kurtosis, local syncing, action tests, etc. against a matrix of chains. Testing early is how I hope we can find any architecture-related bugs quicker.

Reply: Parallelism for …
||||||
|
||||||
# Risks & Uncertainties | ||||||
|
||||||
Actor-based systems aren't a free lunch. Where parallelization is introduced, | ||||||
so is the ability for actors to become stuck in a deadlock. Given the cycle | ||||||
between the Engine Actor and Derivation Actor, this is a real risk. Imagine
the Derivation Actor isn't reset when a Holocene deposits-only block
is created. Meanwhile, the Engine Actor is waiting for safe blocks from the
derivation pipeline, and neither actor makes progress.
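
As an illustration of how such a stall could at least be surfaced, the sketch below has the engine side wait with a timeout instead of blocking forever. This is not something the document prescribes. It is one possible technique, and `WaitOutcome` and `wait_for_safe_block` are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Result of waiting on the derivation actor for the next safe block.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    /// A safe block number arrived in time.
    SafeBlock(u64),
    /// Nothing arrived within the timeout: derivation may be stuck.
    Stalled,
    /// The derivation actor dropped its sender and will never send again.
    DerivationGone,
}

/// Wait for the next safe block, but report a stall rather than blocking indefinitely.
pub fn wait_for_safe_block(inbox: &Receiver<u64>, timeout: Duration) -> WaitOutcome {
    match inbox.recv_timeout(timeout) {
        Ok(number) => WaitOutcome::SafeBlock(number),
        Err(RecvTimeoutError::Timeout) => WaitOutcome::Stalled,
        Err(RecvTimeoutError::Disconnected) => WaitOutcome::DerivationGone,
    }
}
```

A timeout does not remove the deadlock, but it turns a silent hang into an observable condition that tests or an operator can act on.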
|
||||||
In a similar way, the `op-node`'s event system architecture also has a risk
of deadlock. With strongly typed messages and explicit, well-tested message
handling and emission, this risk is mitigated. One way to extend this testing
to the Kona Rollup Node is to support node action tests, much as proof action
tests support running `kona-proof`.