feat: Kona Rollup Node Architecture Doc #264


Open — wants to merge 6 commits into base: main

Conversation


@refcell refcell commented Apr 14, 2025

Description

Writes up an architecture document for the kona-node (aka the Kona Rollup Node).
Considers how this architecture compares to the op-node.
Uses diagrams to visualize how the various actors tie together.

@refcell refcell added the documentation Improvements or additions to documentation label Apr 14, 2025
@refcell refcell self-assigned this Apr 14, 2025
@refcell refcell requested review from clabby, Copilot and theochap April 14, 2025 18:09

@Copilot Copilot AI left a comment


Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

protocol/kona-node-arch.md:97

  • [nitpick] The placeholder '???' in the diagram is ambiguous; please replace it with a descriptive label or remove it to avoid confusion.
                                  ├──► │     ???    │──►  < L2 Chain >

protocol/kona-node-arch.md:253

  • The word 'parallized' appears to be a misspelling; consider using 'parallelized' instead.
Since the `kona-node` is parallized, mishandling or even spontaneous flakes where messages are dropped, can result in an unrecoverable deadlock.

@refcell refcell marked this pull request as ready for review April 16, 2025 12:18
Contributor

@Inphi Inphi left a comment


Is interop being considered for this architecture? It would be nice if we can easily attach a component that communicates with op-supervisor.

and Engine Actor.

Instead of having some top-level object "own" components, actors are spawned
as threads, and communication between actors happens through channels with
Contributor

Are these channels synchronous? If not, what's the queueing policy, and how are event/message receipts handled?

Contributor Author

Good question. The answer is it really depends on the actors and the relationships between those actors. Since the channels don't have to be uniform, I haven't specified anything here. Most likely, it'll change as we go along.


refcell commented Apr 17, 2025

> Is interop being considered for this architecture? It would be nice if we can easily attach a component that communicates with op-supervisor.

Thanks for reviewing Mofi!

My understanding is that the majority of the interop changes occur at the tx pool / block building stage. Since the kona-node will be designed to be modular, block building should be easily plugged into the node. The goal is for this modularity to allow interop and other features / changes to be easily supported going forward.

Is that a sufficient answer? Are there other parts to interop that we need to consider besides block building?


clabby commented Apr 17, 2025

> Are there other parts to interop that we need to consider besides block building?

@axelKingsley recently gave me a pretty good view of the "managed node" concept they've added with the op-supervisor. With the "managed" backend, the node and supervisor communicate bidirectionally: the supervisor instructs the node to perform certain actions and vice versa, i.e., the supervisor instructs the node when to update certain heads, and the node notifies the supervisor of database updates, etc. This is an implementation detail, but unfortunately an implementation detail we'll have to follow in order to be compatible with op-supervisor.

Reached out to Axel and got the following docs on the subject:

can take this one step further. Using a minimal API, the
`kona-node` should allow sequencing to be toggled on and off,
but also let users easily slot in their own block building and
sequencing logic using the given API.
Member

That's a really interesting concept. Sounds like this would be pretty interesting for sequencers that want custom MEV algorithms, or those that want to do some advanced profit estimation based on transaction inclusion. Are these the use cases you thought of, or did you have something different in mind?

Member

At some point, we should definitely seek to remove the rollup-boost shim in favor of a native layer that allows for custom block building functionality, as an example. Things like direct rbuilder integration would be incredible.

parallized, mishandling or even spontaneous flakes where messages are dropped,
can result in an unrecoverable deadlock. By establishing messaging channels
directly between actors, there's less "surface area" for message passing
to be improperly configured.
Member

I think insisting on the parallelization is a great idea. I was wondering if the optimism spec was built to handle internal component parallelism, or if this is something we should be careful of in the implementation.

Contributor Author

It'll be something we need to be very careful with. The good thing is we have many methods of stress testing this architecture, through kurtosis, local syncing, action tests, etc., against a matrix of chains. Testing early is how I hope we can find any architecture-related bugs more quickly.

Contributor

Parallelism for op-node is in a draft PR here: ethereum-optimism/optimism#11100

@axelKingsley
Contributor

> Reached out to Axel and got the following docs on the subject:

Managed Mode is an important thing to understand for the Supervisor's operation, but I don't know that it will be required to replicate in other clients. Managed Mode's purpose is to provide the Supervisor with a stable collection of Nodes from which to index log events. The Golang Node implementation already satisfies this requirement, and while it would be cool for the Supervisor to support other clients, the only benefit we would get is heterogeneous indexing sources.

There is another mode which we have named Standard Mode (questionable name at this point) which I think would be a better integration between the Supervisor and an Alternative Client. In Standard Mode, Kona would do its local derivation and L1 consolidation, and then would do a further consolidation with a Supervisor over an API call. This is ill-defined, but you could imagine Kona supplies the last L1 block it derived from, and the Supervisor responds with the appropriate L2 cross-unsafe, cross-safe, and any required block replacements.

Standard Mode was first imagined to support chain operators who didn't want to run a full Supervisor, but it would also work as a basic interface into interop for alt clients. You could then separately focus on the longer-term goal of including Interop Validation in Kona, to remove reliance on Golang implementations altogether.

The Golang implementation does not yet support Standard Mode. It is just outside our priority scope; either @teddyknox or I will be working on it. However, we could do with some outside spec input, so if Kona wants to be the first Standard Mode implementor, that'd be nifty.


clabby commented Apr 17, 2025

I'm a fan of @axelKingsley's proposal to not support the op-supervisor's "managed mode" at first (or ever). From what I can tell, managed mode in the op-node + op-supervisor is a tight coupling between the two implementations that we likely do not want to subscribe to arbitrarily.

This gives us the flexibility to wait until we have our own supervisor implementation for a tighter integration with a "managed mode"-esque setup. While an important feature to support, we can run interop without it, and will likely land on a better result with a supervisor that's designed around the kona-node rather than the op-node specifically.

This sort of surfaces the question - does the op-supervisor team intend for the supervisor to be like an extension of a rollup node implementation, or like an independent service that we should be able to plug-and-play? For the EL, we've standardized the endpoint for txpool pre-validation, but on the CL, the integration looks much tighter.

@axelKingsley
Contributor

> This sort of surfaces the question - does the op-supervisor team intend for the supervisor to be like an extension of a rollup node implementation, or like an independent service that we should be able to plug-and-play? For the EL, we've standardized the endpoint for txpool pre-validation, but on the CL, the integration looks much tighter.

The Supervisor's identity has changed over time, and I don't know that we currently maintain a clear answer to this question. But I will stick my neck out to say: The Supervisor is an Extension of the Rollup Node Implementation... and as a second-order effect, it can serve that information out for plug-and-play support on Alt Clients.

The reality of Superchain Validation is that it really does require a meta-derivation, which the Supervisor provides. Once the parallel work of local derivation is done, cross validation determines replacement blocks, which is part of Derivation. If a client is responsible for implementing derivation (and it is), then it is responsible for implementing cross-chain validation.

The second-order effect of being plug-and-play is, I think, opportunistic. We designed the Supervisor with RPC connections because the op-node is already a mature piece of software and can do derivation work. The API interface to check messages was a natural need for supporting filtering in the execution layer, and is naturally callable by anyone.


tynes commented Apr 18, 2025

Kona Node Arch Design Review - April 18

Review and discuss the initial design proposal for a modular rollup node architecture in Kona.

Key Takeaways

Topics

  • Proposed Modular Architecture
  • Parallelization and Synchronization
  • Interoperability Considerations
  • Supervisor Integration

Next Steps


refcell commented Apr 18, 2025

Here looks to be the core supervisor document that considers the role the supervisor plays in the OP Stack; specifically, how the supervisor works with the op-node. There may be other documents to reference, but a few of them seem to be in Notion.


While this works effectively for the `op-node` it introduces significant
overhead and risk for Kona's Rollup Node. Since the `kona-node` is
parallized, mishandling or even spontaneous flakes where messages are dropped,

Suggested change
parallized, mishandling or even spontaneous flakes where messages are dropped,
parallelized, mishandling or even spontaneous flakes where messages are dropped,

I think..?

Labels
documentation Improvements or additions to documentation
7 participants