
Add Anchor Processor #57

Merged — 14 commits merged into sigp:unstable on Dec 9, 2024

Conversation

dknopik
Member

@dknopik dknopik commented Nov 27, 2024

Issue Addressed

Proposed Changes

Add a "processor" component, inspired by the beacon processor from lighthouse. It differs from Lighthouses' processor:

  • Multiple input channels instead of a single input channel that is split into queues (a rough sketch of this layout is included below).
  • No reprocessing queues. These could be added, of course, but I think we should avoid them if possible and instead design around synchronisation primitives such as Notify.
  • No "overflowing queues", which drop existing items instead of logging an error. These can be implemented later if needed.

Additional Info

I've abandoned efforts to create a common base processor module used by both Lighthouse and Anchor, as this would basically replace one kind of glue with another. Also, Lighthouse has a lot of specialised behaviour that would be tough to abstract into a common module.

This is only a preliminary version we can use for development - we will of course need to reconsider and/or fine-tune later when we can actually test performance.

This is a draft; things like metrics are still missing.

@Zacholme7
Member

Zacholme7 commented Nov 27, 2024

My very initial thoughts

  1. I like the distinction between async and blocking tasks, but I'm not sure about the WorkItem actually containing the function to be executed. In my mind, a WorkItem is just a signal of something that has to be done. It is just message + state. It is then up to the processor to handle the correct routing of that item to the thing that will fulfill it - something like registering a handler for each item. Also, we would have to Box::pin all the async functions, which might get a little icky, and there's some overhead with dynamic dispatch.
  2. Following up on 1, maybe we should consider whether we want the processor to act as more of a central executor or a manager. As a central executor, the processor receives messages from the channels and then executes them. Maybe it makes sense for the processor to act as more of a manager with a set of workers it coordinates/checks on, while the workers actually receive the messages and do the heavy lifting. Just something I thought of on the fly; it may have no relevance or not be practical at all.

Other than those two high-level thoughts, I like it. The priority via `biased` is nice. It is also difficult to design this without knowing the full workload right now. Interested to see what others think too.

@dknopik
Member Author

dknopik commented Nov 27, 2024

Thank you for looking through the PR :)

  1. I like the distinction between async and blocking tasks, but I'm not sure about the WorkItem actually containing the function to be executed. In my mind, a WorkItem is just a signal of something that has to be done. It is just message + state. It is then up to the processor to handle the correct routing of that item to the thing that will fulfill it - something like registering a handler for each item. Also, we would have to Box::pin all the async functions, which might get a little icky, and there's some overhead with dynamic dispatch.

This is one of the aspects of the beacon processor I kept in. I like it because it saves us infrastructure (having to register a handler for each queue) and allows us to easily submit work items that are different but belong to the same priority category (= queue).
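
For illustration, a minimal sketch of such a closure-carrying work item - the type and field names are hypothetical, not the PR's actual definitions:

```rust
use tokio::sync::mpsc;

// Hypothetical sketch: a work item is just a name plus the closure to run,
// so different kinds of work can share the same priority queue without the
// processor registering a handler per message type.
pub struct WorkItem {
    pub name: &'static str,
    pub func: Box<dyn FnOnce() + Send>,
}

fn submit_example(tx: &mpsc::Sender<WorkItem>, data: Vec<u8>) {
    let item = WorkItem {
        name: "sign_attestation",
        func: Box::new(move || {
            // The closure captures whatever state it needs (`data` here),
            // so the processor does not need to know the concrete task type.
            let _ = data.len();
        }),
    };
    // Errors (queue full / closed) are ignored for brevity in this sketch.
    let _ = tx.try_send(item);
}
```

Submitting, say, a gossip-verification task instead would only change the name and the closure body; the queue and the processor stay the same.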

  2. Following up on 1, maybe we should consider whether we want the processor to act as more of a central executor or a manager. As a central executor, the processor receives messages from the channels and then executes them. Maybe it makes sense for the processor to act as more of a manager with a set of workers it coordinates/checks on, while the workers actually receive the messages and do the heavy lifting. Just something I thought of on the fly; it may have no relevance or not be practical at all.

What do you mean by worker? Basically a thread that continually runs and receives messages from the manager, servicing a single queue?

@Zacholme7
Member

Zacholme7 commented Nov 27, 2024

This is one of the aspects of the beacon processor I kept in. I like it because it saves us infrastructure (having to register a handler for each queue) and allows us to easily submit work items that are different but belong to the same priority category (= queue).

Ah okay cool that makes sense. I have not dug too much into the beacon processor.

What do you mean by worker? Basically a thread that continually runs and receives messages from the manager, servicing a single queue?

Say we have a slot where multiple validators each have a duty to fulfill. The duty manager would send a message to the specialized workers which in turn would do any specific processing needed and spin up their own QBFT instances and handle that entire life cycle. The processor here is then used to make sure everything is progressing as intended and the workers are still alive, etc.

I suppose here we also have to make the distinction between having workers based on queue priority, as is currently done, or having workers that specialize in some duty such as attestations. But we would have to somehow work priority into the latter.

Might make sense to just not over engineer and route it all through the processor and have that manage all the QBFT instances.

Just food for thought and thinking out loud.

@eserilev
Contributor

re: a common base processor module:

I've done some work in this PR to generalize aspects of the Lighthouse beacon processor, mostly to experiment with a different scheduling algorithm. It's still very much a WIP and I'm not sure if these changes would be useful for Anchor, but mentioning it here just in case.

@dknopik
Member Author

dknopik commented Nov 28, 2024

@eserilev thanks! Being able to switch scheduling strategies is definitely valuable. However, your PR does not yet generalize over the possible work items. It motivated me to think again about a good way to do that, though - it would be nice to come up with something that supports both :)

@Zacholme7 ah, got you :) yeah, until now I was thinking about the processor purely as a scheduler for task admission. You are right, having it manage e.g. QBFT instances and "route" QBFT messages to the correct instances might be worth it, so I will explore that as well. Thanks for your input!

@jking-aus
Contributor

just reviewed -- good start mate and I think it's on the right track. Fair bit of work to come in refining queues, scheduling and prioritisation imo but keen to hear where your head is at.

On some of the discussion above:

  • Yes the intent is to have the processor instantiate and manage the QBFT instances based on inputs from the duties service, which is what I think Zach's comment is saying.
  • It should queue tasks/events from the duties service as work items, as well as messages between QBFT instances, the networking layer, and the crypto module.
  • The scheduling stuff in the beacon processor and Eitan's PR is worth looking at. We need some prioritisation logic eventually.

Member

@AgeManning AgeManning left a comment

Looks good!

I read some of the comments. In terms of routing messages to QBFT instances, we had envisaged each instance being tagged with a unique InstanceId.

The Processor (or some state attached to it) would build a mapping of <InstanceId, Sender> and <InstanceId, Receiver> when creating them.
So we can always just send/recv messages from each instance.

I don't think we need to worry too much about checking whether they are running or not. We can have metrics to see how many we have running at any given time, but in general I think it's fine to assume they will always complete, with either success or failure, in at most X time. There is no scenario where a QBFT task cannot end after X time (this saves us checking on them; we can just spawn and forget).
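
A minimal sketch of the envisaged routing - the InstanceId and message types below are placeholders, not Anchor's actual types:

```rust
use std::collections::HashMap;
use tokio::sync::mpsc;

// Placeholder types for illustration only.
type InstanceId = u64;
struct QbftMessage;

/// The processor (or some state attached to it) keeps a Sender per running
/// QBFT instance and forwards incoming messages by id.
#[derive(Default)]
struct QbftRouter {
    senders: HashMap<InstanceId, mpsc::Sender<QbftMessage>>,
}

impl QbftRouter {
    fn spawn_instance(&mut self, id: InstanceId) {
        let (tx, mut rx) = mpsc::channel(64);
        self.senders.insert(id, tx);
        // Spawn and forget: the instance always ends (success or failure)
        // within bounded time, so nothing needs to check up on it.
        tokio::spawn(async move {
            while let Some(_msg) = rx.recv().await {
                // ...drive the QBFT state machine...
            }
        });
    }

    async fn route(&self, id: InstanceId, msg: QbftMessage) {
        if let Some(tx) = self.senders.get(&id) {
            let _ = tx.send(msg).await;
        }
    }
}
```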

@dknopik
Member Author

dknopik commented Dec 3, 2024

I think we should go with the simple variant for the processor instead of sharing code with Lighthouse for now - Lighthouse's needs are pretty complex, and Anchor's needs are not fully known yet, so for time-budget reasons it is probably better to start with the non-modularized version and later revisit an implementation using the modular processor @eserilev is cooking up. :)

jking-aus previously approved these changes Dec 6, 2024
Contributor

@jking-aus jking-aus left a comment

good work mate, concise and logical.

I think for todos:

  • We should use a more structured type definition for the queues, e.g.
    `pub struct QueueConfig { name: &'static str, capacity: usize, permit: bool }`, then a vec of queues or something to define the queue type. This would make a few things like monitoring, runtime config and queue creation easier (see the sketch after this list).
  • Maybe graceful shutdown handling (nice to have)
  • Consider resource limits and performance monitoring (nice to have)
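
A rough sketch of what that config-driven queue creation could look like - the struct and helper below are hypothetical, not code from this PR or review:

```rust
use tokio::sync::mpsc;

// Hypothetical sketch of the suggested structured queue definition.
pub struct QueueConfig {
    pub name: &'static str,
    pub capacity: usize,
    /// Whether work from this queue must acquire a permit before running.
    pub permit: bool,
}

pub struct Queue<T> {
    pub config: QueueConfig,
    pub receiver: mpsc::Receiver<T>,
}

/// Build all queues from a single list of configs, which also gives one
/// obvious place to hook in monitoring or runtime configuration later.
pub fn build_queues<T>(configs: Vec<QueueConfig>) -> (Vec<mpsc::Sender<T>>, Vec<Queue<T>>) {
    let mut senders = Vec::new();
    let mut queues = Vec::new();
    for config in configs {
        let (sender, receiver) = mpsc::channel(config.capacity);
        senders.push(sender);
        queues.push(Queue { config, receiver });
    }
    (senders, queues)
}
```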

/// The [`DropOnFinish`] should be dropped when the work is done, for proper permit accounting
/// and metrics. This includes any work triggered by the closure, so [`DropOnFinish`] should
/// be sent along if any other process such as a QBFT instance is messaged.
pub fn new_immediate(name: &'static str, func: ImmediateFn) -> Self {
Contributor

this is a good addition for state change and queue management
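
For readers unfamiliar with the pattern, an illustrative sketch of what a DropOnFinish-style guard can look like; the fields and exact behaviour here are assumptions for illustration, not the PR's actual implementation:

```rust
use std::time::Instant;
use tokio::sync::OwnedSemaphorePermit;

// Assumed sketch: whoever ends up holding this value last (the closure, or a
// QBFT instance it was forwarded to) finishes the work item by dropping it.
pub struct DropOnFinish {
    permit: Option<OwnedSemaphorePermit>,
    started: Instant,
    name: &'static str,
}

impl Drop for DropOnFinish {
    fn drop(&mut self) {
        // Dropping the permit frees a worker slot for the next item.
        drop(self.permit.take());
        // Metrics (e.g. time from submission to completion, per queue name)
        // would be recorded here.
        let _elapsed = self.started.elapsed();
        let _name = self.name;
    }
}
```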

@jking-aus jking-aus linked an issue Dec 6, 2024 that may be closed by this pull request
@dknopik
Member Author

dknopik commented Dec 6, 2024

Thanks for the review!

* We should use a more structured type definition for the queues, e.g.
  `pub struct QueueConfig { name: &'static str, capacity: usize, permit: bool }`, then a vec of queues or something to define the queue type. This would make a few things like monitoring, runtime config and queue creation easier.

Yeah, that might make sense. I am not sure about the configurability of needing a permit though - this is currently very ingrained into the way I implemented the main select!. Also, it doesn't really add value to have multiple permitless input queues, as the semantic meaning of such a queue is basically "throw it into tokio ASAP and let it deal with it". The tasks in there should therefore be very quick or very non-blocking (basically awaiting most of the time) - and if we had to prioritise which task we add to tokio first (just add, not execute!), that would mean we produce these tasks faster than we can add them, and then something is seriously wrong anyway :)

* Maybe graceful shutdown handling (nice to have)

Oh yeah - that's a good idea. I wonder if we need an explicit shutdown queue or if dropping the senders is enough as a shutdown signal 🤔 probably not, due to the risk of deadlocking?
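
For reference, the "drop the senders" variant works because recv() yields None once every Sender is gone, which naturally ends the processor loop - a minimal sketch with a hypothetical item type:

```rust
use tokio::sync::mpsc;

// Sketch: no explicit shutdown queue; the loop ends when all senders are dropped.
async fn processor_loop(mut rx: mpsc::Receiver<Box<dyn FnOnce() + Send>>) {
    while let Some(work) = rx.recv().await {
        work();
    }
    // All senders dropped: run any cleanup here and return.
}
```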

* Consider resource limits and performance monitoring (nice to have)

We have the semaphore as a limit already. What kind of limit do you envision? Also, for monitoring, what do you have in mind in addition to the metrics?

@dknopik dknopik marked this pull request as ready for review December 6, 2024 07:44
@jking-aus
Contributor

Thanks for the review!

* We should use a more structured type definition for the queues, e.g.
  `pub struct QueueConfig { name: &'static str, capacity: usize, permit: bool }`, then a vec of queues or something to define the queue type. This would make a few things like monitoring, runtime config and queue creation easier.

Yeah, that might make sense. I am not sure about the configurability of needing a permit though - this is currently very ingrained into the way I implemented the main select!. Also, it doesn't really add value to have multiple permitless input queues...

Yep, all good - a permitless queue could be a separate type and it would be fine. It just made sense to me to have it all in one for ease of use, even if it's almost always false.

* Maybe graceful shutdown handling (nice to have)

Oh yeah - that's a good idea. I wonder if we need an explicit shutdown queue or if dropping the senders is enough as a shutdown signal 🤔 probably not, due to the risk of deadlocking?

I could be wrong, but I think the only deadlock risk is blocked tasks with indefinite awaits. For now, dropping senders is OK, but I think we need to handle the blocking tasks more carefully in the next version. Could we just use Notify?
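
For context, tokio's Notify can provide exactly such an explicit signal, independent of whether any senders are still alive - a minimal sketch (the function and queue type are hypothetical):

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Notify};

// Sketch: the processor selects on an explicit shutdown signal alongside its
// work queue; callers trigger shutdown with `shutdown.notify_one()`.
async fn run_with_shutdown(
    shutdown: Arc<Notify>,
    mut rx: mpsc::Receiver<Box<dyn FnOnce() + Send>>,
) {
    loop {
        tokio::select! {
            _ = shutdown.notified() => break,
            maybe_work = rx.recv() => match maybe_work {
                Some(work) => work(),
                None => break, // all senders dropped
            },
        }
    }
    // Graceful shutdown: drain or drop remaining work, flush metrics, etc.
}
```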

* Consider resource limits and performance monitoring (nice to have)

We have the semaphore as a limit already. What kind of limit do you envision? Also, for monitoring, what do you have in mind in addition to the metrics?

I was thinking more for the future, when we need to optimise throughput or check backpressure -- queue latency, processing time, resource usage, etc. Not for this version, but an idea for when we've got an MVP and are fine-tuning.

Contributor

@jking-aus jking-aus left a comment

lgtm

@jking-aus jking-aus merged commit 04176ae into sigp:unstable Dec 9, 2024
8 checks passed
@dknopik dknopik deleted the processor branch February 5, 2025 14:15

Successfully merging this pull request may close these issues.

Implement Anchor Processor