Add Anchor Processor #57
Conversation
My very initial thoughts
Other than those two high-level thoughts, I like it. The priority via
Thank you for looking through the PR :)
This is one of the aspects of the beacon processor that I kept. I like it because it saves us infrastructure (having to register a handler for each queue) and allows us to easily submit work items that are different but belong to the same priority category (= queue).
What do you mean by worker? Basically a thread that continually runs and receives messages from the manager, servicing a single queue?
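For illustration, a minimal sketch of the "different work items, same priority queue" idea; the `Work` variants and `submit` helper below are hypothetical, not the PR's actual types:

```rust
use tokio::sync::mpsc;

// Hypothetical work kinds; the real processor defines its own work items.
enum Work {
    ValidatorDuty { slot: u64 },
    QbftNetworkMessage(Vec<u8>),
}

// One bounded channel per priority class; different kinds of work share it,
// so no per-kind handler registration is needed.
fn submit(queue: &mpsc::Sender<Work>, work: Work) {
    // try_send keeps the caller non-blocking; a full queue simply rejects
    // the item, which is where backpressure/drop policies would hook in.
    if queue.try_send(work).is_err() {
        eprintln!("queue full or closed, dropping work");
    }
}
```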
Ah okay cool that makes sense. I have not dug too much into the beacon processor.
Say we have a slot where multiple validators each have a duty to fulfill. The duty manager would send a message to the specialized workers, which in turn would do any specific processing needed, spin up their own QBFT instances, and handle that entire life cycle. The processor here is then used to make sure everything is progressing as intended, the workers are still alive, etc. I suppose here we also have to make the distinction between having workers based on queue priority, as is currently done, or having workers that specialize in some duty such as attestations. But we would have to somehow work in priority for the latter. Might make sense to just not over-engineer and route it all through the processor and have that manage all the QBFT instances. Just food for thought and thinking out loud.
re: a common base processor module: I've done some work in this PR to generalize aspects of the Lighthouse beacon processor, mostly to experiment with a different scheduling algorithm. It's still very much a WIP and I'm not sure if these changes would be useful for Anchor, but mentioning it here just in case.
@eserilev thanks! Being able to switch scheduling strategies is definitely valuable. However, your PR does not yet generalize over the possible work items. It did motivate me again to think about a good way to do that though; it would be nice to come up with something that supports both :)

@Zacholme7 ah, got you :) Yeah, until now I was thinking about the processor purely as a scheduler for task admission. You are right, having it manage e.g. QBFT instances and "route" QBFT messages to the correct instances might be worth it, so I will explore that as well. Thanks for your input!
just reviewed -- good start mate and I think it's on the right track. Fair bit of work to come in refining queues, scheduling and prioritisation imo but keen to hear where your head is at. On some of the discussion above:
Looks good!
I read some of the comments. In terms of routing messages to QBFT instances, we had envisaged each instance being tagged with a unique InstanceId. The Processor (or some state attached to it) would build a mapping of <InstanceId, Sender> and <InstanceId, Receiver> when creating them. So we can always just send/recv messages from each instance.
I don't think we need to worry too much about checking if they are running or not. We can have metrics to see how many we have running at any given time, but in general I think it's fine to assume they will always complete, either with success or failure, in at most X time. There is no scenario where a QBFT task cannot end after X time (this saves us checking on them; we can just spawn and forget).
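A rough sketch of that InstanceId-to-channel mapping, assuming tokio mpsc channels; the QbftRouter, QbftMessage and method names are placeholders, not the actual Anchor API:

```rust
use std::collections::HashMap;
use tokio::sync::mpsc;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct InstanceId(u64);

struct QbftMessage; // placeholder

#[derive(Default)]
struct QbftRouter {
    senders: HashMap<InstanceId, mpsc::UnboundedSender<QbftMessage>>,
}

impl QbftRouter {
    // Create the channel pair for a new instance; the receiver is handed to
    // the spawned QBFT task ("spawn and forget").
    fn create_instance(&mut self, id: InstanceId) -> mpsc::UnboundedReceiver<QbftMessage> {
        let (tx, rx) = mpsc::unbounded_channel();
        self.senders.insert(id, tx);
        rx
    }

    // Route an incoming message to its instance; a closed channel means the
    // instance already completed, so its entry is simply removed.
    fn route(&mut self, id: InstanceId, msg: QbftMessage) {
        if let Some(tx) = self.senders.get(&id) {
            if tx.send(msg).is_err() {
                self.senders.remove(&id);
            }
        }
    }
}
```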
add metrics to simple example
I think we should go with the simple variant for the processor instead of sharing code with Lighthouse for now - Lighthouse's needs are pretty complex, and Anchor's needs are not fully known yet, so for time budget reasons it is probably better to start with the non-modularized version and later revisit an implementation using the modular processor @eserilev is cooking up. :)
good work mate, concise and logical.
I think for todos:
- We should use a more structured type definition for the queues, e.g. `pub struct QueueConfig { name: &'static str, capacity: usize, permit: bool }`, then a vec of queues or something to define the queue set (see the sketch after this list). Would make a few things like monitoring, runtime config and queue creation easier.
- Maybe graceful shutdown handling (nice to have)
- Consider resource limits and performance monitoring (nice to have)
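A possible shape for that suggestion; the field names follow the reviewer's comment, while the queue names and capacities below are made up for illustration:

```rust
pub struct QueueConfig {
    pub name: &'static str,
    pub capacity: usize,
    /// Whether work from this queue must acquire a semaphore permit to run.
    pub permit: bool,
}

pub fn queue_configs() -> Vec<QueueConfig> {
    vec![
        QueueConfig { name: "permitless", capacity: 1024, permit: false },
        QueueConfig { name: "urgent_consensus", capacity: 1024, permit: true },
        QueueConfig { name: "http_api", capacity: 256, permit: true },
    ]
}
```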
/// The [`DropOnFinish`] should be dropped when the work is done, for proper permit accounting
/// and metrics. This includes any work triggered by the closure, so [`DropOnFinish`] should
/// be sent along if any other process such as a QBFT instance is messaged.
pub fn new_immediate(name: &'static str, func: ImmediateFn) -> Self {
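For context, a hedged sketch of what "sending the [`DropOnFinish`] along" could look like; DropOnFinish is shown here as a stand-in holding a semaphore permit, and QbftRequest/forward_to_qbft are hypothetical:

```rust
use tokio::sync::{mpsc, OwnedSemaphorePermit};

// Stand-in for the PR's DropOnFinish: releases a permit (and could update
// metrics) when dropped. The real type may look different.
struct DropOnFinish {
    _permit: OwnedSemaphorePermit,
}

// Hypothetical message type: carrying the guard inside the message means the
// permit is only released once the QBFT instance drops the whole request.
struct QbftRequest {
    payload: Vec<u8>,
    _on_finish: DropOnFinish,
}

fn forward_to_qbft(
    payload: Vec<u8>,
    qbft_tx: &mpsc::UnboundedSender<QbftRequest>,
    on_finish: DropOnFinish,
) {
    // Do not drop on_finish here; send it along with the work so the permit
    // accounting covers the downstream processing as well.
    let _ = qbft_tx.send(QbftRequest { payload, _on_finish: on_finish });
}
```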
this is a good addition for state change and queue management
Thanks for the review!
Yeah, that might make sense. I am not sure about the configurability of needing a permit though - this is currently very ingrained into the way I implemented the main
Oh yeah - that's a good idea. I wonder if we need an explicit shutdown queue or if dropping the senders is enough as a shutdown signal 🤔 probably not, due to the risk of deadlocking?
We have the semaphore as a limit already. What kind of limit do you envision? Also, for monitoring, what do you have in mind in addition to the metrics?
yep all good - a permitless queue could be a separate type and that would be fine. Just made sense to me to have it all in one for ease of use, even if it's almost always false.
I could be wrong, but I think the only deadlock risk is blocked tasks with indefinite awaits. For now dropping senders is OK, but I think we need to handle the blocking tasks more carefully in the next version. Could we just use Notify? (rough sketch after this comment)
was thinking more for the future, when we need to optimise throughput or check backpressure -- queue latency, processing time, resource usage, etc. Not for this version, but an idea for when we've got the MVP and are fine-tuning.
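On the Notify question above, one possible shape (purely a sketch, with placeholder WorkItem/process types): the processor holds a tokio::sync::Notify and each queue loop selects on it alongside its receiver.

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Notify};

struct WorkItem; // placeholder
async fn process(_item: WorkItem) { /* placeholder for actual work */ }

async fn queue_loop(mut rx: mpsc::Receiver<WorkItem>, shutdown: Arc<Notify>) {
    loop {
        tokio::select! {
            // The processor calls shutdown.notify_waiters() to stop all loops.
            // Caveat: notify_waiters() only wakes tasks currently awaiting
            // notified(), so a watch channel or CancellationToken may be safer.
            _ = shutdown.notified() => break,
            maybe_item = rx.recv() => match maybe_item {
                Some(item) => process(item).await,
                None => break, // all senders dropped, nothing left to do
            },
        }
    }
}
```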
lgtm
Issue Addressed
Proposed Changes
Add a "processor" component, inspired by the beacon processor from lighthouse. It differs from Lighthouses' processor:
Notify
etc instead.Additional Info
I've abandoned efforts to create a common base processor module used by both Lighthouse and Anchor, as this would basically replace glue with other glue. Also, Lighthouse has a lot of specialised behaviour that would be tough to abstract in a common module.
This is only a preliminary version we can use for development - we will of course need to reconsider and/or fine-tune later when we can actually test performance.
This is a draft, stuff like metrics is still missing.