RaftCore - mpsc -> StateMachine #1336
@schreter Let's talk about it here?
Actually, we already have an issue talking about changing replication and applying to streams here: #1184. Maybe the minimal change would be the following: …

Opinions?
You're right about how it works currently, but I don't fully understand your proposal for the new approach to the state machine worker and streaming.

In my opinion, we can't have more than one stream connected to the state machine worker, because there is only one task inside it, and subscribing to two streams provides no guarantees about the order in which events are received. This lack of ordering guarantees could be problematic.

I believe the best approach is to have only one stream connected to the state machine worker at any given time. It should work like this: RaftCore sends a stream to the state machine worker; when the worker receives this stream, it pauses the main stream and switches to the apply stream until that stream is closed. This way, while the apply stream is in use, the main stream is suspended. When another command (such as GetSnapshot) needs to be sent to the state machine worker, the apply stream should be shut down first; the worker then goes back to draining the main stream after processing all events in the apply stream.

And I'm not quite sure how, in your design, the state machine worker and the …
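For concreteness, here is a minimal sketch of that protocol, assuming tokio mpsc channels; the `Command` enum and all names are hypothetical, not actual openraft API:

```rust
use tokio::sync::mpsc;

enum Command {
    // RaftCore hands the worker the receiving end of a dedicated apply stream.
    BeginApply(mpsc::Receiver<u64 /* entry placeholder */>),
    GetSnapshot,
}

async fn sm_worker(mut main_rx: mpsc::Receiver<Command>) {
    // Drain the main stream; it is implicitly paused while an apply stream runs.
    while let Some(cmd) = main_rx.recv().await {
        match cmd {
            Command::BeginApply(mut apply_rx) => {
                // Switch to the apply stream until RaftCore closes it;
                // only then resume the main stream, preserving ordering.
                while let Some(entry) = apply_rx.recv().await {
                    println!("applying entry {entry}");
                }
            }
            Command::GetSnapshot => {
                println!("building snapshot");
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (main_tx, main_rx) = mpsc::channel(8);
    let worker = tokio::spawn(sm_worker(main_rx));

    // RaftCore side: open an apply stream, push entries, then drop the sender
    // (closing the stream) before issuing the next main-stream command.
    let (apply_tx, apply_rx) = mpsc::channel(8);
    main_tx.send(Command::BeginApply(apply_rx)).await.unwrap();
    for i in 0..3 {
        apply_tx.send(i).await.unwrap();
    }
    drop(apply_tx);
    main_tx.send(Command::GetSnapshot).await.unwrap();
    drop(main_tx);
    worker.await.unwrap();
}
```

The key property is that the worker never polls two streams at once, so the order of events is fully determined by the single stream it is currently draining.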
I finally got to do some experiments on what the streaming API could look like. Unfortunately, none of them were to my satisfaction. So let's look at the root issues we have and how we can solve them more simply.

The current apply task alternates between reading the log and applying entries to the state machine, awaiting each operation. This means the two operations basically block each other: the apply cannot progress while reading the log, and vice versa. For example, if we have an operation that potentially takes longer (it needs to load a larger object into memory to be modified and written back), we cannot start further operations past this batch in an overlapped way and let that operation complete after the batch is done. For write-only operations, we cannot even return before awaiting the write of all operations in the batch; otherwise we might report a wrong last-applied index to the Raft core, which might purge logs that may still be needed after a crash.

Again, the proper solution would be a stream-oriented interface for reading the log and writing data to the state machine, connecting these streams somehow so the state machine can execute requests overlapped and report operation completion via some callback. However, initially, it would be possible to modify the existing apply interface. There are two complications here compared to log append: …

With this, it is also possible to write a default implementation that delegates to the legacy method, then sends the results and calls the callback. In the next step, we can take a look at reading the log, since even with this concept, it would still await batches instead of overlapping I/O for multiple batches.

What do you think?
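As a rough illustration of the callback idea (placeholder types and hypothetical method names, not openraft's actual traits), the new method could be default-implemented on top of a legacy batch apply:

```rust
// Placeholder types standing in for openraft's entries and responses.
type Entry = u64;
type Response = u64;

trait StateMachine {
    /// Legacy interface: apply a batch and return all responses at once.
    async fn apply(&mut self, entries: Vec<Entry>) -> Vec<Response>;

    /// New interface: report completion of the batch through a callback
    /// instead of making the caller await the whole write. The default
    /// implementation delegates to the legacy method, then sends the
    /// results by invoking the callback.
    async fn apply_with_callback(
        &mut self,
        entries: Vec<Entry>,
        on_done: Box<dyn FnOnce(Vec<Response>) + Send>,
    ) {
        let responses = self.apply(entries).await;
        on_done(responses);
    }
}

struct EchoSm;

impl StateMachine for EchoSm {
    async fn apply(&mut self, entries: Vec<Entry>) -> Vec<Response> {
        entries // echo each entry back as its response
    }
}

#[tokio::main]
async fn main() {
    let mut sm = EchoSm;
    sm.apply_with_callback(vec![1, 2, 3], Box::new(|r| println!("applied: {r:?}")))
        .await;
}
```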
It seems the state machine needs a unified stream API: all requests (including applying log entries or building snapshots) should flow through a single stream/channel. This approach satisfies the fundamental requirement of serializing all requests.

```rust
enum Request {
    Apply { entry: LogEntry, response_tx: Sender<Response> },
    BuildSnapshot { tx: Sender<Snapshot> },
    // ...
}

trait RaftStateMachine {
    async fn handle_request(&mut self, req: Request);
}
```

What about this?
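For illustration, building on the `Request` and `RaftStateMachine` definitions above, and assuming a tokio mpsc channel carries the requests, the worker loop could be as small as:

```rust
use tokio::sync::mpsc;

// One task drains one channel, so apply and snapshot requests are
// naturally serialized in arrival order.
async fn run_worker<SM: RaftStateMachine>(mut sm: SM, mut rx: mpsc::Receiver<Request>) {
    while let Some(req) = rx.recv().await {
        sm.handle_request(req).await;
    }
}
```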
Yes, from the API perspective, that's also a good option. Basically, that boils down to the user implementing the SM worker instead of it being implemented directly in openraft.
Right. Otherwise users would still need to manage the ordering between applying log entries from one stream and building snapshots that arrive through other channels. This creates essentially the same complexity as handling all requests in a single stream...
OK, so I assume you first want to experiment a bit with how such an API could be implemented? I'll also try to do some more experiments next week (hopefully).

Basically, it means remapping the stream coming from the Raft core into a stream of commands for the SM and, instead of looping internally, passing the mapped requests to the SM directly. I.e., move the implementation of the SM worker, adapted to work on top of the mapped requests, into the default-implemented worker method; see the sketch below. The log reader would then also be directly accessed in the new worker method, i.e., the implementation could then overlap log reading as well.
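A hedged sketch of this remapping, with hypothetical names (`CoreCommand`, `SmRequest`, `worker`) and the `futures` crate assumed:

```rust
use futures::stream::{Stream, StreamExt};

// RaftCore-internal commands and the SM-facing requests they map to.
enum CoreCommand { Apply(u64), BuildSnapshot }
enum SmRequest  { Apply(u64), BuildSnapshot }

trait RaftStateMachine {
    async fn handle_request(&mut self, req: SmRequest);

    /// Default-implemented worker: drains the mapped request stream and
    /// dispatches each request in order. An implementation may override
    /// this to access the log reader directly and overlap log reading
    /// with applying.
    async fn worker(&mut self, mut requests: impl Stream<Item = SmRequest> + Unpin) {
        while let Some(req) = requests.next().await {
            self.handle_request(req).await;
        }
    }
}

/// Remaps the Raft core's command stream into SM requests instead of
/// looping over the commands internally.
fn map_commands(
    cmds: impl Stream<Item = CoreCommand>,
) -> impl Stream<Item = SmRequest> {
    cmds.map(|c| match c {
        CoreCommand::Apply(entry) => SmRequest::Apply(entry),
        CoreCommand::BuildSnapshot => SmRequest::BuildSnapshot,
    })
}

#[tokio::main]
async fn main() {
    struct NoopSm;
    impl RaftStateMachine for NoopSm {
        async fn handle_request(&mut self, req: SmRequest) {
            match req {
                SmRequest::Apply(e) => println!("apply {e}"),
                SmRequest::BuildSnapshot => println!("build snapshot"),
            }
        }
    }

    let cmds = futures::stream::iter(vec![CoreCommand::Apply(1), CoreCommand::BuildSnapshot]);
    let mut sm = NoopSm;
    // Box::pin makes the mapped stream Unpin for the default worker.
    sm.worker(Box::pin(map_commands(cmds))).await;
}
```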
Exactly, there would be a new version of the RaftStateMachine trait to do this 🤔
I pushed a proposal with a (working) basis for discussion in #1355.
There should be a sending end for the mpsc sender to push new (entry, reply_sender) pairs into the channel. I assume the pseudocode for this part looks like: …

If I understand it correctly, at position (1), the handle has to hold a lock on the RaftCore.client_resp_channels to get the reply sender in another task, right? The RaftCore needs the applied ID to …
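A hedged sketch of what that pseudocode might look like, assuming tokio channels and hypothetical names:

```rust
use tokio::sync::{mpsc, oneshot};

// Placeholder types; the real entry/response types live in openraft.
type Entry = u64;
type Reply = u64;

async fn client_write(
    apply_tx: &mpsc::Sender<(Entry, oneshot::Sender<Reply>)>,
    entry: Entry,
) -> Reply {
    let (reply_tx, reply_rx) = oneshot::channel();
    // (1) Push the entry together with its reply sender into the channel;
    // the state machine worker answers through `reply_tx` after applying.
    apply_tx.send((entry, reply_tx)).await.expect("worker gone");
    reply_rx.await.expect("worker dropped the reply sender")
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<(Entry, oneshot::Sender<Reply>)>(8);
    // Stand-in worker: applies the entry (here: echoes it) and replies.
    tokio::spawn(async move {
        while let Some((entry, reply_tx)) = rx.recv().await {
            let _ = reply_tx.send(entry);
        }
    });
    println!("reply: {}", client_write(&tx, 42).await);
}
```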
Originally posted by @schreter in #1334 (review)