Fix initialization of DataStorm samples after session recovery #3294
base: main
It's not clear to me where the fix is.
sample.value,
sample.timestamp));
assert(samplesI.back()->key);
return {};
In the original code, when samples.empty(), we set elementSubscriber->lastId to 0.
It's not immediately clear why we don't need that. Is this lastId already 0 for some other reason?
I added a comment to explain the logic of lastId.
lastId is default-initialized to 0 in Session.h:
// The ID of the last processed sample.
std::int64_t lastId{0};
Ok, and the fix in this PR is exactly that: to not set lastId to 0 when samples is empty?
The fix is to not reset lastId to 0 when the subscriber is initialized after a recovery.
The subscriber receives some samples, and lastId is updated accordingly.
Then the session is lost; when it reconnects, subscriberInitialized is called again.
If that new call delivers no samples, because there were no new samples since the recovery, the previous code reset lastId to 0 (that is the bug).
If the session is then lost again, the next recovery tells the peer that the last ID it saw is 0, and the peer resends all queued elements. That is what was happening in the test failure.
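The recovery sequence described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: `Sample`, `ElementSubscriber`, `initializeBuggy`, `initializeFixed`, `resend` and `samplesResentAfterSecondRecovery` are hypothetical names for illustration, not DataStorm's actual code, and the peer is assumed to resend every queued sample whose ID is greater than the lastId reported on recovery.

```cpp
#include <cassert> // only needed for the checks in the usage note
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model; not the DataStorm implementation.
struct Sample
{
    std::int64_t id;
};

struct ElementSubscriber
{
    // The ID of the last processed sample.
    std::int64_t lastId{0};
};

// Pre-fix behavior (as described in the review): an empty batch resets lastId to 0.
void initializeBuggy(ElementSubscriber& subscriber, const std::vector<Sample>& samples)
{
    if (samples.empty())
    {
        subscriber.lastId = 0;
    }
    else
    {
        subscriber.lastId = samples.back().id;
    }
}

// Fixed behavior: an empty batch leaves lastId unchanged.
void initializeFixed(ElementSubscriber& subscriber, const std::vector<Sample>& samples)
{
    if (!samples.empty())
    {
        subscriber.lastId = samples.back().id;
    }
}

// Assumed peer behavior on recovery: resend every queued sample newer than lastId.
std::vector<Sample> resend(const std::vector<Sample>& queue, std::int64_t lastId)
{
    std::vector<Sample> result;
    for (const auto& sample : queue)
    {
        if (sample.id > lastId)
        {
            result.push_back(sample);
        }
    }
    return result;
}

// Simulates: first session, a recovery with no new samples, then a second recovery.
// Returns how many samples the peer would resend on the second recovery.
std::size_t samplesResentAfterSecondRecovery(bool applyFix)
{
    const std::vector<Sample> queue{{1}, {2}, {3}};
    ElementSubscriber subscriber;
    auto initialize = applyFix ? initializeFixed : initializeBuggy;

    initialize(subscriber, queue); // First session: samples 1..3 received, lastId == 3.
    initialize(subscriber, {});    // First recovery: no new samples since lastId 3.
    return resend(queue, subscriber.lastId).size(); // Second recovery.
}
```

With the pre-fix logic, `samplesResentAfterSecondRecovery(false)` returns 3 because the empty batch dropped lastId back to 0; with the fix, `samplesResentAfterSecondRecovery(true)` returns 0 because lastId still records sample 3.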
Pushed an additional test that allows reproducing the initial issue.
while (!connection)
{
    this_thread::sleep_for(chrono::milliseconds(10));
    connection = node.getSessionConnection(session);
This function sometimes returns nullptr?
Yes, it returns nullptr when the session is disconnected.
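Because the lookup can return nullptr while the session is disconnected, a test that waits for the connection can bound its polling so that a connection that never comes back fails the test instead of hanging it. A minimal sketch, assuming a generic callable stands in for `node.getSessionConnection(session)`; `waitForNonNull` and its parameters are hypothetical, not part of the DataStorm API.

```cpp
#include <cassert> // <cassert> and <memory> are only needed to exercise
#include <memory>  // the helper with std::shared_ptr
#include <chrono>
#include <thread>

// Hypothetical helper: call `lookup` until it returns a non-null result or
// `timeout` elapses, sleeping `interval` between attempts.
// Returns the last result, which is still null on timeout.
template<typename Lookup>
auto waitForNonNull(
    Lookup lookup,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval = std::chrono::milliseconds(10)) -> decltype(lookup())
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    auto result = lookup();
    while (!result && std::chrono::steady_clock::now() < deadline)
    {
        std::this_thread::sleep_for(interval);
        result = lookup();
    }
    return result;
}
```

Assuming getSessionConnection returns a pointer-like type, a test could write `auto connection = waitForNonNull([&] { return node.getSessionConnection(session); }, std::chrono::seconds(5));` and then assert that `connection` is non-null, rather than looping without a deadline.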
Why is the while loop required?
I think we might be able to remove it. The idea was that the session might be recovering from a previously closed connection, but here it seems there is always a connection.
// Session was reestablished; close it again
connection = node.getSessionConnection(session);
while (!connection)
Same question here. Shouldn't the connection exist?
Yes, fixed.
Co-authored-by: Joe George <[email protected]>
Fix #3056