Fix initialization of DataStorm samples after session recovery #3294

Merged on Dec 30, 2024 (7 commits)
Changes from 1 commit
47 changes: 28 additions & 19 deletions cpp/src/DataStorm/SessionI.cpp
@@ -1134,28 +1134,37 @@ SessionI::subscriberInitialized(
out << _id << ": initialized '" << element << "' from 'e" << elementId << '@' << topicId << "'";
}
elementSubscriber->initialized = true;
elementSubscriber->lastId = samples.empty() ? 0 : samples.back().id;

vector<shared_ptr<Sample>> samplesI;
samplesI.reserve(samples.size());
auto sampleFactory = element->getTopic()->getSampleFactory();
auto keyFactory = element->getTopic()->getKeyFactory();
for (const auto& sample : samples)
if (samples.empty())
{
assert((!key && !sample.keyValue.empty()) || key == subscriber.keys[sample.keyId].first);

samplesI.push_back(sampleFactory->create(
_id,
elementSubscribers->name,
sample.id,
sample.event,
key ? key : keyFactory->decode(_instance->getCommunicator(), sample.keyValue),
subscriber.tags[sample.tag],
sample.value,
sample.timestamp));
assert(samplesI.back()->key);
return {};
Member

In the original code, when samples.empty(), we set elementSubscriber->lastId to 0.

It's not immediately clear why we don't need that. Is this lastId already 0 for some other reason?

Member Author

I added a comment to explain the logic of lastId.

lastId is default-initialized to 0 in Session.h:

// The ID of the last processed sample.
std::int64_t lastId{0};

Member

Ok, and the fix in this PR is exactly that: to not set lastId to 0 when samples is empty?

Member Author @pepone, Dec 20, 2024

The fix is to not reset lastId to 0 when the subscriber is initialized after a session recovery.

The subscriber receives some samples, and lastId is updated accordingly.

Then the session is lost. When it reconnects, subscriberInitialized is called again.

If the new call carries no samples, because there were no new samples since the recovery, the previous code reset lastId to 0. That is the bug.

If the session is then lost again, the next recovery tells the peer that the last ID it saw is 0, and the peer sends all queued elements. That is what was happening with the test failure.
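
The sequence described above can be sketched in isolation. This is a minimal illustration, not the DataStorm code: `ElementSubscriber` here is a stripped-down stand-in that keeps only `lastId`, samples are reduced to their IDs, and `initializedBeforeFix`, `initializedAfterFix`, and `lastIdAfterEmptyRecovery` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-element subscriber state.
// Only lastId mirrors the field quoted from Session.h.
struct ElementSubscriber
{
    // The ID of the last processed sample.
    std::int64_t lastId{0};
};

// Behavior before the fix (assumed from the removed line in the diff):
// an empty batch resets lastId to 0.
void initializedBeforeFix(ElementSubscriber& s, const std::vector<std::int64_t>& sampleIds)
{
    s.lastId = sampleIds.empty() ? 0 : sampleIds.back();
}

// Behavior after the fix: an empty batch leaves lastId untouched.
void initializedAfterFix(ElementSubscriber& s, const std::vector<std::int64_t>& sampleIds)
{
    if (sampleIds.empty())
    {
        return;
    }
    assert(sampleIds.back() > s.lastId);
    s.lastId = sampleIds.back();
}

using InitFn = void (*)(ElementSubscriber&, const std::vector<std::int64_t>&);

// Simulates an initial batch of samples followed by a recovery that carries
// no new samples. Returns the lastId the subscriber would report to the peer
// on the next recovery.
std::int64_t lastIdAfterEmptyRecovery(InitFn init)
{
    ElementSubscriber s;
    init(s, std::vector<std::int64_t>{1, 2, 3}); // first initialization
    init(s, std::vector<std::int64_t>{});        // re-initialization after recovery
    return s.lastId;
}
```

With the pre-fix function, `lastIdAfterEmptyRecovery` returns 0, so the peer would be asked to resend everything it has queued. With the post-fix function it returns 3, the ID of the last sample actually processed.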

Member Author

I pushed an additional test that reproduces the initial issue.

}
else
{
assert(samples.back().id > elementSubscriber->lastId);
elementSubscriber->lastId = samples.back().id;

vector<shared_ptr<Sample>> samplesI;
samplesI.reserve(samples.size());
auto sampleFactory = element->getTopic()->getSampleFactory();
auto keyFactory = element->getTopic()->getKeyFactory();
for (const auto& sample : samples)
{
assert((!key && !sample.keyValue.empty()) || key == subscriber.keys[sample.keyId].first);

samplesI.push_back(sampleFactory->create(
_id,
elementSubscribers->name,
sample.id,
sample.event,
key ? key : keyFactory->decode(_instance->getCommunicator(), sample.keyValue),
subscriber.tags[sample.tag],
sample.value,
sample.timestamp));
assert(samplesI.back()->key);
}
return samplesI;
}
return samplesI;
}

void