Support reconnection/resync with Typha #10306
base: master
- Add optional callback for reconnection-aware clients.
- Adjust Typha discovery to reset after all Typhas have been tried.
- Make the dedupe buffer reconnection-aware. It now
  - Stores off the keys that it had previously seen when it gets the OnTyphaConnectionRestarted() call.
  - Discards those seen keys as the resync progresses.
  - Synthesises deletions for KVs that weren't seen during the resync.
  - Recalculates the UpdateType when sending keys downstream so that the calculation graph sees a resync as a sequence of updates for existing keys.
- Refactor the client so that it
  - Does one connection synchronously (including connection attempts to multiple Typha instances as before).
  - Reconnects in the background after a failure.
  - Sends WaitForDatastore/ResyncInProgress messages when it's doing a reconnection.
  - Re-uses a single connection attempt tracker so that we cycle through Typha instances on reconnect.
- Various minor changes:
  - Add "done" channels to various components to avoid "log to testing.T after test finished" errors.
  - Add 32 bit random value to connection ID. Makes it a lot more greppable in logs.
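To illustrate the discovery change described above (reset after all Typhas have been tried, with one tracker re-used across reconnects), here is a minimal sketch. The `attemptTracker` type, its methods, and the example addresses are hypothetical and are not taken from the Calico code; the real tracker may well behave differently.

```go
package main

import "fmt"

// attemptTracker is a hypothetical stand-in for a connection attempt
// tracker. It is an illustration only, not the real Calico type.
type attemptTracker struct {
	addrs []string        // discovered Typha addresses, in preference order
	tried map[string]bool // addresses tried since the last reset
}

func newAttemptTracker(addrs []string) *attemptTracker {
	return &attemptTracker{addrs: addrs, tried: map[string]bool{}}
}

// next returns the next untried address. Once every address has been
// tried, it resets its state, reports wrapped=true, and starts again from
// the first address. Because the tracker outlives a single connection,
// a reconnect carries on from where the previous attempt left off.
func (t *attemptTracker) next() (addr string, wrapped bool) {
	if len(t.addrs) == 0 {
		return "", false
	}
	for _, a := range t.addrs {
		if !t.tried[a] {
			t.tried[a] = true
			return a, false
		}
	}
	// All addresses have been tried: reset and start over.
	t.tried = map[string]bool{t.addrs[0]: true}
	return t.addrs[0], true
}

func main() {
	// Example addresses are made up for the demonstration.
	tr := newAttemptTracker([]string{"typha-a:5473", "typha-b:5473"})
	for i := 0; i < 3; i++ {
		addr, wrapped := tr.next()
		fmt.Printf("attempt %d: %s (wrapped=%v)\n", i+1, addr, wrapped)
	}
}
```

A caller could use the `wrapped` flag to decide when to back off or give up, which is relevant to the open question in the Todos about whether the client should bail out when no Typha is reachable.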
log.Debugf("Typha supports node resource updates: %v", supportsNodeResourceUpdates)
configParams.SetUseNodeResourceUpdates(supportsNodeResourceUpdates)
// Up-to-date Typha client will refuse to connect unless Typha signals
// that it supports node resource updates.
Any issue with version skew on-upgrade?
I think the feature was added in 2019 so, yes, but only if you're skipping a dozen versions. Even in that case, it will only block new felix from talking to old typha; felix will just keep trying until it connects to an up-level typha.
Description
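- Add optional callback for reconnection-aware clients.
- Adjust Typha discovery to reset after all Typhas have been tried.
- Make the dedupe buffer reconnection-aware. It now
  - Stores off the keys that it had previously seen when it gets the OnTyphaConnectionRestarted() call.
  - Discards those seen keys as the resync progresses.
  - Synthesises deletions for KVs that weren't seen during the resync.
  - Recalculates the UpdateType when sending keys downstream so that the calculation graph sees a resync as a sequence of updates for existing keys.
- Refactor the client so that it
  - Does one connection synchronously (including connection attempts to multiple Typha instances as before).
  - Reconnects in the background after a failure.
  - Sends WaitForDatastore/ResyncInProgress messages when it's doing a reconnection.
  - Re-uses a single connection attempt tracker so that we cycle through Typha instances on reconnect.
  - Makes the felix_resyncs_started and felix_resync_state Prometheus metrics useful again.
- Various minor changes:
  - Add "done" channels to various components to avoid "log to testing.T after test finished" errors.
  - Add 32 bit random value to connection ID. Makes it a lot more greppable in logs.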
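The reconnection-aware dedupe buffer described above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: only the `OnTyphaConnectionRestarted` callback name comes from the description, while the `dedupeBuffer` and `update` types, the string-valued `Kind` field (standing in for the real UpdateType), and the other method names are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// update is a hypothetical downstream message. Kind stands in for the
// real UpdateType and is one of "new", "update" or "delete".
type update struct {
	Key, Value, Kind string
}

// dedupeBuffer is a simplified, hypothetical model of the buffer.
type dedupeBuffer struct {
	live          map[string]string // keys already sent downstream
	pendingResync map[string]bool   // keys known before the reconnect and not yet re-seen
	out           []update          // messages emitted downstream
}

func newDedupeBuffer() *dedupeBuffer {
	return &dedupeBuffer{live: map[string]string{}}
}

// OnTyphaConnectionRestarted stores off every key seen so far.
func (d *dedupeBuffer) OnTyphaConnectionRestarted() {
	d.pendingResync = make(map[string]bool, len(d.live))
	for k := range d.live {
		d.pendingResync[k] = true
	}
}

// OnUpdate discards the key from the pending set and recalculates the
// kind, so a re-sent key looks like an update rather than a new key.
func (d *dedupeBuffer) OnUpdate(key, value string) {
	delete(d.pendingResync, key) // no-op when not resyncing
	kind := "new"
	if _, known := d.live[key]; known {
		kind = "update"
	}
	d.live[key] = value
	d.out = append(d.out, update{Key: key, Value: value, Kind: kind})
}

// OnInSync synthesises deletions for keys that the resync never re-sent.
func (d *dedupeBuffer) OnInSync() {
	stale := make([]string, 0, len(d.pendingResync))
	for k := range d.pendingResync {
		stale = append(stale, k)
	}
	sort.Strings(stale) // deterministic order for the example
	for _, k := range stale {
		delete(d.live, k)
		d.out = append(d.out, update{Key: k, Kind: "delete"})
	}
	d.pendingResync = nil
}

func main() {
	d := newDedupeBuffer()
	d.OnUpdate("a", "1")
	d.OnUpdate("b", "2")
	d.OnInSync()

	d.OnTyphaConnectionRestarted()
	d.OnUpdate("a", "1") // re-sent during the resync
	d.OnUpdate("c", "3") // created while disconnected
	d.OnInSync()         // "b" was never re-sent

	for _, u := range d.out {
		fmt.Printf("%s %s=%q\n", u.Kind, u.Key, u.Value)
	}
}
```

In this model the downstream consumer never has to know that a reconnection happened: it only sees updates for keys it already has, new keys, and deletions for keys that disappeared while the connection was down.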
Related issues/PRs
CORE-11348
Todos
The client will still bail out if it can't connect to any Typha. Not sure if that's desirable or not; we could keep retrying but issues like running out of file handles might be better handled with a restart.
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.
Every PR needs one release-note-* label.
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.
Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.