Replication Repository "Check Connection" can return false positive if mig-controller is dead #1011

Open
djwhatle opened this issue Sep 10, 2020 · 1 comment

djwhatle commented Sep 10, 2020

Problem

mig-controller wasn't running on my cluster. In mig-ui, I updated the Replication Repo credentials to bogus values, and mig-ui returned "connection successful" even though the controller was dead and the credentials were invalid.

I think it's important for the "Check Connection" button to report accurate status so that users can trust it. Right now, Replication Repo "Check Connection" seems to assume that the controller will reconcile the change within some fixed time period. In reality there is no guarantee on how fast mig-controller will respond (e.g. when the controller is dead, or very busy with other work).

I believe mig-ui can't currently rely on MigStorage.Status.ObservedDigest, since changing these secret creds doesn't involve a MigStorage spec change.

Suggested fix

We can use MigStorage.Status.ObservedDigest to detect when the controller has reacted to the mig-ui Replication Repo credential change if we adopt a workflow like this:

  1. User clicks "Check Connection" or "Update Replication Repo"
  2. mig-ui creates a new secret with the updated creds
  3. mig-ui updates the MigStorage.spec to refer to the new creds secret
  4. mig-ui deletes the old creds secret
  5. mig-ui waits for MigStorage.Status.ObservedDigest to match the new spec
  6. mig-ui returns the "Connection Check" / "Update" result to the user, or reports that the operation timed out if the controller doesn't update ObservedDigest to match within a set time window
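
Steps 5 and 6 could be sketched roughly as below. This is only an illustration in Python, not mig-ui code: `wait_for_observed_digest`, `get_observed_digest`, and `expected_digest` are hypothetical names, and how the UI would obtain the digest it expects for the new spec is left open here. The point is that a dead controller yields an explicit timeout rather than a "connection successful" result.

```python
import time
from typing import Callable, Optional


def wait_for_observed_digest(
    get_observed_digest: Callable[[], Optional[str]],  # hypothetical: reads MigStorage.Status.ObservedDigest
    expected_digest: str,                               # hypothetical: digest expected for the new spec
    timeout_s: float = 30.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the controller has observed the new spec.

    Returns True once the observed digest equals the expected one, and
    False if the time window elapses first (e.g. the controller is dead
    or too busy), so the caller can report "timed out" instead of success.
    """
    deadline = clock() + timeout_s
    while True:
        # Check first, so an already-reconciled resource returns at once.
        if get_observed_digest() == expected_digest:
            return True
        # Give up once the window has elapsed without a matching digest.
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

A caller would only read and show the connection result when this returns True; on False it would show a timeout message. The `clock` and `sleep` parameters exist so the timeout path can be exercised without actually waiting.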
djwhatle changed the title from 'Replication Repository "Check Connection" returns false positive for MigStorage OK' to 'Replication Repository "Check Connection" returns false positive "Connection OK"' on Sep 10, 2020
@djwhatle

To be clearer: this is mostly an issue only when the controller is dead. In other cases, the MigStorage controller very likely has little work to do and will respond quickly.

It would still be good to fix, but I don't think it's the highest priority right now.

djwhatle changed the title from 'Replication Repository "Check Connection" returns false positive "Connection OK"' to 'Replication Repository "Check Connection" can return false positive if mig-controller is dead' on Sep 10, 2020