Failed upgrade may lead to an endless loop of rollbacks #224
Comments
This is biting us as well. Going into backoff seems like the best solution here, no?
Hey @acornett21, thanks for sharing the link to this issue.
@kovayur It does look like the
Problem
When the reconciler fails to upgrade the release, it rolls back to the previous revision and returns an error. The controller runtime is expected to retry the reconciliation with exponential backoff, but in reality it keeps reconciling over and over again. I was able to reproduce this behavior in the following cases:
- `Deployment`
- is set by both `value` and `valueFrom` tags (ROX-18477: operator delete valuesFrom in proxy config if values is set, stackrox/stackrox#7105).
Every rollback increases the revision count. In my case, the operator spawns thousands of revisions in a matter of minutes.
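For comparison, here is a rough sketch of the retry delays one would expect if only the exponential backoff were in play. It assumes the per-item exponential failure limiter that client-go's default controller rate limiter uses (5 ms base delay, doubling per failure, capped at 1000 s); the constants and the `backoffDelay` helper are illustrative and are not taken from the operator's code.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed defaults: these mirror client-go's per-item exponential
// failure rate limiter; the operator may be configured differently.
const (
	baseDelay = 5 * time.Millisecond
	maxDelay  = 1000 * time.Second
)

// backoffDelay returns the delay applied after the given number of
// previous consecutive failures (0 means the first failure).
func backoffDelay(failures int) time.Duration {
	d := baseDelay
	for i := 0; i < failures; i++ {
		d *= 2
		if d >= maxDelay {
			// Cap the delay and stop doubling to avoid overflow.
			return maxDelay
		}
	}
	return d
}

func main() {
	for _, n := range []int{0, 1, 5, 10, 20} {
		fmt.Printf("after %d previous failures: retry in %v\n", n, backoffDelay(n))
	}
}
```

Under these assumptions the delay is already above five seconds after ten previous failures, so thousands of revisions within minutes cannot come from backoff-driven retries alone.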
Root cause
A rolled-back revision is no different from an upgraded revision: it has the `deployed` status, just as after a normal upgrade. There will always be a diff between the expected state calculated from the CR and the rolled-back revision, so the upgrade fails again and again.
Some events are added to the reconciliation queue outside of the exponential backoff and trigger a reconciliation without any delay. These events are:
- `Irreconcilable` status is updated twice for every reconcile: to `False` (right before the upgrade) and to `True` (after the upgrade failed).
- `pending-upgrade`, `pending-rollback`, `superseded`, and `deployed` or `failed`, depending on the rollback result.
There is deduplication in the queue, but at least one event will still be queued without delay.
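
The effect described above can be illustrated with a toy model. This is a deliberately simplified sketch, not the real work queue: `toyQueue` and `simulate` are hypothetical names, the queue tracks a single key in virtual time, reconciles take zero time, and the backoff is not capped. It only shows that an event enqueued without delay wins over the backoff requeue, even when duplicates are collapsed.

```go
package main

import (
	"fmt"
	"time"
)

// toyQueue is a hypothetical single-key model of a deduplicating queue.
// It remembers only the earliest virtual time at which the key is ready.
type toyQueue struct {
	queued  bool
	readyAt time.Duration
}

// add enqueues the key to become ready at virtual time `at`.
// Duplicate adds are collapsed; the earliest ready time wins.
func (q *toyQueue) add(at time.Duration) {
	if !q.queued || at < q.readyAt {
		q.queued = true
		q.readyAt = at
	}
}

// pop removes the key and returns the virtual time at which it was ready.
func (q *toyQueue) pop() time.Duration {
	q.queued = false
	return q.readyAt
}

// simulate runs n failing reconciles and returns the virtual time at
// which the last one starts.
func simulate(n int, withImmediateEvents bool) time.Duration {
	q := &toyQueue{}
	q.add(0)
	var now time.Duration
	backoff := 5 * time.Millisecond // assumed base delay
	for i := 0; i < n; i++ {
		now = q.pop()
		// The reconcile fails and is requeued with backoff.
		q.add(now + backoff)
		backoff *= 2
		if withImmediateEvents {
			// Updates made during the reconcile (for example the
			// Irreconcilable status) produce events that are
			// enqueued without delay; duplicates collapse to one.
			for e := 0; e < 3; e++ {
				q.add(now)
			}
		}
	}
	return now
}

func main() {
	fmt.Println("backoff only, 10th reconcile starts at:", simulate(10, false))
	fmt.Println("with immediate events, 10th reconcile starts at:", simulate(10, true))
}
```

In this model the tenth reconcile starts about 2.5 s in when only the backoff applies, and at time zero when immediate events are present, which matches the observation that deduplication alone does not restore the delay.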