Deprovisioning of sql-database fails when namespace is deleted #678
Comments
It looks like a similar issue to #672, on the Azure Go SDK side. We bumped the SDK version as they suggested. Have you upgraded to the latest OSBA release?
I can give that a try. Is there any documentation on how to safely upgrade OSBA?
Helm can upgrade and roll back. Or, if your concern is about OSBA behavior after upgrading, maybe you can verify it on a test cluster or minikube first?
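For reference, a minimal sketch of that upgrade path with Helm; the release name `osba` and the chart repo alias `azure` are assumptions, substitute whatever you used at install time:

```sh
# Refresh the chart repo and upgrade the OSBA release in place.
# Release name "osba" and repo alias "azure" are assumptions.
helm repo update
helm upgrade osba azure/open-service-broker-azure

# If the new version misbehaves, roll back to the previous revision.
helm rollback osba 1
```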
I've upgraded OSBA to 1.5.0 and reinstalled the Catalog-Service from scratch, since it was in an unrecoverable state after the last attempt. Now, when the namespace is deleted and I run svcat against the instances, I get:
and against the bindings:
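For anyone following along, the checks above were along these lines; `pr-NNN` is the per-PR namespace used in this setup:

```sh
# Inspect what the catalog still tracks in the deleted namespace.
svcat get instances -n pr-NNN
svcat get bindings -n pr-NNN
svcat describe instance <instance-name> -n pr-NNN
```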
Any OSBA logs for unbinding? Let's check whether the issue is on the OSBA unbinding side or the svcat side. Tracing back from https://github.com/kubernetes-incubator/service-catalog/blob/7fec2384506143b88910f575913f5fdbe1601d7f/pkg/controller/controller_binding.go#L796, I think it is possible that the unbinding request never reached OSBA.
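A quick way to check that, sketched below; the `osba` namespace and deployment name are assumptions based on a default Helm install:

```sh
# Tail the broker logs and look for unbind traffic.
# Namespace and deployment name assume a default "osba" Helm install.
kubectl logs -n osba deploy/osba-open-service-broker-azure --tail=200 | grep -i unbind
```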
From what I was able to see while monitoring both the OSBA and service-catalog logs, service-catalog is failing, and OSBA logs nothing while the deprovision fails. In the controller-manager logs, I see that it first attempts to delete the secrets associated with the db, which results in a 404. Remember that the trigger for all of this was the deletion of the entire namespace. It then retries the entire process over and over until the catalog-catalog-apiserver starts crashing continuously, entering a crash loop of retries and restarts; I've seen it go up to hundreds of restarts. The only way to stop it was to reinstall service-catalog with Helm again. The result seems to be a deadlock: retrying to delete a secret which no longer exists. This leaves the Instance stuck.
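To watch that crash loop from outside, something like the following helps; the `catalog` namespace and the release-prefixed deployment name are assumptions inferred from the `catalog-catalog-apiserver` pod name above:

```sh
# Watch restart counts on the catalog pods, then pull logs from the
# previous (crashed) controller-manager container.
kubectl get pods -n catalog -w
kubectl logs -n catalog deploy/catalog-catalog-controller-manager --previous
```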
I suppose the
Do you know of any way to reconnect the broker to those instances if service-catalog is reinstalled? I've been forced to reinstall frequently, given that provisioning fails, service-catalog starts crashing, and it still can't recover. I must be doing something wrong, because at the moment this strategy of provisioning and deprovisioning automatically through OSBA seems unstable and not very reliable :(. I just can't identify the problem. Once provisioning fails there's no way to fix it, such as by forcing the removal of bindings and instances from the service-catalog, even if I manually remove the Azure resources that were provisioned.
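For what it's worth, a common (if blunt) way to force stuck service-catalog resources out is to clear their finalizers; this is a sketch only, it orphans whatever broker-side state still exists, and the resource names are placeholders:

```sh
# Last-resort cleanup: strip finalizers so the controller stops
# retrying the delete. This orphans broker-side records; names are
# placeholders for the stuck resources.
kubectl patch servicebinding <binding-name> -n pr-NNN \
  --type merge -p '{"metadata":{"finalizers":null}}'
kubectl patch serviceinstance <instance-name> -n pr-NNN \
  --type merge -p '{"metadata":{"finalizers":null}}'
```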
Did you try just re-registering OSBA? I kinda understand your feelings... Still, I should say that your case is a mix of situations... If it weren't for the Azure Go SDK issue, and svcat were healthy, OSBA could handle common provision failures well, with svcat automatically calling deprovision. For manually removing Azure resources, please try one more step: also remove the related records in the OSBA store (the Redis).
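A sketch of that Redis cleanup, assuming OSBA's bundled Redis; the pod label, pod name, and key layout are all assumptions, so inspect the keys before deleting anything:

```sh
# Locate the broker's Redis pod, then look for records tied to the
# dead instance (ID taken from the error log below). Key layout is
# an assumption; confirm what matches before deleting.
kubectl get pods -n osba -l app=redis
kubectl exec -n osba <redis-pod> -- \
  redis-cli --scan --pattern '*ff5e0f6e-3531-11e9-b9d6-e68df5afd861*'
# Once confirmed, delete the matching keys:
# kubectl exec -n osba <redis-pod> -- redis-cli DEL <key>
```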
We are currently creating on-demand environments for pull-requests and have started provisioning DBs for them using OSBA.
We create a new k8s namespace and provision the database there.
The creation works as expected. However, when we later delete the entire namespace, to destroy the whole on-demand environment, OSBA does not deprovision and delete the database.
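The provisioning step looks roughly like this; the class, plan, and parameters below are placeholders rather than our exact values:

```sh
# Create the per-PR namespace and provision a database into it.
# Class, plan, and params are placeholders, not our exact values.
kubectl create namespace pr-NNN
svcat provision pr-NNN-db --class azure-sql-12-0 --plan basic \
  -n pr-NNN -p location=eastus -p resourceGroup=my-rg
svcat bind pr-NNN-db -n pr-NNN
```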
It fails with the following error:
```
time="2019-02-22T18:33:22Z" level=error msg="error executing job; not submitting any follow-up tasks" error="error executing deprovisioning step \"deleteARMDeployment\" for instance \"ff5e0f6e-3531-11e9-b9d6-e68df5afd861\": error executing deprovisioning step: error deleting ARM deployment: error deleting deployment \"1d787c49-7d99-419d-8b43-30647eeabece\" from resource group \"<<redacted>>\": pollingTrackerBase#updateRawBody: failed to unmarshal response body: StatusCode=0 -- Original Error: unexpected end of JSON input" job=executeDeprovisioningStep taskID=f2137f1b-fa76-4251-b669-f68423ba5ac4
```
Running:

```sh
svcat get instances -n pr-NNN
```
In fact, it seems to have left our namespace stuck in an eternal "Terminating" state.
Also, manually deprovisioning through svcat did not delete the database either:
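The exact command wasn't captured above; it was of this general form, with the instance name as a placeholder:

```sh
# Manual deprovision attempt; instance name is a placeholder.
svcat deprovision <instance-name> -n pr-NNN
```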
But the database remains there, even after waiting a long time. No messages in the OSBA logs either.