Feature: reconcile-policy:recreate-on-modification-failure #3747

Open
mehighlow opened this issue Jan 29, 2024 · 4 comments

mehighlow (Contributor) commented Jan 29, 2024

Feature: a new annotation, serviceoperator.azure.com/reconcile-policy: recreate-on-modification-failure, that recreates a resource when a modification fails.

Some resources do not support certain modifications, such as changes to names or SKUs. For example, you cannot downgrade a Redis cache once it has been scaled up, and the name of a RoleAssignment can't be updated. Changing such fields on the custom resource results in reconciliation errors. While this behavior may be acceptable for production installations, it would be useful to have an opt-in recreate option for dev/test/staging environments, even though it comes with downtime. With this option, the ASO operator would delete the old resource and create a new one with the same name and the desired SKU.

For example, suppose a user modifies an existing resource. If the annotation is set and the operator finds that progressing towards the goal state results in a 4xx error from Azure, it would delete the resource and recreate it with the same name.
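To make the proposed trigger concrete, here is a minimal Python sketch of the decision the operator would have to make after a failed update. It is not ASO code (ASO is written in Go and has no such feature); the function `decide_action`, the `Action` enum, and the rule that only 4xx responses on existing, opted-in resources qualify are assumptions for illustration. The annotation key is ASO's existing reconcile-policy annotation; the value is the one proposed in this issue.

```python
from enum import Enum
from typing import Mapping, Optional

# Annotation key used by ASO for reconcile policies; the value below is the
# one *proposed* in this issue and is hypothetical, not an existing ASO option.
RECONCILE_POLICY_ANNOTATION = "serviceoperator.azure.com/reconcile-policy"
RECREATE_ON_MODIFICATION_FAILURE = "recreate-on-modification-failure"


class Action(Enum):
    NONE = "none"                     # the update succeeded; nothing else to do
    REPORT_ERROR = "report-error"     # surface the error, as the operator does today
    RECREATE = "delete-and-recreate"  # delete the Azure resource, then create it again


def decide_action(
    annotations: Optional[Mapping[str, str]],
    resource_exists: bool,
    status_code: int,
) -> Action:
    """Hypothetical decision step run after a PUT to Azure returns.

    Only a 4xx response to a modification of an *existing* resource that has
    explicitly opted in via the annotation leads to delete-and-recreate.
    """
    if status_code < 400:
        return Action.NONE
    is_client_error = 400 <= status_code < 500
    opted_in = (annotations or {}).get(RECONCILE_POLICY_ANNOTATION) == RECREATE_ON_MODIFICATION_FAILURE
    if resource_exists and is_client_error and opted_in:
        return Action.RECREATE
    # Everything else (no opt-in, 5xx, resource not yet created) keeps today's behavior.
    return Action.REPORT_ERROR


if __name__ == "__main__":
    opted = {RECONCILE_POLICY_ANNOTATION: RECREATE_ON_MODIFICATION_FAILURE}
    print(decide_action(opted, resource_exists=True, status_code=400))
    print(decide_action({}, resource_exists=True, status_code=400))
    print(decide_action(opted, resource_exists=True, status_code=503))
```

Keeping the rule this narrow means a resource without the annotation is never deleted, and transient 5xx errors never trigger a recreate. A real design would also need to wait until the deletion has fully completed before creating the replacement.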

mehighlow changed the title from "Feature: reconcile-policy:recreate-on-fail" to "Feature: reconcile-policy:recreate-on-modification-failure" Jan 29, 2024

theunrepentantgeek (Member)

Wouldn't this be incredibly dangerous?

A Redis cache doesn't really contain any state (and one could argue that an app should work even if Redis is unavailable, just more slowly), but not every Azure Resource is stateless.

Imagine the consequences if this annotation were applied to a PostgreSQL Database, Storage Queue, etc.

Any design for this needs to carefully consider potential failure modes; the last thing we want is to destroy customer data.

mehighlow (Contributor, Author)

It is indeed.

However, it would be an opt-in feature per resource.

matthchr added this to the v2.7.0 milestone Jan 29, 2024
matthchr removed this from the v2.7.0 milestone Feb 22, 2024

mehighlow (Contributor, Author)

Here is another example of why this feature may be useful. I'm recreating an environment by first destroying everything: the resource group and its nested resources. However, a role assignment is not a standalone Azure resource like a database or a storage account, so I can't visually confirm in the portal whether it has been completely deleted. When I attempt to recreate the environment with the same names, the role assignment operation fails because the previous one either was not completely removed or is still being deleted.

Message: Tenant ID, application ID, principal ID, and scope are not allowed to be updated.: PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resourcegroup-name/providers/Microsoft.KeyVault/vaults/azure-kv-name/providers/Microsoft.Authorization/roleAssignments/6f07e9af-585f-54ff-8823-5ea9aa1ce3d2

This forces me to delete the role assignment manually through the Azure Management API, which is annoying.

theunrepentantgeek (Member) commented Sep 30, 2024

We're interested in other scenarios where users would find this useful.

Development

No branches or pull requests

3 participants