ASO does not retry after initial RBAC failure for storage accounts #4459

Closed
nraooptum opened this issue Nov 21, 2024 · 2 comments
Labels: bug 🪲 (Something isn't working), waiting-on-user-response (Waiting on more information from the original user before progressing.)

Comments

nraooptum commented Nov 21, 2024

When attempting to create a storage account with customer-managed key encryption, if the storage account's user-assigned managed identity (UAMI) does not have RBAC access to the Key Vault, the operator/Azure correctly errors out for that resource, reporting a Key Vault authentication failure. However, at some point the RBAC for the storage account's UAMI does get reconciled, but the storage account resource stays stuck in this error state. The only way to recover from this is to completely delete the storage account resource and then re-create it (note: kubectl replace does not work -- the resource has to be deleted completely), which correctly deploys the storage account as it no longer gets a key vault authentication failure.

If it helps, the relevant section of my config for the storage account looks like this (it's written in TypeScript with cdk8s):

[Image: screenshot of the storage account config; not preserved]

@matthchr
Member

Can you share the error message from the ASO logs? It seems like we probably ought to be retrying on that (maybe for all resources, or maybe just for the storage accounts API, it will depend on the specific message).

You also said this:

> The only way to recover from this is to completely delete the storage account resource and then re-create it (note: kubectl replace does not work -- the resource has to be deleted completely), which correctly deploys the storage account as it no longer gets a key vault authentication failure.

I don't think that this is necessarily true. You'll need to trigger a fresh reconcile on the resource. Delete + recreate can do it, but you could also do something like edit a tag, or add and then remove the skip-reconcile annotation.
Obviously none of those are ideal, so if you can share the error message you received, we'll update things so that we correctly retry on this.
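
As a rough illustration of the "add and remove the skip-reconcile annotation" workaround, the sketch below builds the two `kubectl annotate` invocations involved. It assumes the annotation meant here is ASO v2's `serviceoperator.azure.com/reconcile-policy` with the value `skip`; the helper `reconcileNudgeCommands`, the resource name `my-storage-account`, and the `default` namespace are hypothetical.

```typescript
// Assumed annotation key: ASO v2's reconcile policy annotation.
const RECONCILE_POLICY = "serviceoperator.azure.com/reconcile-policy";

// Hypothetical helper: builds kubectl argument lists that first set the
// annotation to "skip" and then remove it again. Per the comment above,
// adding and removing the annotation should prompt a fresh reconcile.
function reconcileNudgeCommands(kind: string, name: string, namespace: string): string[][] {
  const target = ["annotate", kind, name, "--namespace", namespace];
  return [
    [...target, `${RECONCILE_POLICY}=skip`, "--overwrite"],
    [...target, `${RECONCILE_POLICY}-`], // a trailing "-" removes the annotation
  ];
}

// Hypothetical resource name and namespace, for illustration only.
const commands = reconcileNudgeCommands("storageaccount", "my-storage-account", "default");
for (const args of commands) {
  console.log(["kubectl", ...args].join(" "));
}
```

If the resource already carries a reconcile-policy annotation with a different value, restore that value in the second step instead of removing the annotation.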

matthchr added this to the v2.12.0 milestone Nov 25, 2024
super-harsh added the waiting-on-user-response label Dec 9, 2024
super-harsh self-assigned this Dec 9, 2024
@nraooptum
Author

Going to close this -- we ended up using the ArgoCD "waves" feature to ensure RBAC prior to creating the storage account.
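
For readers who want to try the same approach, here is a minimal sketch of how sync waves could order the two resources. Argo CD reads the `argocd.argoproj.io/sync-wave` annotation and applies lower waves first. Everything else here is an assumption for illustration and not the configuration used in this issue: the plain `Manifest` shape stands in for cdk8s constructs, the RBAC is assumed to be granted through an ASO `RoleAssignment`, and the `apiVersion` strings and resource names are hypothetical, with specs omitted.

```typescript
// Argo CD applies resources in ascending order of this annotation's value.
const SYNC_WAVE = "argocd.argoproj.io/sync-wave";

// Simplified stand-in for a Kubernetes manifest (spec omitted).
interface Manifest {
  apiVersion: string;
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
}

// Return a copy of the manifest with the given sync wave set.
function withSyncWave(manifest: Manifest, wave: number): Manifest {
  return {
    ...manifest,
    metadata: {
      ...manifest.metadata,
      annotations: { ...manifest.metadata.annotations, [SYNC_WAVE]: String(wave) },
    },
  };
}

// Hypothetical resources: the role assignment goes in an earlier wave
// than the storage account that depends on it.
const roleAssignment = withSyncWave(
  { apiVersion: "authorization.azure.com/v1api20220401", kind: "RoleAssignment", metadata: { name: "uami-keyvault-access" } },
  0,
);
const storageAccount = withSyncWave(
  { apiVersion: "storage.azure.com/v1api20230101", kind: "StorageAccount", metadata: { name: "my-storage-account" } },
  1,
);

// Sort lowest wave first to show the order in which the waves are applied.
const ordered = [storageAccount, roleAssignment].sort(
  (a, b) => Number(a.metadata.annotations![SYNC_WAVE]) - Number(b.metadata.annotations![SYNC_WAVE]),
);
console.log(ordered.map((m) => `${m.metadata.annotations![SYNC_WAVE]}: ${m.kind}`));
```

Wave ordering only controls when each resource is applied. Whether Argo CD waits for the role assignment to be ready before starting the next wave depends on how resource health is assessed for that kind, so this may need a custom health check in practice.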

github-project-automation bot moved this from Backlog to Recently Completed in Azure Service Operator Roadmap Jan 6, 2025