Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Resource created with reconcile policy skip and updated to manage keeps AzureResourceNotFound status until long running operation finishes #3788

Closed
nojnhuh opened this issue Feb 15, 2024 · 2 comments · Fixed by #3834
Assignees
Labels
bug 🪲 Something isn't working
Milestone

Comments

@nojnhuh
Copy link
Member

nojnhuh commented Feb 15, 2024

Version of Azure Service Operator

v2.5.0

Describe the bug
Creating a resource with reconcile-policy: skip and then updating that to manage while the resource's Ready condition is false because of AzureResourceNotFound does not transition the reason to Reconciling when the ensuing PUT begins. Creating the same resource with reconcile-policy: manage from the start shows a Reconciling reason for the duration of the operation like I would expect.

To Reproduce
Steps to reproduce the behavior:

  1. Apply the following resources:
apiVersion: resources.azure.com/v1api20200601
kind: ResourceGroup
metadata:
  name: jon-test
spec:
  location: eastus
---
apiVersion: containerservice.azure.com/v1api20231001
kind: ManagedCluster
metadata:
  annotations:
    serviceoperator.azure.com/reconcile-policy: skip
  name: jon-test
spec:
  agentPoolProfiles:
    - vmSize: standard_d2_v2
      name: pool0
      count: 1
      mode: System
  dnsPrefix: jon-test
  identity:
    type: SystemAssigned
  location: eastus
  owner:
    name: jon-test
  servicePrincipalProfile:
    clientId: msi
  1. Observe the ManagedCluster's Ready condition fail with reason: AzureResourceNotFound (as expected)
  2. Update the reconcile policy to manage:
kubectl patch managedcluster jon-test --type merge -p '{"metadata": {"annotations": {"serviceoperator.azure.com/reconcile-policy": "manage"}}}'
  1. ASO starts a PUT to create the resource, but does not update the reason for the Ready condition to Reconciling

Expected behavior

Immediately after step 3, I expect the ManagedCluster's Ready condition to still be false but with reason: Reconciling to reflect that ASO has successfully begun to create the resource. This is the behavior I observe when creating the ManagedCluster with the manage reconcile policy straight away.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context

CAPZ is basing some logic on the reason for the Ready condition, treating Reconciling as less severe than AzureResourceNotFound. This doesn't break any functionality for us, it only leads to some overly negative status reporting on CAPZ resources.

@nojnhuh
Copy link
Member Author

nojnhuh commented Feb 15, 2024

My best guess is that this has to do with the logic for overwriting conditions:

// SetConditionReasonAware sets the provided Condition on the Conditioner. This is similar to SetCondition
// with one difference: SetConditionReasonAware understands common Reasons used by ASO and allows some of them to
// modify the standard Condition priority rules. This is primarily used to allow the Reconciling condition to overwrite
// Warning conditions raised by the operator that have been fixed. This is useful because sometimes getting a success or
// error from Azure can take a long time, and workflows like: submit -> warning -> fix warning -> call Azure -> wait -> success
// otherwise would continue reporting the Warning Condition until the final success step (possibly many minutes after the
// warning was resolved).
func SetConditionReasonAware(o Conditioner, new Condition) {

@matthchr
Copy link
Member

Agree w/ your guess @nojnhuh. We'll take a look at fixing this

@matthchr matthchr self-assigned this Feb 16, 2024
@matthchr matthchr added this to the v2.6.0 milestone Feb 16, 2024
@matthchr matthchr added the bug 🪲 Something isn't working label Feb 22, 2024
@matthchr matthchr modified the milestones: v2.6.0, v2.7.0 Feb 22, 2024
matthchr added a commit to matthchr/azure-service-operator that referenced this issue Feb 28, 2024
This fixes Azure#3788.

AzureResourceNotFound only comes up when ReconcilePolicy is skip. This
conditions priority being less than Reconciling allows skip -> reconcile
to immediately update the condition to Reconciling rather than continuing to
report AzureResourceNotFound until the resource is created.
github-merge-queue bot pushed a commit that referenced this issue Mar 5, 2024
This fixes #3788.

AzureResourceNotFound only comes up when ReconcilePolicy is skip. This
conditions priority being less than Reconciling allows skip -> reconcile
to immediately update the condition to Reconciling rather than continuing to
report AzureResourceNotFound until the resource is created.
@github-project-automation github-project-automation bot moved this from Backlog to Recently Completed in Azure Service Operator Roadmap Mar 5, 2024
@matthchr matthchr moved this from Recently Completed to Ready for Release in Azure Service Operator Roadmap Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🪲 Something isn't working
Projects
Development

Successfully merging a pull request may close this issue.

2 participants