

fix: Resolve errors for vm resource not found #633

Merged
merged 4 commits into from
Jan 8, 2025

Conversation

tangyouzzz
Contributor

@tangyouzzz tangyouzzz commented Dec 24, 2024

Fixes #525

Description
https://learn.microsoft.com/en-us/azure/azure-resource-manager/troubleshooting/error-not-found?tabs=bicep#symptoms
When a VM is fetched with GET and it does not exist, the returned error code can be either NotFound or ResourceNotFound. For example:

{"level":"ERROR","time":"2024-12-24T08:08:51.172Z","logger":"controller","message":"Reconciler error","commit":"56aa2e0-dirty","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"aks-cloud-fip-mcu-arm64-mkr2f"},"namespace":"","name":"aks-cloud-fip-mcu-arm64-mkr2f","reconcileID":"2ee3a64a-083f-4743-84fa-661820165f2c","error":"ensuring instance termination, getting cloudprovider instance, getting instance, failed to get VM instance, GET https://management.azure.com/subscriptions/xxxx/resourceGroups/xxxx/providers/Microsoft.Compute/virtualMachines/aks-cloud-fip-mcu-arm64-mkr2f\n--------------------------------------------------------------------------------\nRESPONSE 404: 404 Not Found\nERROR CODE: NotFound\n--------------------------------------------------------------------------------\n{\n  \"error\": {\n    \"code\": \"NotFound\",\n    \"message\": \"The entity was not found in this Azure location.\"\n  }\n}\n--------------------------------------------------------------------------------\n"}

IsNotFoundErr only checks for ResourceNotFound, so a NotFound response is treated as a genuine failure. Instance termination keeps erroring, and the node remains NotReady and cannot be deleted.

```go
func IsNotFoundErr(err error) bool {
	azErr := IsResponseError(err)
	return azErr != nil && azErr.ErrorCode == ResourceNotFound
}
```

How was this change tested?

  • Used az vm simulate-eviction to simulate a Spot node eviction

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note


@tangyouzzz
Contributor Author

@microsoft-github-policy-service agree

Collaborator

@Bryce-Soghigian Bryce-Soghigian left a comment


LGTM. @tallaxes, can you take a look for a second approval?

@Bryce-Soghigian
Collaborator

Can we also make this change in https://github.com/Azure/azure-sdk-for-go-extensions/blob/main/pkg/errors/armerrors.go#L34?

@coveralls

coveralls commented Jan 7, 2025

Pull Request Test Coverage Report for Build 12678035442

Details

  • 4 of 4 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.001%) to 95.425%

Totals Coverage Status
Change from base Build 12661198057: 0.001%
Covered Lines: 48136
Relevant Lines: 50444

💛 - Coveralls

Collaborator

@tallaxes tallaxes left a comment


LGTM - and thank you @tangyouzzz for the fix! Some possible improvements:

  • Checking azErr.StatusCode == http.StatusNotFound instead should catch both codes
  • The same logic should be applied when deleting other resources (like NICs); other resource providers are very likely to exhibit the same pattern
  • A helper method would be better to avoid code duplication (and yes, it would be good to update azure-sdk-for-go-extensions)

@tallaxes tallaxes added the area/error-handling Issues or PRs related to handling of errors label Jan 7, 2025
@tallaxes
Collaborator

tallaxes commented Jan 8, 2025

Merging since this fixes an actual problem, but let's follow up on the feedback.

@tallaxes tallaxes merged commit a4c40b0 into Azure:main Jan 8, 2025
11 checks passed
tallaxes added a commit that referenced this pull request Jan 23, 2025
Labels
area/error-handling Issues or PRs related to handling of errors
Development

Successfully merging this pull request may close these issues.

NodeClaims stuck deleting due to finalizer
4 participants