Upgrade test scenario "Fail when there is operator manifest is not applicable" runs for more than 15 minutes #1237

Closed
2 tasks done
triffer opened this issue Aug 9, 2024 · 0 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@triffer
Collaborator

triffer commented Aug 9, 2024

Description
The scenario "Fail when there is operator manifest is not applicable" runs into the maximum retry timeout. The reason is that an api-gateway-controller-deployment is applied that fails on update with the following error:

Deployment.apps "api-gateway-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"control-plane":"controller-manager"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

The update is invoked in the function upgradeApiGateway. Its call to s.resourceManager.CreateOrUpdateResourcesGVR(s.k8sClient, manifestCrds...) internally calls the function UpdateResource, which always updates with the default retries. Because the selector is immutable, every retry fails with the same error, so the call only returns once the retry time is used up. Since the retry time was recently increased from 5 to 15 minutes, this unintended behaviour has started to surface.

In general, we should avoid retries in lower-level functions like UpdateResource and retry in the scenario function instead. Nevertheless, I don't understand the upgrade scenario and what is tested in this case, since the only step in this scenario doesn't give a meaningful explanation:
Upgrade: API Gateway is upgraded to current branch version with "failing" manifest and should "fail"

I think we should also question whether we need this scenario at all, as we are providing an incorrect YAML that is expected to fail on apply.
If we want to keep it, we should fix the retries and improve the scenario and step descriptions to make it easier to understand what is tested.

Expected result
The upgrade integration tests execute in less than 10 minutes.

Actual result
Due to the retries the upgrade integration test runs for ~20 minutes.

Steps to reproduce
Run the upgrade integration tests.

Troubleshooting

Attachments

@triffer triffer added the kind/bug Categorizes issue or PR as related to a bug. label Aug 9, 2024
@videlov videlov self-assigned this Oct 9, 2024
@videlov videlov removed their assignment Oct 10, 2024
@triffer triffer assigned triffer and videlov and unassigned triffer Oct 11, 2024
@triffer triffer assigned triffer and unassigned videlov Oct 23, 2024
@strekm strekm closed this as completed Oct 29, 2024