-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix tests related to probe specs for controller-manager pods - excepting cnf-app-mac-operator which is already fixed #32
Conversation
check dallas ocp-4.14-vanilla example-cnf |
What worked was the teardown of example-cnf, I'll try with a fresh installation because it looks like the deployment of the operators is not working fine, but I don't know if that's because of my change or if it's because the teardown-reinstallation process |
check dallas ocp-4.14-vanilla example-cnf |
check dallas ocp-4.13-vanilla example-cnf |
I'm trying to test this by re-deploying example-cnf in a running cluster but it's not working in any way.
The only way of checking this properly is to redeploy this in a fresh cluster, but all attempts failed in the previous executions in the OCP installation... I'll have to retry again. |
check dallas ocp-4.13-vanilla example-cnf |
It's possible there was an issue with that catalog. But the example-cnf operators are coming from their own catalog: https://www.distributed-ci.io/jobs/e726fa50-fc4a-4687-96ee-7a810b68958f/jobStates?sort=date&task=a9ca950c-8aff-4023-8324-86f1d40d69cd {
"displayName": "NFV Example CNF Catalog",
"image": "quay.io/rh-nfv-int/nfv-example-cnf-catalog@sha256:8d131fc02f53bf67fd40d510ba265adaf66703d6efad3f91750e8c430f2c7ddb",
"publisher": "Red Hat",
"sourceType": "grpc",
"updateStrategy": {
"registryPoll": {
"interval": "30m"
}
}
}
--- snip ---
{
"connectionState": {
"address": "nfv-example-cnf-catalog.openshift-marketplace.svc:50051",
"lastConnect": "2023-12-21T15:07:38Z",
"lastObservedState": "READY"
},
"registryService": {
"createdAt": "2023-12-21T15:07:14Z",
"port": "50051",
"protocol": "grpc",
"serviceName": "nfv-example-cnf-catalog",
"serviceNamespace": "openshift-marketplace"
}
}
--- snip ---
{
"creationTimestamp": "2023-12-21T15:07:14Z",
"generation": 1,
"name": "nfv-example-cnf-catalog",
"namespace": "openshift-marketplace",
"resourceVersion": "2674961",
"uid": "442dc75a-a31e-48bb-a0ff-211c5b0fb324"
} The catalog image quay.io/rh-nfv-int/nfv-example-cnf-catalog@sha256:8d131fc02f53bf67fd40d510ba265adaf66703d6efad3f91750e8c430f2c7ddb matches to the catalog built in this PR: https://github.com/openshift-kni/example-cnf/actions/runs/7277032812/job/19828159075#step:4:928. At least there's guarantee that the correct catalog is being used. 😸 Let's give it another try |
I think I know why this was failing, and it's not because of the catalog source. I finally make the CSV deploy the pod, but it was in CrashLoopBackOff status, and tldr. the first pod created, which was the controller-manager for cnf-app-mac-operator, didn't have /bin/sh in the $PATH, so the command failed, and as the lifecycle/preStart failed, the pod was not able to move to running status. As I know this requires more investigation, and I know the liveness/readiness probes are quickly to check, I'll just check that feature in this change and validate that, and I'll check the lifecycle stuff in a separate change. Testing it now. |
I've retried and it's still not creating the pods, still stuck in CrashLoopBackOff... I'll need to take some more time to check this. I'll move this to WIP and continue next year. |
check dallas ocp-4.14-vanilla example-cnf |
1 similar comment
check dallas ocp-4.14-vanilla example-cnf |
In this job, readiness/liveness/startup probe tests are passing for all *-controller-manager pods, then we just need to fix the other cases. For them, a webserver must be included to start handling these cases. |
I decided to move this change to "ready for review", because of the following:
Consequently, I prefer to merge this change, which we know it's working, and target the next steps in new PRs |
build-depends: rh-nfv-int/nfv-example-cnf-deploy#44
This change intends to fix the following tnf tests for all controller-manager pods:
lifecycle-liveness-probe
lifecycle-readiness-probe
lifecycle-startup-probe
Including the missing content, using tnf_test_example as source of information.
For the moment, we're just testing the Ansible operators based on operator-sdk. For these cases, they only have liveness and readiness probes implemented natively, so for startup probe, we're using right now the same endpoint than liveness probe: https://github.com/operator-framework/operator-sdk/pull/4326/files. The final fix for this will be to implement startup probe in operator-sdk, but this may take some time till having a new release of operator-sdk, so doing this in the meantime.
The other pods under test directly runs commands or scripts. For them, a webserver will be developed, running in background. This is not included in this change.