
DSC, DSCI: add validating webhook (#711) #257

Closed (wanted to merge 1 commit)

@ykaliuta commented May 2, 2024

This patch reverts 5288015 ("Revert "DSC, DSCI: add validating webhook (opendatahub-io#711)""), i.e. it re-applies the validating webhook.

Tested:

  • deployed catalog
  • installed operator
  • created DSCI
  • created a second DSCI with a different name and got a message that its creation is prevented


openshift-ci bot commented Jun 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ykaliuta


Jira: https://issues.redhat.com/browse/RHOAIENG-4268

This patch reverts

5288015 ("Revert "DSC, DSCI: add validating webhook (opendatahub-io#711)"")

* webhook: add initial skeleton

It was originally generated with:

```shell
operator-sdk create webhook --group datasciencecluster --version v1 --kind DataScienceCluster --programmatic-validation
```

but the webhook.Validator interface (as described in the kubebuilder
book [1]) is a poor fit for this webhook, because the webhook needs
to access the OpenShift cluster (client.Client) to check for existing
DSC instances.

So Handler is implemented directly instead, inspired by [2] and the
odh-notebook-controller implementation [3].

Move it out of the api package, closer to the controllers (as in [3]),
since it is no longer a DataScienceCluster or DSCInitialization
extension. Amend the config paths in webhook_suite_test.go accordingly.

Fix linter issues in webhook_suite_test.go:
- disable ssl check;
- move to package webhook_test

The certmanager files are removed too, since OpenShift service
serving certificates [4] are used instead (see also the
service.beta.openshift.io/inject-cabundle annotation in
config/webhook/kustomization.yaml).

Add webhook generation to the `make manifests` target so that
webhook/manifests.yaml is generated by it.

Since DSCI creation now requires the webhook, it must be delayed
until the manager has started. Move it into a closure and register it
with the manager via the Add() API. This requires an explicit
declaration of the interface variable; otherwise the compiler
complains about a type mismatch for the function literal.

[1] https://book.kubebuilder.io/cronjob-tutorial/webhook-implementation
[2] https://book-v1.book.kubebuilder.io/beyond_basics/sample_webhook.html
[3] https://github.com/opendatahub-io/kubeflow/blob/v1.7-branch/components/odh-notebook-controller/controllers/notebook_webhook.go
[4] https://docs.openshift.com/container-platform/4.9/security/certificates/service-serving-certificate.html

Signed-off-by: Yauheni Kaliuta <[email protected]>

* webhook: implement one instance enforcing

The webhook is designed to handle both Create and Update requests
(as configured in config/webhook/manifests.yaml), but at the moment
only the duplication check on Create is implemented.

It implements the logic that is currently done at reconcile time [1]
(and likewise for DSCI).

It checks for 0 existing instances, not 1, because when the webhook
runs, the object has not been created yet. So if there is already 1
instance, the request is to create a second one.
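The zero-versus-one reasoning can be illustrated with a toy, stdlib-only model of the admission ordering: validation runs before the object is persisted, so the first creation sees zero existing instances and the second sees one. The store type and function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a toy stand-in for the API server's storage of one kind.
type store struct {
	names []string
}

// validateCreate runs before persistence, like a validating webhook:
// the object being created is not stored yet, so any existing
// instance means this request would create a second one.
func validateCreate(s *store) error {
	if len(s.names) > 0 {
		return errors.New("denied: an instance already exists")
	}
	return nil
}

// create mimics admission ordering: validate first, persist only if allowed.
func create(s *store, name string) error {
	if err := validateCreate(s); err != nil {
		return err
	}
	s.names = append(s.names, name)
	return nil
}

func main() {
	s := &store{}
	fmt.Println(create(s, "default-dsci")) // <nil>
	fmt.Println(create(s, "another-dsci")) // denied: an instance already exists
	fmt.Println(len(s.names))              // 1
}
```

Checking `len(s.names) > 1` instead would let the second instance through, because at that point only one object exists.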

It would probably be possible to use generics, but that does not make
much sense for such a simple case.

Closes: opendatahub-io#693

[1] https://github.com/opendatahub-io/opendatahub-operator/blob/incubation/controllers/datasciencecluster/datasciencecluster_controller.go#L98

Signed-off-by: Yauheni Kaliuta <[email protected]>

* tests: add tests to check duplication blocking

Add both envtest and e2e tests checking that creation of a second
DataScienceCluster instance is blocked.

The envtest one is part of the webhook test suite.

e2e:

Add a `name` parameter to the setupDSCInstance() function so it can be reused.

Use require.Error() as the assertion; it is shorter and more
straightforward than implementing the check in the test itself.

Add an e2e test that checks DSCInitialization in a similar way.

Signed-off-by: Yauheni Kaliuta <[email protected]>

* tests: e2e: refactor duplication tests in more abstract way

Factor out common code using Unstructured/List objects.

Restructure the tests to follow a prepare/action/assert pattern more closely.

Use "require" features where appropriate.
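A stdlib-only sketch of the prepare/action/assert shape with one kind-agnostic helper shared by DataScienceCluster and DSCInitialization, using map-based objects in the spirit of unstructured.Unstructured. The fakeCluster type, helper names and object names are hypothetical; the real e2e tests use the controller-runtime client against a cluster and testify's require.Error for the assertion.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a map-based object, in the spirit of unstructured.Unstructured.
type Object map[string]interface{}

func kindOf(o Object) string { return o["kind"].(string) }

// fakeCluster enforces one instance per kind, as the webhook does on Create.
type fakeCluster struct {
	objects map[string][]Object
}

func (c *fakeCluster) Create(o Object) error {
	k := kindOf(o)
	if len(c.objects[k]) > 0 {
		return errors.New("only one instance allowed")
	}
	c.objects[k] = append(c.objects[k], o)
	return nil
}

// newObject builds a minimal object of the given kind and name.
func newObject(kind, name string) Object {
	return Object{"kind": kind, "metadata": map[string]interface{}{"name": name}}
}

// testDuplication is one check reused for every kind.
func testDuplication(c *fakeCluster, kind string) error {
	// prepare: the first instance must be created successfully
	if err := c.Create(newObject(kind, "first")); err != nil {
		return fmt.Errorf("prepare %s: %w", kind, err)
	}
	// action: try to create a second instance with another name
	err := c.Create(newObject(kind, "second"))
	// assert: creation must fail (require.Error in the real test)
	if err == nil {
		return fmt.Errorf("second %s was not blocked", kind)
	}
	return nil
}

func main() {
	c := &fakeCluster{objects: map[string][]Object{}}
	for _, kind := range []string{"DataScienceCluster", "DSCInitialization"} {
		fmt.Println(kind, testDuplication(c, kind))
	}
}
```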

Signed-off-by: Yauheni Kaliuta <[email protected]>

---------

Signed-off-by: Yauheni Kaliuta <[email protected]>

chore(webhook): (opendatahub-io#870)

- add a test case for DSCI
- remove an unneeded kubebuilder marker
- remove the instance-count checks from the existing controllers
- regenerate the bundle
- we do not act on Update, but we keep it in the webhook for now

Signed-off-by: Wen Zhou <[email protected]>

fix uncommented tests/e2e/dsc_creation_test.go with a line from
9be146f ("chore(lint): updates to latest version (opendatahub-io#1074)")

Signed-off-by: Yauheni Kaliuta <[email protected]>
@ykaliuta (author) commented Aug 2, 2024

/retest-required

@ykaliuta (author) commented Aug 2, 2024

/retest-required

@ykaliuta (author) commented Aug 2, 2024

/test rhods-operator-e2e

@ykaliuta (author) commented Aug 3, 2024

/test rhods-operator-e2e

@ykaliuta (author) commented Aug 4, 2024

/test rhods-operator-e2e

@ykaliuta (author) commented Aug 5, 2024

I cannot make it pass with my local setup either, but there I have a capacity problem:

```
4m8s                    Warning   FailedScheduling                  Pod/kueue-controller-manager-54556ffb7f-2fqjc                              0/2 nodes are available: 2 Insufficient cpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..
4m7s                    Warning   FailedScheduling                  Pod/codeflare-operator-manager-5bb9f6745c-vq4gg                            0/2 nodes are available: 2 Insufficient cpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..
```

@ykaliuta (author) commented Aug 5, 2024

/test rhods-operator-e2e

@ykaliuta (author) commented Aug 5, 2024

It also requires bundle changes for the CI builds.

@ykaliuta (author) commented:

/retest


openshift-ci bot commented Aug 12, 2024

@ykaliuta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/rhoai-operator-pr-image-mirror | 378515b | true | /test rhoai-operator-pr-image-mirror |
| ci/prow/rhoai-operator-e2e | 378515b | true | /test rhoai-operator-e2e |
| ci/prow/rhods-operator-e2e | ee1690d | true | /test rhods-operator-e2e |


@ykaliuta (author) commented:

I'll recreate the PR. These 3 failing tests look pretty weird, and I cannot reproduce the failures with similar messages.

@ykaliuta closed this Aug 13, 2024