Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add oauth-proxy to rawdeployments if odh auth label is present #419

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

VedantMahabaleshwarkar
Copy link

@VedantMahabaleshwarkar VedantMahabaleshwarkar commented Oct 9, 2024

What this PR does / why we need it:

This PR adds the following :

  • If the isvc has label "security.opendatahub.io/enable-auth" = "true"
    -- an oauth proxy container is added to the deployment
    -- http port is replaced by https port in the service
    -- "service.beta.openshift.io/serving-cert-secret-name" is added to the service to allow creation of the tls secret
  • The service account, CRB and Route are created in Add reconciliation for Kserve Raw odh-model-controller#274

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes # https://issues.redhat.com/browse/RHOAIENG-10291, https://issues.redhat.com/browse/RHOAIENG-13444
TEST WITH: opendatahub-io/odh-model-controller#274

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

  • Deploy ODH/RHOAI:
  • Install the DSC (kserve spec below) :
    kserve:
      defaultDeploymentMode: RawDeployment
      devFlags:
        manifests:
          - contextDir: config
            sourcePath: overlays/odh
            uri: 'https://github.com/VedantMahabaleshwarkar/kserve/tarball/devflags'
          - contextDir: config
            sourcePath: ''
            uri: 'https://github.com/VedantMahabaleshwarkar/odh-model-controller/tarball/devflags'
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: OpenshiftDefaultIngress
        managementState: Managed
        name: knative-serving
  • To test enabling route for a particular ISVC
    -- Create any kserve isvc+SR
    -- isvc should have annotation : "serving.kserve.io/deploymentMode" : RawDeployment
    -- isvc should have label : "networking.kserve.io/visibility": "enable-route"

  • To test route with auth:
    -- Create any kserve isvc+SR
    -- isvc should have annotation : "serving.kserve.io/deploymentMode" : RawDeployment
    -- isvc should have label: "security.opendatahub.io/enable-auth" = "true"
    -- isvc should have label: "networking.kserve.io/visibility": "enable-route"

  • To test Inference without auth:
    -- remove isvc label "security.opendatahub.io/enable-auth" = "true"

Now inference should work without token

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:

Added oauth-proxy authentication for Kserve RawDeployment Routes

Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

Copy link

openshift-ci bot commented Oct 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link

openshift-ci bot commented Oct 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VedantMahabaleshwarkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [VedantMahabaleshwarkar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Oct 9, 2024
@VedantMahabaleshwarkar
Copy link
Author

tests passing locally, the failures look like test infra failures

/retest-required

Makefile Outdated Show resolved Hide resolved
pkg/constants/constants.go Outdated Show resolved Hide resolved
@VedantMahabaleshwarkar
Copy link
Author

/retest-required

@VedantMahabaleshwarkar
Copy link
Author

not sure what's happening with the tests, they are passing locally :(

pkg/apis/serving/v1beta1/openapi_generated.go Outdated Show resolved Hide resolved
pkg/constants/constants.go Outdated Show resolved Hide resolved
config/overlays/odh/inferenceservice-config-patch.yaml Outdated Show resolved Hide resolved
pkg/constants/constants.go Outdated Show resolved Hide resolved
`--cookie-secret=SECRET`,
`--openshift-delegate-urls={"/": {"namespace": "$isvcNamespace", "resource": "services", "verb": "get"}}`,
`--openshift-sar={"namespace": "$isvcNamespace", "resource": "services", "verb": "get"}`,
`--skip-auth-regex="(^/metrics|^/apis/v1beta1/healthz)"`,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the liveness and readiness probes, instead of being fixed, you should inspect the ISVC and extract the configured paths.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get why we should do this. For the modelmesh oauth proxy injection the oauth proxy probes are fixed. Even here the kserve container and the oauth proxy container will report statuses independently so I'm not seeing why we should change the probes here

@VedantMahabaleshwarkar
Copy link
Author

VedantMahabaleshwarkar commented Oct 17, 2024

the e2e raw test failure is because https://github.com/opendatahub-io/kserve/blob/master/test/scripts/openshift-ci/run-e2e-tests.sh needs to be updated with the new behavior. Imo it can be ignored now can will be fixed later in https://issues.redhat.com/browse/RHOAIENG-14604

Makefile Outdated Show resolved Hide resolved
config/overlays/odh/params.env Outdated Show resolved Hide resolved
pkg/apis/serving/v1beta1/openapi_generated.go Outdated Show resolved Hide resolved
`--cookie-secret=SECRET`,
`--openshift-delegate-urls={"/": {"namespace": "` + namespace + `", "resource": "inferenceservices", "group": "serving.kserve.io", "name": "` + isvc + `", "verb": "get"}}`,
`--openshift-sar={"namespace": "` + namespace + `", "resource": "inferenceservices", "group": "serving.kserve.io", "name": "` + isvc + `", "verb": "get"}`,
`--skip-auth-regex="(^/metrics|^/apis/v1beta1/healthz)"`,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up #419 (comment).

These are not the probes, but paths that would be left unprotected.

Now... I'm thinking that you are right on your comment that Kubernetes is going to do the probing on the container. With that in mind, I'm not sure why we should have unprotected paths. So, maybe, remove the --skip-auth-regex argument?

`--upstream=http://localhost:` + upstreamPort,
`--tls-cert=/etc/tls/private/tls.crt`,
`--tls-key=/etc/tls/private/tls.key`,
`--cookie-secret=SECRET`,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, follow-up my comment of the previous review round: #419 (comment)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this comment. The tls cert and key is mounted from the secret which is created by the annotation for the service. That should be unique to each deployment right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TLS args are good. My comment is for the --cookie-secret=SECRET argument.

Comment on lines +156 to +198
ports := service.Spec.Ports
replaced := false
for i, port := range ports {
if port.Port == constants.CommonDefaultHttpPort {
ports[i] = httpsPort
replaced = true
}
}
if !replaced {
ports = append(ports, httpsPort)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From

if len(container.Ports) > 0 {
var servicePort corev1.ServicePort
servicePort = corev1.ServicePort{
Name: container.Ports[0].Name,
Port: constants.CommonDefaultHttpPort,
TargetPort: intstr.IntOrString{
Type: intstr.Int,
IntVal: container.Ports[0].ContainerPort,
},
Protocol: container.Ports[0].Protocol,
}
servicePorts = append(servicePorts, servicePort)
for i := 1; i < len(container.Ports); i++ {
port := container.Ports[i]
if port.Protocol == "" {
port.Protocol = corev1.ProtocolTCP
}
servicePort = corev1.ServicePort{
Name: port.Name,
Port: port.ContainerPort,
TargetPort: intstr.IntOrString{
Type: intstr.Int,
IntVal: port.ContainerPort,
},
Protocol: port.Protocol,
}
servicePorts = append(servicePorts, servicePort)
}
} else {
port, _ := strconv.Atoi(constants.InferenceServiceDefaultHttpPort)
servicePorts = append(servicePorts, corev1.ServicePort{
Name: componentMeta.Name,
Port: constants.CommonDefaultHttpPort,
TargetPort: intstr.IntOrString{
Type: intstr.Int,
IntVal: int32(port), // #nosec G109
},
Protocol: corev1.ProtocolTCP,
})
}
, looks like the CommonDefaultHttpPort entry should be present always, and it is going to be the first entry always. So, I think you can simply mutate service.Spec.Ports[0] and change the target port...

...sorry, I know that I said about being defensive in my previous comment.

@@ -1264,6 +1270,109 @@ var _ = Describe("v1beta1 inference service controller", func() {
Eventually(func() error { return k8sClient.Get(context.TODO(), predictorHPAKey, actualHPA) }, timeout).
Should(HaveOccurred())
})
It("Should have no ingress created if labeled as cluster-local", func() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, contribute this one to upstream?

expectedService := &v1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: predictorServiceKey.Name,
Namespace: predictorServiceKey.Namespace,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit tests for the routes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The route logic is checking if the route is admitted and we will only use the route URL if that is the case. Since ODH Model Controller is the one creating the route here shouldn't this be a integ test more than a unit test?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, perhaps you should write an E2E test under test/e2e.

The E2Es are currently failing because of a bad openshift-ci setup. But once the setup is finished (we do have Jiras in the backlog for fixing it), the E2Es will run with odh-model-controller deployed on the testing cluster. In the meanwhile, we have been running E2Es on our machines.

ports := service.Spec.Ports
replaced := false
for i, port := range ports {
if port.Port == constants.CommonDefaultHttpPort {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the CommonDefaultHttpPort is going away, I think you should reflect in the Status of the ISVC the right port to reach.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this review round, I don't remember seeing a change that would solve this comment.

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

1

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

2

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Co-authored-by: Filippe Spolti <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Copy link

openshift-ci bot commented Nov 12, 2024

@VedantMahabaleshwarkar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-raw ef845e2 link true /test e2e-raw
ci/prow/e2e-slow ef845e2 link true /test e2e-slow
ci/prow/e2e-fast ef845e2 link true /test e2e-fast

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

oauthMemoryRequest = str
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback values are fine, but:

  • All other code in KServe returns an error if there is a failure fetching the ConfigMap.
  • Also, all other code in KServe returns an error if there is an error unmarshalling the JSON string.

So, I'm not seeing a good reason for not following the same pattern as the rest of the KServe code. IMO, we should propagate the errors to the user.

`--upstream=http://localhost:` + upstreamPort,
`--tls-cert=/etc/tls/private/tls.crt`,
`--tls-key=/etc/tls/private/tls.key`,
`--cookie-secret=SECRET`,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TLS args are good. My comment is for the --cookie-secret=SECRET argument.

@@ -19,10 +19,10 @@ package ingress
import (
"context"
"fmt"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the blank line may be causing the linter issue.

routeUrl, err := getRouteURLIfExists(r.client, isvc)
if err == nil && routeUrl != nil {
isvc.Status.URL = routeUrl
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only error that is tolerable to ignore is the not found error, which is taken in account inside getRouteURLIfExists.

For any other error we are expecting to properly report the endpont of the Route. So, I don't see a good reason for ignoring errors, as that can lead to assigning a bad endpoint.

@@ -305,6 +357,12 @@ func (r *RawIngressReconciler) Reconcile(isvc *v1beta1.InferenceService) error {
}
}
isvc.Status.URL, err = createRawURL(isvc, r.ingressConfig)
if val, ok := isvc.Labels[constants.NetworkVisibility]; ok && val == constants.ODHRouteEnabled {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code inside the if is what should be coded in a deterministic way.

The odh-model-controller would be creating the Route, so the order is not guaranteed. There is a chance that kserve-controller finishes reconciliation while odh-model-controller hasn't had the opportunity of creating the Route. That can lead to a not found error when fetching the route, leading to the ISVC reporting the wrong endpoint.

ports := service.Spec.Ports
replaced := false
for i, port := range ports {
if port.Port == constants.CommonDefaultHttpPort {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this review round, I don't remember seeing a change that would solve this comment.

expectedService := &v1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: predictorServiceKey.Name,
Namespace: predictorServiceKey.Namespace,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, perhaps you should write an E2E test under test/e2e.

The E2Es are currently failing because of a bad openshift-ci setup. But once the setup is finished (we do have Jiras in the backlog for fixing it), the E2Es will run with odh-model-controller deployed on the testing cluster. In the meanwhile, we have been running E2Es on our machines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: New/Backlog
Development

Successfully merging this pull request may close these issues.

6 participants