Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated and tested Seldon Core 1.15.0 #2326

Merged
merged 33 commits into from
Dec 1, 2022

Conversation

axsaucedo
Copy link
Contributor

@axsaucedo axsaucedo commented Nov 22, 2022

Signed-off-by: Alejandro Saucedo [email protected]

Which issue is resolved by this Pull Request:

Resolves part of #2311 addressing #2322.

Description of your changes:

  • Adding requirements as per Guidelines for /contrib Components
  • Fully tested kubeflow installation of Seldon Core's latest version 1.15.0
  • Adding UPGRADE.md
  • Adding automated tests (via make -C contrib/seldon/ test)
  • Adding example testing for manual interaction

Testing

A section was added under contrib/seldon/README.md containing the instructions to test, which are below.

It is now possible to run the automated tests with make -C contrib/seldon/ test.

Explicit overview of manual install & testing

Installation can be done with the kustomize command:

kustomize build seldon-core-operator/base | kubectl apply -n kubeflow -f -

We can create a test model once the "Install Seldon Operator" is configured

# Create namespace for model
kubectl create namespace seldon

We can create an echo model with the following command:

kubectl apply -f - << ENDapiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: echo
  namespace: seldon
spec:
  predictors:
  - name: default
    replicas: 1
    graph:
      name: classifier
      type: MODEL
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/echo-model:1.15.0-dev
          name: classifier
END

We can verify that model is running:

kubectl get pods -n seldon

NAME                                         READY   STATUS    RESTARTS   AGE
echo-default-0-classifier-679cb5fb68-qd4nm   2/2     Running   0          25m

Also we can verify that the correct virtualservice was created:

kubectl get virtualservice -n seldon

NAME   GATEWAYS                        HOSTS   AGE
echo   ["kubeflow/kubeflow-gateway"]   ["*"]   42m

Finally we can send a request (you will need to fetch the Dex Auth Token / Cookie):

export CLUSTER_IP=# Your cluster IP
export SESSION=# Your dex session

curl -H "Content-Type: application/json" -H "Cookie: authservice_session=${SESSION}" \
   -d '{"data": {"ndarray":[[1.0, 2.0, 5.0]]}}' \
   http://{CLUSTER_IP}/seldon/seldon/echo/api/v1.0/predictions

{"data":{"names":["t:0","t:1","t:2"],"ndarray":[[1.0,2.0,5.0]]},"meta":{"metrics":[{"key":"mycounter","type":"COUNTER","value":1},{"key":"mygauge","type":"GAUGE","value":100},{"key":"mytimer","type":"TIMER","value":20.2}],"requestPath":{"classifier":"seldonio/echo-model:1.15.0-dev"}}}

@juliusvonkohout
Copy link
Member

@axsaucedo you need to sign the CLA

@axsaucedo
Copy link
Contributor Author

Thank you @juliusvonkohout - I have signed the CLI a while back (the https://cla.developers.google.com/clas confirms I have valid CLA), it seems the job is still "waiting for status", and I believe I saw it pass in a previous commit, is there a way to re-trigger the check?

@juliusvonkohout
Copy link
Member

/retest

Copy link
Member

@annajung annajung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt response in getting the component updated to work with the latest kubeflow release!

contrib/seldon/Makefile Show resolved Hide resolved
contrib/seldon/README.md Show resolved Hide resolved
contrib/seldon/README.md Outdated Show resolved Hide resolved
contrib/seldon/UPGRADE.md Show resolved Hide resolved
@juliusvonkohout
Copy link
Member

I think Seldon also has proper support for PodSecurityStandards restricted. At least that was the case for the previous version. Nevertheless that is something that should also be tested.

@axsaucedo
Copy link
Contributor Author

Thank you for the review @annajung - we'll work through the comments.

@juliusvonkohout thanks for the re-test - do you know why the CLA is not working? In the site this is what I see:

image

@axsaucedo
Copy link
Contributor Author

Ok it seems the CLA is passing now @juliusvonkohout.

image

@annajung thank you for the comments, these should now be resolved - let me know if anything else required.

@axsaucedo
Copy link
Contributor Author

@kimwnasptd ok I have now added the test - for some strange reason I can't seem to get the "Create KinD Cluster" step to pass in the CI, it seems it doesn't find the file tests/gh-actions/kind-cluster-1-22.yaml - have you seen this before?

@kimwnasptd
Copy link
Member

@axsaucedo we recently changed slightly our files for installing KinD #2331.

I'd suggest to do a rebase to the latest code and use the tests/gh-actions/kind-cluster.yaml file for setting up the KinD cluster.

Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
Signed-off-by: Alejandro Saucedo <[email protected]>
@kimwnasptd
Copy link
Member

I think the problem is in this line in the logs https://github.com/kubeflow/manifests/actions/runs/3549119873/jobs/5961121555

{"level":"error","ts":1669390311.322579,"logger":"controller.seldon-controller-manager","msg":"Reconciler error","reconciler group":"machinelearning.seldon.io","reconciler kind":"SeldonDeployment","name":"echo","namespace":"seldon","error":"no matches for kind \"HorizontalPodAutoscaler\" in version \"autoscaling/v2beta1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

I see that in K8s 1.22 The HorizontalPodAutoscaler had also the version v2beta1, but in K8s 1.25 they only have v2beta2
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#horizontalpodautoscaler-v1-autoscaling
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#horizontalpodautoscaler-v1-autoscaling

@axsaucedo
Copy link
Contributor Author

@kimwnasptd it seems that we may need to update some of the KIND tests as currently Seldon supports Kubernetes clusters up to 1.24 - currently we are in the process of upgrading the components, particularly HPA as per SeldonIO/seldon-core#4172, however have been careful as it would introduce breaking changes. Is there a way to set up the KIND cluster with kubernetes 1.24? I could add an extra config file to provide a pre-1-25-k8s kind test. Thank you - let me know if you need more details.

@axsaucedo
Copy link
Contributor Author

@kimwnasptd thanks again for having a look, let me know your thoughts on the suggestion above, it does seem like the only way from our current state is to run the integration tests on a pre-1.25 k8s cluster, I can have a look at adding a pre-1-24-... kind setup config file if you think this can work (and add a note that emphasises the work + timeframe for post-1.25 support.

@kimwnasptd
Copy link
Member

@axsaucedo we discussed it with the release team as well in today's meeting (link to notes).

In the KF 1.7 release we are targeting for K8s 1.25, but since we expect some vendors to not yet support this version we are also supporting K8s 1.24.

I'm OK with having this component working with K8s 1.24, but let's have a clear text somewhere at the top of the README that exposes the K8s version. So that it will be clear to users that will use KF 1.7 that this component will not work with 1.25 (to avoid confusion, since we will be promoting that the target K8s version for KF 1.7 will be K8s 1.25)

cc @annajung @DomFleischmann @jbottum

@kimwnasptd
Copy link
Member

So @axsaucedo to merge this PR could you:

  1. Create a tests/gh-actions/kind-cluster-1-24.yaml configuration file for KinD and K8s 1.24
  2. Update the GH Action for Seldon to use that yaml instead
  3. Update the README to note that this works for K8s until 1.24

@axsaucedo
Copy link
Contributor Author

Ok perfect @kimwnasptd - very much appreciated. I will look at actioning these three points. Also absolutely agree on making a clear point in the README, as we also have a clear action plan towards ensuring seldon as a whole supports k8s 1.25+, we already have that PR suggested so it's a matter on how we can make sure we provide users with migration paths before we add it.

In that case we'll add these changes as suggested and re-trigger the tests - I'll let you know if there are any issues / further questions. Thanks.

@axsaucedo
Copy link
Contributor Author

@kimwnasptd I've now updated the integration tests to work with these constraints, and added a note on "requirements" on the readme to reflect the dependency of k8s pre-1.25, as well as a link to the current efforts to upgrade. Let me know if there's anything else needed on the PR 👍

README.md Outdated Show resolved Hide resolved
Comment on lines +2 to +5
on:
pull_request:
paths:
- contrib/seldon/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be okay for now, but @kimwnasptd we should think about other ways to trigger /contrib tests in the future and not just trigger based on changes to the contrib/seldon directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking the same thing! We could probably use the workflow_dispatch event as well in these workflows, to be able to manually trigger them https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_dispatch

Let's play around with this more once we have the components update their manifests and have some basic tests in place.

contrib/seldon/README.md Outdated Show resolved Hide resolved
contrib/seldon/README.md Show resolved Hide resolved
axsaucedo and others added 2 commits November 29, 2022 20:26
@annajung
Copy link
Member

Thanks for working with us to update Seldon to meet the contrib guidelines
/lgtm

@axsaucedo
Copy link
Contributor Author

Fantastic @annajung - thanks for the update, let me know if anything else needed fromy my side to merge 👍

@kimwnasptd
Copy link
Member

Was about to comment regarding the README.md, but @annajung had covered it already #2326 (comment), nice!

@axsaucedo thank you for your time and great work on updating these! The tests will also help us ensuring the quality of the manifests we bring along and even raise the standard.

/lgtm
/approve

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: axsaucedo, kimwnasptd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit f40fefa into kubeflow:master Dec 1, 2022
Comment on lines 40 to +41
# Profiles + KFAM
- ../apps/profiles/upstream/overlays/kubeflow
# - ../apps/profiles/upstream/overlays/kubeflow
Copy link

@umka1332 umka1332 Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to remove Profile controller from Kubeflow example installation?

@juliusvonkohout
Copy link
Member

@axsaucedo why did you delete kubeflow-edit-seldon.yaml?

kevin85421 pushed a commit to juliusvonkohout/manifests that referenced this pull request Feb 28, 2023
* Updated and tested Seldon Core 1.15.0

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added testing section

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added upgrading reference

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added automated test

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added automated tests and documentation

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added upgrade documentation

* Added description of product

Signed-off-by: Alejandro Saucedo <[email protected]>

* Updated contrib to ensure group maps correctly

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added updated instructions for non-linux

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added gh action for seldon test

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added gh action for seldon test

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added description of product

Signed-off-by: Alejandro Saucedo <[email protected]>

* Updating path for kind

Signed-off-by: Alejandro Saucedo <[email protected]>

* Updating path for kind

Signed-off-by: Alejandro Saucedo <[email protected]>

* Updating path for kind

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added kind cluster and rebase

Signed-off-by: Alejandro Saucedo <[email protected]>

* Updated back to running kind script

Signed-off-by: Alejandro Saucedo <[email protected]>

* Fixed typo

Signed-off-by: Alejandro Saucedo <[email protected]>

* Updated to include latest manifests

Signed-off-by: Alejandro Saucedo <[email protected]>

* Create namespace if not exists

Signed-off-by: Alejandro Saucedo <[email protected]>

* Adding certmanager

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added namespaces

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added waiting for resources

Signed-off-by: Alejandro Saucedo <[email protected]>

* Updating to path

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added wait for condition

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added wait for condition

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added wait for condition

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added wait for condition

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added wait for condition

Signed-off-by: Alejandro Saucedo <[email protected]>

* Added 1.24 kind tests plus note in the readme

Signed-off-by: Alejandro Saucedo <[email protected]>

* Extending timeout of wait for

Signed-off-by: Alejandro Saucedo <[email protected]>

* Update contrib/seldon/README.md

Co-authored-by: Anna <[email protected]>

* Reverted update on readme

Signed-off-by: Alejandro Saucedo <[email protected]>

Signed-off-by: Alejandro Saucedo <[email protected]>
Co-authored-by: Anna <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants