RHOAI 2.14 Upstream Sync #355
Conversation
* regenerate code and manifests
* check in new external CRD

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit 1f34806)
* deps: uplift crd-ref-docs to 0.1.0, since their release process changed and now supports different os/arch binaries
* update: rewrite after review

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit 3ce4d58)
Signed-off-by: Wen Zhou <[email protected]> (cherry picked from commit 742b09b)
…o#1205)

* fix: add check for all platform offerings on components
  - wait for the component's deployment to be ready before returning to the caller
  - reduce logging
  - prolong timeouts for workbench, trustyai, dsp, kserve
* fix: wrong log
* test: increase timeout and adjust ordering slightly

Signed-off-by: Wen Zhou <[email protected]>
Co-authored-by: ajaypratap003 <[email protected]>
(cherry picked from commit ab7e04e)
* test: update e2e test cases
  - rename files
  - add check on FTer CRD
  - use label from labels.ODHAppPrefix
  - update misleading log message
  - re-order test cases
  - rename DSC CR from e2e-test to e2e-test-dsc
* fix: missing "/"

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit a611427)
* tests: e2e: do not print "Trying again" if deployments cannot be listed
  The message is misleading: the function actually returns an error, so it will not retry.
* tests: e2e: testApplicationCreation: enable immediate for PollUntilContextTimeout
  After a recent update the check was moved after the DSC Ready check, so there is no point waiting an extra retry interval before the deployment check. This decreases execution time.
* tests: e2e: remove check for dependent operator
  The check originally returned success when no deployments were found for a component due to a missing operator (probably with a bug in the actual check). That case is no longer valid: dependent operators are installed.
* tests: e2e: testApplicationCreation: early return from the loop
  Simplify the deployment readiness loop: return early when a problematic deployment is found, and return success as the default. The len() == 0 check is still needed, otherwise the loop would declare success when there are no deployments. It will be amended with an enabled/disabled check in a follow-up.

Signed-off-by: Yauheni Kaliuta <[email protected]>
(cherry picked from commit fc005e3)
…l and network impact (opendatahub-io#1190)

* feat: improve operator performance by using caching to reduce API calls and network impact (work based on opendatahub-io#1189)
  - secret: default application namespace, other default namespaces, and the istio cert
  - configmap: all
  - namespace: all
  - ingresscontroller: the "default" one
  - deployment: default application namespaces and default namespaces
* revert: back out changes for webhook
* update: move namespaces for the cache into functions and only add the ones the platform needs
* update: rename functions

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit 5759f5e)
…atahub-io#1213)

* tests: e2e: do not wait for deployments of non-Managed components
  The test retries when there are no deployments even if the component is disabled, which is a normal situation. Make it retry for a disabled (explicitly Removed) component only if there are deployments; if the component is not Managed, return success immediately. This also inverts the "disabled" check: it is no longer an error. It decreases execution time.
* tests: e2e: increase resourceCreationTimeout
  After some recent changes it seems to take longer for the DataScienceCluster CR to reach Ready status.

Signed-off-by: Yauheni Kaliuta <[email protected]>
(cherry picked from commit d57bb12)
…#1218)

- trustyai has introduced overlays for odh and rhoai
- we can switch to using these instead of base

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit a990dea)
- move resourceCreationTimeout and resourceRetryInterval out of tc
- rename resourceRetryInterval to generalRetryInterval
- create different timeouts instead of using resourceCreationTimeout for all cases
- move the DSC readiness check out of testDSCCreation into validateDSCReady, which calls waitDSCReady
- remove the 1 min sleep in testAllApplicationCreation
- rename testAllApplicationCreation to testAllComponentCreation
- rename testApplicationCreation to testComponentCreation
- disable immediate for the check func in testComponentCreation
- rename testApplicationDeletion to testComponentDeletion
- enable immediate for the check func in testComponentDeletion
- rename waitForControllerDeployment to waitForOperatorDeployment
- set new timeouts, see const block
- make the CRD checks run in parallel

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit 199f08f)
… to lose the cache for istio-system ns (opendatahub-io#1223) Signed-off-by: Wen Zhou <[email protected]> (cherry picked from commit c49b501)
…opendatahub-io#1224)

- if a DSC CR has the old format with an empty devFlag:{}, it can cause the downstream image not to be replaced, so the component runs with the quay.io one

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit 1dbe269)
(cherry picked from commit b886de9)
* feat: enable Trusty's Bias Feature Flag
  - for the upgrade case: if the flag was previously true, it should be set to false after the new version is installed
* refactor:
  - rename cluster.GetRelease to cluster.GetReleaseFromCSV()
  - introduce upgrade.GetReleaseFromCR()
  - create upgradeODCCR() to call unsetOwnerReference and updateODCBiasMetrics
  - updateODCBiasMetrics() only patches if the previous release is lower than 2.14, e.g. 2.13.1, 2.8.3

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit 562efd1)
…pendatahub-io#1228)

* feat(tracker): FeatureTracker can have ownerReferences
  Introduces the ability to define an owner reference for FeatureTracker (an internal custom resource). This enables garbage collection of resources belonging to a given Feature when its owner, such as DSC or DSCI, has been removed. When Features are grouped through `FeatureHandler`s, the ownership is defined under the hood, linking them to DSCI or DSC depending on the type used.

  In addition, this PR simplifies how Feature relies on a `client.Client` instance. Instead of creating its own instance, it expects one to be passed as part of the builder call chain, and creates a default one otherwise. With that we can rely on a client configured for controllers with the right schemas, caching rules, etc.

  As a consequence, the retry logic for status updates needed to be adjusted. Besides a conflict, it needs to be triggered on a NotFound error, as the shared client's cache might not yet contain a FeatureTracker instance created just before the condition is reported. It worked before because Feature had its own simple client instance without caching.

  > [!IMPORTANT]
  > With DSCI and DSC becoming owners of particular FeatureTrackers,
  > `func (c *Component) Cleanup` called in the finalizer block becomes somewhat
  > speculative. The only use case where a custom function is invoked is unpatching
  > the SMCP authorization provider. This was based on an early assumption that we
  > might want to plug into a customer's existing SMCP instance. It's unclear to me
  > whether this is still the long-term thinking, so we might want to revisit the
  > need for this function.
  >
  > From the completeness point of view, if we allow resources to be created or
  > manipulated programmatically as part of the Feature DSL, we should be able to
  > register cleanup counterpart functions, especially when we cannot simply remove
  > them (as this is handled through ownership of the associated FeatureTracker).
  >
  > There is, however, a need to perform cleanup when a particular component is
  > "Removed" (no longer managed). Right now this is handled inside its Reconcile
  > function.

  Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629)
  Relates to opendatahub-io#1192 (comment)
  Co-authored-by: Bartosz Majsak <[email protected]>
* fix: no need to return dsci

Co-authored-by: Cameron Garrison <[email protected]>
(cherry picked from commit 10dd554)
…ub-io#1232)

* upgrade: rename GetReleaseFromCR to GetDeployedRelease
  The name GetReleaseFromCR describes the implementation rather than the task it performs. The function fetches the release from DSC/DSCI, meaning the latest release the data science cluster was deployed by: consider it the "deployed release".
* cluster: rename GetReleaseFromCSV back to GetRelease
  The name GetReleaseFromCSV describes the implementation rather than the task it performs. The function fetches the actually running operator release, and the CSV is only one way to obtain it.

Signed-off-by: Yauheni Kaliuta <[email protected]>
(cherry picked from commit b4c880a)
…1222)

- we do not support other values for now; this prevents users from changing to a different value, which would cause both the old and the new to run

Signed-off-by: Wen Zhou <[email protected]>
(cherry picked from commit ff7d131)
Passing name and namespace is redundant, as they are always the same as those of the newSecret passed to this function. This change simplifies the function signature and uses the `client.ObjectKeyFromObject` utility function to fetch the existing secret instance from the cluster. In addition, error variables have been renamed to avoid shadowing.

(cherry picked from commit fc789a9)
…1235)

Ensures each test uses a different DSCI instance as a test fixture. Without this, the tests fail (as they do on the latest incubation commit): opendatahub-io#1222 introduced a validation check to ensure AppNamespace is immutable, but opendatahub-io#1228 tries to update the DSCI, violating the new constraint. With this change each test uses a dedicated DSCI.

(cherry picked from commit df1add0)
Including the client in the function signature makes its usage explicit and more idiomatic, clarifying that the client is intended for use within the function. Decoupling the client from the Feature struct enhances code readability. This is a follow-up to PR [opendatahub-io#1228](opendatahub-io#1228 (comment)).

(cherry picked from commit fde5d69)
Compare 7fa4165 to 34bd8a5
I thought https://issues.redhat.com/browse/RHOAIENG-349 was already on main from #337
@ykaliuta Any idea on why the webhook tests might be failing?
It was linked in this PR; I cherry-picked opendatahub-io#1205. These changes do not seem to be included.
Hmm, interesting, the test should use repository's manifests where it is enabled. |
It is actually failing on getting DSCI/DSC. Wondering if there is a timing issue, since it is fetched by Kind.
ok, requireInstalled() has some unexpected changes. |
tests/e2e/creation_test.go
Outdated
@@ -386,7 +383,7 @@ func (tc *testContext) testOwnerrefrences() error {
 	// Test Dashboard component
 	if tc.testDsc.Spec.Components.Dashboard.ManagementState == operatorv1.Managed {
 		appDeployments, err := tc.kubeClient.AppsV1().Deployments(tc.applicationsNamespace).List(tc.ctx, metav1.ListOptions{
-			LabelSelector: labels.ODH.Component("rhods-dashboard"),
+			LabelSelector: labels.ODH.Component(tc.testDsc.Spec.Components.Dashboard.GetComponentName()),
we need to keep the old one
tests/e2e/creation_test.go
Outdated
@@ -442,7 +439,8 @@ func (tc *testContext) testDefaultCertsAvailable() error {
 func (tc *testContext) testUpdateComponentReconcile() error {
 	// Test Updating Dashboard Replicas
 	appDeployments, err := tc.kubeClient.AppsV1().Deployments(tc.applicationsNamespace).List(tc.ctx, metav1.ListOptions{
-		LabelSelector: labels.ODH.Component("rhods-dashboard"),
+		LabelSelector: labels.ODH.Component(tc.testDsc.Spec.Components.Dashboard.GetComponentName()),
same here need to keep the old one
tests/e2e/creation_test.go
Outdated
@@ -493,7 +491,8 @@ func (tc *testContext) testUpdateDSCComponentEnabled() error {
 	if tc.testDsc.Spec.Components.Dashboard.ManagementState == operatorv1.Managed {
 		appDeployments, err := tc.kubeClient.AppsV1().Deployments(tc.applicationsNamespace).List(tc.ctx, metav1.ListOptions{
-			LabelSelector: labels.ODH.Component("rhods-dashboard"),
+			LabelSelector: labels.ODH.Component(tc.testDsc.Spec.Components.Dashboard.GetComponentName()),
and here too
docs/DESIGN.md
Outdated
@@ -62,6 +62,8 @@ To deploy ODH components seamlessly, ODH operator will watch two CRDs:
       name: knative-serving
   modelmeshserving:
     managementState: Managed
+  modelregistry:
can skip this one in this PR
docs/upgrade-testing.md
Outdated
@@ -141,10 +141,10 @@ spec:
   kueue:
     managementState: Managed
   trainingoperator:
-    managementState: Removed
+    managementState: Managed
can skip change in this file
docs/upgrade-testing.md
Outdated
   workbenches:
     managementState: Managed
   trustyai:
-    managementState: Removed
+    managementState: Managed
same here
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: VaishnaviHire, zdtsw. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Merged 78d42fd into red-hat-data-services:main
We also need to update the manifests in midstream.
I will fix this today. Update: done in PR 2750.