Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

components: move params.env image updating to Init stage #1191

Open
wants to merge 3 commits into
base: incubation
Choose a base branch
from

Conversation

ykaliuta
Copy link
Contributor

@ykaliuta ykaliuta commented Aug 19, 2024

Jira: https://issues.redhat.com/browse/RHOAIENG-11592

Image names in environment are not supposed to be changed during
runtime of the operator, so it makes sense to update them only on
startup.

If manifests are overriden by DevFlags, the DevFlags' version will
be used.

The change is straight forward for most of the components where only
images are updated and params.env is located in the kustomize root
directory, but some components (dashboard, ray, codeflare,
modelregistry) also update some extra parameters. For them image
part only is moved to Init since other updates require runtime DSCI
information.

The patch also changes logic for ray, codeflare, and modelregistry
in this regard to update non-image parameters regardless of DevFlags
like it was changed in dashboard recently.

The DevFlags functionality brings some concerns:

  • For most components the code is written such a way that as soon as
    DevFlags supplied the global path variables are changed and never
    reverted back to the defaults. For some (dashboard, trustyai) there
    is (still global) OverridePath/entryPath pair and manifests reverted
    to the default, BUT there is no logic of transition.

  • codeflare: when manifests are overridden namespace param is
    updated in the hardcoded (stock) path;

This logic is preserved.

Description

How Has This Been Tested?

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Copy link

openshift-ci bot commented Aug 19, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ykaliuta. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zdtsw
Copy link
Member

zdtsw commented Aug 27, 2024

i am fine with this change, but would like to confirm one requirement first.

@ykaliuta
Copy link
Contributor Author

@bartoszmajsak may be you want to be aware of the proposal

@@ -311,6 +328,10 @@ func main() { //nolint:funlen,maintidx
setupLog.Error(err, "unable to set up ready check")
os.Exit(1)
}
if err := initComponents(platform); err != nil {
setupLog.Error(err, "unable to init components")
os.Exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should here be os.Exit(1)?
say any of the downstream component fails to set image, will cause operator crash?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. One one hand it makes problems obvious on startup. On it would make some of the system working even if dashboard tries to use wrong repo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what i am afarid is
if compoent X fail to update, but user does not even want to enable component X but only Y and Z, should that make pod crash ?
in thoery , it should not even fail,
and in another theory if it was a glitch: the restart of pod , due to crash, can fix X in the next run.

Copy link
Contributor Author

@ykaliuta ykaliuta Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fatal init is not needed, implicit init() will work. UPD: missed platform parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-fatal case I would like at least to log the error. Would it be ok to pass Context and use Bartosz' idea to wrap logger in it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds better, we will need to show the error, only part is i dont know if we should make it fatal.
@lburgazzoli / @VaishnaviHire how you feel about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it non-fatal, but on the per-Init level. Leaving possibility :)

@@ -39,6 +39,10 @@ type Component struct {
DevFlags *DevFlags `json:"devFlags,omitempty"`
}

func (c *Component) Init(_ cluster.Platform) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename to InitImage() if it is only for image not other extra params.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did hope it can be useful for other initializations as well. (but actually could no find any independent of reconcile loop)

"codeflare-operator-controller-image": "RELATED_IMAGE_ODH_CODEFLARE_OPERATOR_IMAGE", // no need mcad, embedded in cfo
}

if err := deploy.ApplyParams(ParamsPath, imageParamMap); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also update docs on https://github.com/opendatahub-io/opendatahub-operator/blob/incubation/pkg/deploy/envParams.go#L67
just to be clear that it only update if it does not match otherwise it wont do anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean 450e19f ("deploy: ApplyParams: do not write params.env if there are no updates (#1186)") functionality?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, by reading the comments, i do not feel it reflects what the fuction is doing with recent chagnes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's anyway unrelated to this patchset, can come as fixup for the aforementioned patch.

if err := deploy.ApplyParams(ParamsPath, imageParamMap, map[string]string{"namespace": dscispec.ApplicationsNamespace}); err != nil {
return fmt.Errorf("failed update image from %s : %w", CodeflarePath+"/bases", err)
}
if err := deploy.ApplyParams(ParamsPath, map[string]string{}, map[string]string{"namespace": dscispec.ApplicationsNamespace}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err := deploy.ApplyParams(ParamsPath, map[string]string{}, map[string]string{"namespace": dscispec.ApplicationsNamespace}); err != nil {
if err := deploy.ApplyParams(ParamsPath, nil, map[string]string{"namespace": dscispec.ApplicationsNamespace}); err != nil {

do not even need an empty map ? same for the ray one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, leftover. (It's a bit counter intuitive that access nil map does not crash)

@zdtsw zdtsw requested review from lburgazzoli and removed request for ajaypratap003 and asanzgom September 11, 2024 10:26
@zdtsw
Copy link
Member

zdtsw commented Sep 11, 2024

FYI @csams

@ykaliuta ykaliuta force-pushed the init-components branch 2 times, most recently from b3d279f to 4898857 Compare September 12, 2024 11:54
@ykaliuta
Copy link
Contributor Author

/retest-required

@ykaliuta
Copy link
Contributor Author

/retest required

Copy link

openshift-ci bot commented Sep 12, 2024

@ykaliuta: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

  • /test ci-index
  • /test images
  • /test opendatahub-operator-e2e
  • /test opendatahub-operator-pr-image-mirror

Use /test all to run all jobs.

In response to this:

/retest required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ykaliuta
Copy link
Contributor Author

/retest-required

3 similar comments
@ykaliuta
Copy link
Contributor Author

/retest-required

@ykaliuta
Copy link
Contributor Author

/retest-required

@ykaliuta
Copy link
Contributor Author

/retest-required

@ykaliuta
Copy link
Contributor Author

/test opendatahub-operator-e2e

2 similar comments
@ykaliuta
Copy link
Contributor Author

/test opendatahub-operator-e2e

@ykaliuta
Copy link
Contributor Author

/test opendatahub-operator-e2e

@ykaliuta
Copy link
Contributor Author

It passes locally if I disable training operator which fails to fetch image from docker.io.

@ykaliuta
Copy link
Contributor Author

/test opendatahub-operator-e2e

2 similar comments
@ykaliuta
Copy link
Contributor Author

/test opendatahub-operator-e2e

@zdtsw
Copy link
Member

zdtsw commented Sep 17, 2024

/test opendatahub-operator-e2e

@ykaliuta
Copy link
Contributor Author

/test opendatahub-operator-e2e

Add Init() method to the Component interface and call it from main on
startup.

Will be used to move startup-time code from ReconcileComponent
(like adjusting params.env).

Signed-off-by: Yauheni Kaliuta <[email protected]>
@ykaliuta
Copy link
Contributor Author

/retest-required

Jira: https://issues.redhat.com/browse/RHOAIENG-11592

Image names in environment are not supposed to be changed during
runtime of the operator, so it makes sense to update them only on
startup.

If manifests are overriden by DevFlags, the DevFlags' version will
be used.

The change is straight forward for most of the components where only
images are updated and params.env is located in the kustomize root
directory, but some components (dashboard, ray, codeflare,
modelregistry) also update some extra parameters. For them image
part only is moved to Init since other updates require runtime DSCI
information.

The patch also changes logic for ray, codeflare, and modelregistry
in this regard to update non-image parameters regardless of DevFlags
like it was changed in dashboard recently.

The DevFlags functionality brings some concerns:

- For most components the code is written such a way that as soon as
DevFlags supplied the global path variables are changed and never
reverted back to the defaults. For some (dashboard, trustyai) there
is (still global) OverridePath/entryPath pair and manifests reverted
to the default, BUT there is no logic of transition.

- codeflare: when manifests are overridden namespace param is
updated in the hardcoded (stock) path;

This logic is preserved.

Signed-off-by: Yauheni Kaliuta <[email protected]>
@ykaliuta
Copy link
Contributor Author

read logs more carefully and found my mistake

Comment on lines +42 to +44
func (c *Component) Init(_ context.Context, _ cluster.Platform) error {
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need this? you foresee some component might not have implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly put just in case. Can remove.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought was left from the first commit, when only one component implemented with Init(..) at that time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

2 participants