Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(csi): make csi use mr version from go.mod in ci #598

Conversation

Al-Pragliola
Copy link
Contributor

@Al-Pragliola Al-Pragliola commented Nov 27, 2024

Description

When building/testing CSI in the CI we are using "local" mr as a dependency instead of the version pinned in the go.mod file, this bring an inconsistency between what we are shipping as a docker image and what is actually in the codebase.

Side effect: This should fix all the problems that dependabot gives us when touching mr go.mod.

How Has This Been Tested?

no local testing is needed

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages; the author will squash them after approval or in case of manual merges will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ckadner for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot added size/S and removed size/XS labels Nov 27, 2024
@Al-Pragliola Al-Pragliola marked this pull request as ready for review November 27, 2024 18:15
@google-oss-prow google-oss-prow bot requested review from ckadner and Tomcli November 27, 2024 18:15
@Al-Pragliola
Copy link
Contributor Author

/cc @tarilabs @lampajr

Copy link
Member

@tarilabs tarilabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this re-uses (*in github action) the standard docker-build target from the main makefile, and

/lgtm

thanks a lot @Al-Pragliola

Copy link
Member

@lampajr lampajr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Al-Pragliola,

Overall I don't think this is going to work as expected for two reasons:

  • If we change .github/workflows/csi-build-and-push-image.yml we might be in a chicken-egg situation as we would like to release CSI together with model-registry but we don't have yet the tag released
  • If we change .github/workflows/csi-test.yml we might be not able to test csi with a dev version of model-registry, i.e., we might be not able to detect any breaking change in model-registry that would break csi

wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this, we are making use of the model-registry version that has pinned in the go.mod as you said, but this makes impossible to test the current CSI with the dev (not yet released) version of model-registry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is not the solution to the bigger problem of the relationship between CSI and MR and it's not aimed to be, for that we should consider what you guys proposed in the related issue #338

But to test CSI with the current unreleased version of MR we could make use of:

go get github.com/kubeflow/model-registry@<commit-hash>

so that when we are releasing a new version of both we take the latest commit hash and update the go.mod (if it's needed at all, because if there's no change to the rest client it should be compatible)

wdyt?

@Al-Pragliola
Copy link
Contributor Author

I understand this re-uses (*in github action) the standard docker-build target from the main makefile, and

/lgtm

thanks a lot @Al-Pragliola

yep, with docker-build-dev we had

# Copy the Go Modules manifests
COPY csi/go.mod go.mod
COPY csi/go.sum go.sum
RUN echo "replace github.com/kubeflow/model-registry => ../" >> go.mod

overriding the version of model-registry from the csi/go.mod file

@lampajr
Copy link
Member

lampajr commented Nov 27, 2024

Expanding on my previous comment, with docker-build-dev I had to do

# Copy the Go Modules manifests
COPY csi/go.mod go.mod
COPY csi/go.sum go.sum
RUN echo "replace github.com/kubeflow/model-registry => ../" >> go.mod

exactly because we would like to test the csi with the local/dev model-registry in order to be able to detect any breaking change earlier.

Consider the scenario where you change something, by mistake or on purpose, you would not be able to detect that if you use a pinned version of model registry.

On the other hand, we could try to change .github/workflows/csi-build-and-push-image.yml to make use of a pinned version of the model-registry but either:

  • We change/set that version during the workflow (as it is triggered on tag push) but how do we manage the main branch push?
  • We decide the keep csi and model registry NOT in sync and keep using a pinned version, in this case we should not release csi on tag push

@Al-Pragliola
Copy link
Contributor Author

Al-Pragliola commented Nov 27, 2024

Thanks for you feedback @lampajr , what do you think about having another GHA that procs only when changes are made in the pkg folder and checks compatibility with CSI (and in the future the inferenceservice controller) using in this case the docker-build-dev target?

thinking out loud, we are only changing dependencies in the build phase, we still test against the current MR deployment when running kind so we should still be able to detect breaking changes

@Al-Pragliola
Copy link
Contributor Author

Al-Pragliola commented Nov 27, 2024

Expanding on my previous comment, with docker-build-dev I had to do

# Copy the Go Modules manifests
COPY csi/go.mod go.mod
COPY csi/go.sum go.sum
RUN echo "replace github.com/kubeflow/model-registry => ../" >> go.mod

exactly because we would like to test the csi with the local/dev model-registry in order to be able to detect any breaking change earlier.

Consider the scenario where you change something, by mistake or on purpose, you would not be able to detect that if you use a pinned version of model registry.

On the other hand, we could try to change .github/workflows/csi-build-and-push-image.yml to make use of a pinned version of the model-registry but either:

* We change/set that version during the workflow (as it is triggered on tag push) but how do we manage the `main` branch push?

* We decide the keep csi and model registry **NOT** in sync and keep using a pinned version, in this case we should not release csi on tag push

Let's analyze this scenario:

  • breaking changes from MR and pinned version in go.mod

(from this PR) CI will build using the pinned version from go.mod so the build will succeed - then the e2e tests will fail because we are testing against the local built version of MR (and there's a breaking change)

pro:

  • we avoid false negatives from go.mod bumps in MR
  • docker image and the codebase are in sync

cons:

  • we rely on the robustness of our e2e tests to be sure that everything works
  • we may have to bump using commit hashes for dev releases of MR

(previous) CI will fail to build and stop here

pro:

  • ci will fail earlier
  • no need to bump the go.mod file on release

cons:

  • what we push to the image hub does not reflect what is in the codebase
  • there will be false negatives if we change the go.mod in MR

neither is a perfect solution, as mentioned above, we can improve the "fix" from this PR by adding a GHA that actually checks if something breaks when there's a change in the mr pkg folder

@lampajr
Copy link
Member

lampajr commented Nov 28, 2024

Hi @Al-Pragliola,

I agree there is no "best" solution here and I see drawbacks on both approaches tbh.

Let me write down some comments

(from this PR) CI will build using the pinned version from go.mod so the build will succeed - then the e2e tests will fail because we are testing against the local built version of MR (and there's a breaking change)
pro:
we avoid false negatives from go.mod bumps in MR

I am not sure this is actually guaranteed, are you saying that because you had some evidence confirming this?

docker image and the codebase are in sync

I agree this would be clearly better, or alternatively we could clearly specify the CSI uses the locally referenced model-registry without ever pinning a version there

cons:
we rely on the robustness of our e2e tests to be sure that everything works
we may have to bump using commit hashes for dev releases of MR

Another cons I see is that model-registry and CSI won't be in sync anymore and a tagged version of model registry will contain a CSI that references an older version of model-registry (as the tag was not there when you are tagging MR) and this could create even further confusion imho.

Overall, I think that having the possibility to check incompatibility issues earlier is appreciated, so having a way to check the CSI with the local/dev version of model registry is really good.

TBH I think the ~best solution would be to find what is the best approach among those mentioned here #338

@lampajr
Copy link
Member

lampajr commented Nov 28, 2024

(previous) CI will fail to build and stop here
pro:
ci will fail earlier
no need to bump the go.mod file on release

cons:
what we push to the image hub does not reflect what is in the codebase

I think we can remove this cons by clearly setting that CSI uses the locally referenced model-registry codebase, without pinning any version, I think this is actully possible (i.e., keeping "replace github.com/kubeflow/model-registry => ../" by default)

there will be false negatives if we change the go.mod in MR

I agree this is a problem, but I am not sure this is going to be fixed with this PR, happy to be wrong though

@Al-Pragliola
Copy link
Contributor Author

Al-Pragliola commented Nov 28, 2024

I think we can remove this cons by clearly setting that CSI uses the locally referenced model-registry codebase, without pinning any version, I think this is actully possible (i.e., keeping "replace github.com/kubeflow/model-registry => ../" by default)

there will be false negatives if we change the go.mod in MR

I agree this is a problem, but I am not sure this is going to be fixed with this PR, happy to be wrong though

You are right, I did not provide any evidence 🙏🏼 I opened a PR for a recent example of a false negative, here you can see the previous behavior:

#548
https://github.com/kubeflow/model-registry/actions/runs/12032677420/job/33545058641?pr=548

and post-"fix":

#600

I agree with you that we should implement a solution from #338 , I am a fan of the sub command approach 👍🏼

@Al-Pragliola Al-Pragliola mentioned this pull request Nov 28, 2024
4 tasks
@lampajr
Copy link
Member

lampajr commented Nov 29, 2024

I think we can remove this cons by clearly setting that CSI uses the locally referenced model-registry codebase, without pinning any version, I think this is actully possible (i.e., keeping "replace github.com/kubeflow/model-registry => ../" by default)

there will be false negatives if we change the go.mod in MR

I agree this is a problem, but I am not sure this is going to be fixed with this PR, happy to be wrong though

You are right, I did not provide any evidence 🙏🏼 I opened a PR for a recent example of a false negative, here you can see the previous behavior:

#548 https://github.com/kubeflow/model-registry/actions/runs/12032677420/job/33545058641?pr=548

and post-"fix":

#600

Thanks for the examples 🙏🏻

I agree with you that we should implement a solution from #338 , I am a fan of the sub command approach 👍🏼

I am planning to propose something sooner, then I think we can discuss whether that approach is cleaner or not and if we want to proceed that way, wdyt?

@Al-Pragliola
Copy link
Contributor Author

I am planning to propose something sooner, then I think we can discuss whether that approach is cleaner or not and if we want to proceed that way, wdyt?

Sure, that sounds good to me. Let's discuss it further once you've proposed the approach

@Al-Pragliola
Copy link
Contributor Author

Closed in favor of #601

@Al-Pragliola Al-Pragliola deleted the al-pragliola-fix-make-csi-use-pinned-mr-version branch December 5, 2024 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants