
feat(py): Add support for storing models in S3 - [DRAFT] #765

Draft · wants to merge 3 commits into base: main

Conversation

@syntaxsdev commented Feb 5, 2025

Within the Python client, users will be able to store models directly in S3-compatible object storage.
[DRAFT]

Description

The bulk of the changes were done in clients/python/src/_client.py

How Has This Been Tested?

Merge criteria:

  • All the commits have been signed-off (To pass the DCO check)
  • The commits have meaningful messages; the author will squash them after approval or, in the case of a manual merge, will ask to merge with squash.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.
  • Code changes follow the kubeflow contribution guidelines.

If you have UI changes

  • The developer has added tests or explained why testing cannot be added.
  • Necessary screenshots or GIFs have been included for UI changes.
  • UI/UX changes conform to the UX guidelines for Kubeflow.


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ckadner for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@tarilabs tarilabs left a comment


thank you @syntaxsdev for this!

Some initial comments below.
Which type of test can we consider to make sure the functionality is covered?

I'm thinking we could have some dedicated e2e test by extending the current opt-in pytest mechanism and deploy minio in that "scenario" of e2e testing. Do you have some additional ideas?
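One possible shape for the opt-in e2e scenario suggested above, assuming a MinIO instance deployed for that pytest "scenario": the marker name, environment variables, and fixture below are assumptions for illustration, not existing project conventions.

```python
import os

import pytest


def minio_config() -> dict:
    """Connection settings for the e2e MinIO deployment, with local defaults."""
    return {
        "endpoint_url": os.environ.get("MINIO_ENDPOINT", "http://localhost:9000"),
        "access_key_id": os.environ.get("MINIO_ACCESS_KEY", "minioadmin"),
        "secret_access_key": os.environ.get("MINIO_SECRET_KEY", "minioadmin"),
    }


@pytest.fixture
def s3_client():
    """boto3 client pointed at MinIO; skips cleanly when boto3 is unavailable."""
    boto3 = pytest.importorskip("boto3")
    cfg = minio_config()
    return boto3.client(
        "s3",
        endpoint_url=cfg["endpoint_url"],
        aws_access_key_id=cfg["access_key_id"],
        aws_secret_access_key=cfg["secret_access_key"],
    )


@pytest.mark.e2e  # opt-in: run with `pytest -m e2e` once MinIO is up
def test_save_to_s3_roundtrip(s3_client, tmp_path):
    f = tmp_path / "model.bin"
    f.write_bytes(b"weights")
    s3_client.create_bucket(Bucket="models")
    s3_client.upload_file(str(f), "models", "model.bin")
    objs = s3_client.list_objects_v2(Bucket="models")["Contents"]
    assert "model.bin" in [o["Key"] for o in objs]
```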

clients/python/src/model_registry/_client.py (outdated)
        secret_access_key=secret_access_key,
    )
    try:
        s3.upload_file(file, bucket_name, name)
Member

see above comment about "file" vs "path" (don't recall if there is any native boto3 API to do this for a folder)

Author

I'm not sure what you meant by this, but I have renamed the parameter to path.
If you give it a relative location, it will resolve the file.

Member

it was in relation to whether it uploads recursively or needs explicit orchestration

Author

It does not. See here: #765 (comment)


    def save_to_s3(
        self,
        file: str,
Member

I presume if this is a path, it uploads recursively the path contents.
Can we confirm this, and describe it also in the pydoc?

@syntaxsdev commented Feb 6, 2025

No, it only uploads a singular file due to how upload_file works. Are you suggesting writing a wrapper to achieve recursive path uploads?

Member

see in some of the tutorials how we show usage of S3 for multiple files in a bucket, wdyt?

Author

what exactly are you referring to?
if you are referring to this then that's not exactly what I was talking about.

Afaik, boto3 S3 does not have a multiple-upload definition or allow recursive uploads; we'd have to build that.

that's not a problem - the issue is whether we want that built into this method; if so, see #765 (comment)
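To make the thread concrete: since `upload_file` takes exactly one object, a recursive directory upload would have to be built on top of it, e.g. by walking the tree and deriving object keys from relative paths. A sketch under that assumption (function and parameter names are illustrative, not the PR's API):

```python
from pathlib import Path


def iter_upload_pairs(root: str, prefix: str = ""):
    """Yield (local_file, object_key) for every file under `root`,
    preserving the directory layout in the key."""
    base = Path(root)
    for p in sorted(base.rglob("*")):
        if p.is_file():
            yield str(p), f"{prefix}{p.relative_to(base).as_posix()}"


def upload_dir(s3, bucket_name: str, root: str, prefix: str = "") -> int:
    """Upload every file under `root` via repeated upload_file calls;
    returns the number of objects uploaded."""
    count = 0
    for local, key in iter_upload_pairs(root, prefix):
        s3.upload_file(local, bucket_name, key)
        count += 1
    return count
```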

Member

what exactly are you referring to?

tutorials (of ODH, but also other projects) that show how to persist multiple files in the identified bucket; sorry if I was not clear

os.remove(model_file.name)


@pytest.fixture
Author

no scope added here because monkeypatch is function-scoped, and the default scope is function, so it's omitted
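For context on the scope remark: pytest's built-in `monkeypatch` fixture is function-scoped, so any fixture depending on it must keep the default function scope. A hypothetical illustration (the fixture and env-var values are made up, not this PR's tests):

```python
import os

import pytest


@pytest.fixture  # scope="function" is the default, so it is omitted
def s3_env(monkeypatch):
    """Fake S3 credentials for exactly one test; monkeypatch undoes the
    setenv calls automatically when the test finishes."""
    monkeypatch.setenv("AWS_ACCESS_KEY_ID", "test-key")
    monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "test-secret")


def test_uses_fake_creds(s3_env):
    assert os.environ["AWS_ACCESS_KEY_ID"] == "test-key"
```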

@@ -623,3 +629,44 @@ def test_hf_import_default_env(client: ModelRegistry):

for k in env_values:
os.environ.pop(k)


@pytest.mark.dd
Author

ignore for now, will change
