
serialize artifacts with pickle protocol 4 if possible #2243

Merged: 4 commits into Netflix:master on Feb 7, 2025

Conversation

@amerberg (Contributor) commented on Feb 3, 2025:

Update the artifact serialization to save with protocol 4 if available. Currently, artifacts are serialized with protocol 2 unless that fails due to the size of the artifact, in which case protocol 4 is used. This PR updates the TaskDatastore to use protocol 4 if available, and to fall back to protocol 2 otherwise.

Protocol 4 was introduced in Python 3.4, released more than 10 years ago, and Metaflow's setup.py identifies Python 3.6 as the earliest supported version, so this change should not cause problems for users today. Artifact loading is left unchanged.

Using protocol 4 by default has a number of advantages:

  1. Smaller serialized size for artifacts under 2 GB
  2. Faster serialization of large (>2 GB) artifacts, since the doomed protocol 2 attempt is eliminated

I have successfully tested these changes on a simple flow with a large artifact using Python 3.13 on Argo Workflows.
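
For illustration only, here is a minimal sketch of the protocol-selection logic described above. The helper name `serialize_artifact` is hypothetical; the actual change lives in `metaflow/datastore/task_datastore.py` and includes details not shown here.

```python
import pickle

# Hypothetical sketch of the protocol-selection logic described in this PR;
# the real implementation in metaflow/datastore/task_datastore.py differs in detail.
def serialize_artifact(obj):
    """Pickle an artifact, preferring protocol 4 when the interpreter supports it."""
    if pickle.HIGHEST_PROTOCOL >= 4:
        # Protocol 4 (Python 3.4+) supports objects larger than 2 GB and
        # typically yields smaller payloads than protocol 2.
        return pickle.dumps(obj, protocol=4)
    # Fall back to protocol 2 on interpreters without protocol 4 support.
    return pickle.dumps(obj, protocol=2)
```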

@amerberg changed the title from "serialize artifacts with pickle protocol if possible" to "serialize artifacts with pickle protocol 4 if possible" on Feb 3, 2025
@amerberg marked this pull request as ready for review on February 3, 2025 19:49
@romain-intel requested a review from @saikonen on February 6, 2025 08:49
@romain-intel (Contributor) previously approved these changes on Feb 6, 2025 and left a comment:

LGTM. I'll let @saikonen take a look as well but I think we can merge this.

(Two review comments on metaflow/datastore/task_datastore.py, marked outdated and resolved.)
@nflx-mf-bot (Collaborator): Testing[4148896] @ b41f37c

@nflx-mf-bot (Collaborator): Testing[970] @ b41f37c had 2 FAILUREs.

@amerberg (Contributor, Author) commented on Feb 6, 2025, quoting the failure report above:

Is this different from the tests in the "checks" section, which all appear to have passed? If so, is there a way to see the failures?

@romain-intel (Contributor) replied:

The tests are internal tests we run (since we also use the OSS version). The 2 failures are flaky tests. This is good to go for me.

@saikonen (Collaborator) left a comment:

should be good to go from what I can tell.

@romain-intel merged commit f3fbf3d into Netflix:master on Feb 7, 2025. 29 checks passed.
5 participants