-
Notifications
You must be signed in to change notification settings - Fork 797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serialize artifacts with pickle protocol 4 if possible #2243
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'll let @saikonen take a look as well but I think we can merge this.
Testing[4148896] @ b41f37c |
Testing[970] @ b41f37c had 2 FAILUREs. |
Is this different from the tests in the "checks" section, which all appear to have passed? If so, is there a way to see the failures? |
The tests are internal tests we run (since we also use the OSS version). The 2 failures are flaky tests. This is good to go for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be good to go from what I can tell.
Update the artifact serialization to to save with protocol 4 if available. Currently, artifacts are serialized with protocol 2, unless protocol 2 fails due to the size of the artifact, in which case protocol 4 is used. This PR updates the
TaskDatastore
to use protocol 4 if available, and otherwise attempt protocol 2.Protocol 4 was introduced in Python 3.4, which was released more than 10 years ago, and Metaflow's
setup.py
script identifies Python 3.6 as the earliest supported version, so this change should not cause problems for users today. This leaves artifact loading unchanged.Using protocol 4 by default has a number of advantages:
I have successfully tested these changes on a simple flow with a large artifact using Python 3.13 on Argo Workflows.