-
Notifications
You must be signed in to change notification settings - Fork 787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POC modification for Differentiated Artifacts Framework #1996
base: master
Are you sure you want to change the base?
Conversation
Testing[813] @ 7bd1dcd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few clarifying comments.
@@ -60,10 +60,13 @@ def save_blobs(self, blob_iter, raw=False, len_hint=0): | |||
|
|||
Parameters | |||
---------- | |||
blob_iter : Iterator over bytes objects to save | |||
blob_iter : Iterator[Union[bytes, Tuple[bytes, str]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added documentation for the specific raw flag
raw : bool, optional | ||
Whether to save the bytes directly or process them, by default False | ||
len_hint : Hint of the number of blobs that will be produced by the | ||
len_hint : int, default 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clearly need to fix the comment (it seems borked before too)
@@ -76,23 +79,40 @@ def save_blobs(self, blob_iter, raw=False, len_hint=0): | |||
|
|||
def packing_iter(): | |||
for blob in blob_iter: | |||
raw_setting = raw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic is to use raw setting but allow each blob to override. Compression methods would be fixed for now (not sure if there is a lot of value in allowing a huge amount of customization here) but we could add another one ourselves.
@@ -30,7 +27,12 @@ def __init__( | |||
# produce a set of SHA keys to prefetch based on artifact names | |||
prefetch = set() | |||
for ds in self.task_datastores: | |||
prefetch.update(ds.keys_for_artifacts(prefetch_data_artifacts)) | |||
l = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chage is because there can now be more than one key for each artifact so this is now a list of list.
if self._datastore and name in self._datastore: | ||
# load the attribute from the datastore... | ||
x = self._datastore[name] | ||
# ...and cache it in the object for faster access | ||
setattr(self, name, x) | ||
return x | ||
# For MetaflowArtifact, we extract the representation of it to present to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store representation for user but keep original artifact as well.
@@ -129,8 +149,15 @@ def __init__( | |||
self._objects = {} | |||
self._info = {} | |||
elif self._mode == "r": | |||
if data_metadata is not None: | |||
# We already loaded the data metadata so just use that | |||
if data_metadata is not None and not any( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic to fetch metadata from S3 if the one from the metadata is incomplete. This way, we don't change metadata view at all. This is only used through client.
@@ -740,8 +801,19 @@ def artifacts_iter(): | |||
# '_' as they are used by the Metaflow runtime. | |||
delattr(flow, var) | |||
yield var, val | |||
art = flow._orig_artifacts.get(var) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change in behavior possible here since we keep things we serialize if they require it. For others, no change. Note that the idea would be that post_serialize
could hollow out the MetaflowArtifact as well to keep memory consumption low.
Testing[813] @ 7bd1dcd had 14 FAILUREs. |
Testing[813] @ 7bd1dcd had 15 FAILUREs. |
2eb47d7
to
6890682
Compare
This includes both serialization/deserialization customization as well as directed artifact processing.
Examples coming soon.