-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SVCS-552] folder_file_op cut down revalidate calls. #311
base: develop
Are you sure you want to change the base?
[SVCS-552] folder_file_op cut down revalidate calls. #311
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and I will take over the ticket/PR. Two minor issues:
- Need to update Google Cloud as well which has been added recently.
- This change affects all path-based provider and will cause conflicts for sure. Proceed with caution 🚔 .
@@ -133,6 +133,16 @@ def kind(self) -> str: | |||
""" `file` or `folder` """ | |||
raise NotImplementedError | |||
|
|||
@property | |||
@abc.abstractmethod | |||
def id(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the DocStr to mention what this id
is for both types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -334,7 +334,7 @@ def request(self, *args, **kwargs): | |||
|
|||
folder = await dest_provider.create_folder(dest_path, folder_precheck=False) | |||
|
|||
dest_path = await dest_provider.revalidate_path(dest_path.parent, dest_path.name, folder=dest_path.is_dir) | |||
folder_path = dest_provider.path_from_metadata(dest_path.parent, folder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename folder
to folder_metadata
(including the assign statement two lines above).
@@ -24,6 +24,10 @@ def __init__(self, raw, path_obj, owner=None, repo=None): | |||
self.owner = owner | |||
self.repo = repo | |||
|
|||
@property | |||
def id(self): | |||
return self.build_path() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since id
is exactly the same, why not using return self.path
? This is what other providers do.
- This id consolidates both path-based ID and id- based ID under a single property. - It is identical to the path object identifier which is used to revalidate paths.
- In addititon, remove superfluous condition statement `if not futures: continue`
6732325
to
a27b568
Compare
- `self.id` is identical to `self.path` since GCS is a path-based provider
- Intermediate commit. Please rebase or ammend. - Discuss the todos and comments, fix issues if any. Update docstring, comments and remove todos. - Need travis ci.
8adfb5c
to
b61b2ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Johnetordoff and CC @felliott
I rebased and updated the code with extra comments for each provider affected. Need some further discussion with you before I can continue. Thanks.
@@ -27,6 +27,10 @@ class BaseGoogleCloudMetadata(metadata.BaseMetadata, metaclass=abc.ABCMeta): | |||
def provider(self) -> str: | |||
return 'googlecloud' | |||
|
|||
@property | |||
def id(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add GoogleCloud to pass tests.
@@ -133,6 +133,16 @@ def kind(self) -> str: | |||
""" `file` or `folder` """ | |||
raise NotImplementedError | |||
|
|||
@property | |||
@abc.abstractmethod | |||
def id(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -27,8 +27,11 @@ class BaseGoogleCloudMetadata(metadata.BaseMetadata, metaclass=abc.ABCMeta): | |||
def provider(self) -> str: | |||
return 'googlecloud' | |||
|
|||
# GoogleCloud is a path-based provider. However, it never calls ``revalidate_path()`` since it | |||
# is the backend storage provider for OSFStorage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to CloudFiles (when it was the storage provider), GoogleCloud never uses revalidate_path()
. We should set the id
to be None
or empty so we know it is never used accidentally or cause any confusion in the further.
@property | ||
def provider(self): | ||
return 'github' | ||
|
||
# GitHub is a path-based provider. However, it has its own ``revalidate_path()``. | ||
# TODO: Investigate if this is an issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub may not work because your fix calls (functionally, not literally) the default revalidate_path()
instead of the customized one.
@property | ||
def provider(self): | ||
return 'dropbox' | ||
|
||
# TODO: Is Dropbox path-based instead of id-based? | ||
# TODO: Dropbox does not have a ``revalidate_path()``, should we use path instead of id? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no revalidate_path()
for dropbox which makes guess that it is a path based (or at lease both)?
@property | ||
def provider(self): | ||
return 'dataverse' | ||
|
||
# TODO: Should this be in ``DataverseFileMetadata``? | ||
# TODO: Does ``self.raw['id']`` exists for ``DataverseDatasetMetadata``? | ||
# TODO: DataVerse's ``revalidate_path()`` looks very different. Is this an issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to GitHub, your fix calls (functionally, not literally) revalidate_path()
differently than the original way. This might be an issue.
@property | ||
def provider(self): | ||
return 'cloudfiles' | ||
|
||
# CloudFiles is a path-based provider. However, it never called ``revalidate_path()`` even when | ||
# it was the backend storage provider for OSFStorage. The provider is no longer active until it | ||
# is upgraded to an addon provider for the users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See GoogleCloud.
Ticket
https://openscience.atlassian.net/browse/SVCS-552
This PR replaces #287
Purpose
folder_file_op makes roughly a trillion metadata calls every time it moves a folder's children. This fix makes it easy to get a path from it's metadata object by giving the metadata an id which corresponds with the path identifier. Since we can now convert a metadata object into a path easily we no longer need to re-validate each time.
Changes
Side effects
None that I know of.
QA Notes
Copy/moves between providers should be unchanged or slightly faster.
Deployment Notes
None.