-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SVCS-552] Remove unneeded API calls in folder_file_ops #287
[SVCS-552] Remove unneeded API calls in folder_file_ops #287
Conversation
8e9db9d
to
918743f
Compare
Added construct_path and construct_empty_path to base- provider for individual providers to make paths for folder-file-ops without needing to make API calls
f1e59f8
to
1188a93
Compare
def path_from_metadata(self, # type: ignore | ||
parent_path: BitbucketPath, | ||
metadata) -> BitbucketPath: | ||
return parent_path.child(metadata.name, folder=metadata.is_folder) | ||
meta_data) -> BitbucketPath: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metadata is one word
@@ -670,6 +670,39 @@ def can_duplicate_names(self) -> bool: | |||
""" | |||
raise NotImplementedError | |||
|
|||
async def construct_path(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally you should be able to use construct_path and revalidate_path interchangeably, so it seems more logical to just modify revalidate_path so it doesn't ever make unnecessary metadata calls, even on file moves, what prevents that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly specific case where we know we have metadata available before calling revalidate_path
. Often times we will not have this information available. revalidate_path
is also used extensively throughout providers, so to me it does not make sense to have to overhaul revalidate_path
in core and in nearly every provider to make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely convinced by this answer, for example why add this new method to Bitbucket, when Bitbucket doesn't make any unnecessary metadata calls in the first place? Providers that use the BaseProvider revalidate method will be fine because the BaseProvider doesn't make a request. Could be beneficial to just pass it as a 'no_new_call` kwarg or something.
It's a lot of work either way, but fixing the revalidate in the provider has the added benefit of ensuring unnecessary calls arn't being made in other operators where revalidate_path is being used. Also it's far fewer new tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, and if Bitbucket ever becomes more than a RO provider, its already implemented. Also it is possible if you move a folder from Bitbucket, it will at least interact with one of the new methods (if you can, im not sure if RO providers support that, just an example).
The point of this ticket was to optimize _folder_file_op
. While optimizing revalidate_path
does have some added benefits, it is a much much much larger task.
It feels worse to me to muck up revalidate_path
than to just make a new function that can then decide what to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but Bitbucket was just an example, the same is true of S3, Dropbox, about half of our providers.
What about a middle-ground where instead of construct_path we extend path_from_metadata to all providers and use that instead? That way we don't have to write a single-purpose method and it's already taken care of in terms of most unit testing. Construct_path is often just a wrapper for path_from_metadata anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes it can be valuable to call revalidate_path
over path_from_metadata
. If a provider calling revalidate_path
is fine in this context, then I want that provider to still be able to call revalidate_path
in _folder_file_ops
. Why change the call to something else when the old functionality was just fine, or could introduce unintended behavior?
I did think of just taking over path_from_metadata
, however it is already used in some places, and I don't want to just redirect it to revalidate_path
when it shouldn't. Or take it over and then have to evaluate what to do with cases where it is already called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at BaseProvider.path_from_metadata seems to indicate you're only going to need to write new behavior where the path is Id based and it's already been written for googledrive, so that doesn't leave many providers left to change. You shouldn't have to change existing behavior for path_from_metadata.
@@ -670,6 +670,39 @@ def can_duplicate_names(self) -> bool: | |||
""" | |||
raise NotImplementedError | |||
|
|||
async def construct_path(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely convinced by this answer, for example why add this new method to Bitbucket, when Bitbucket doesn't make any unnecessary metadata calls in the first place? Providers that use the BaseProvider revalidate method will be fine because the BaseProvider doesn't make a request. Could be beneficial to just pass it as a 'no_new_call` kwarg or something.
It's a lot of work either way, but fixing the revalidate in the provider has the added benefit of ensuring unnecessary calls arn't being made in other operators where revalidate_path is being used. Also it's far fewer new tests.
@@ -670,6 +670,39 @@ def can_duplicate_names(self) -> bool: | |||
""" | |||
raise NotImplementedError | |||
|
|||
async def construct_path(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at BaseProvider.path_from_metadata seems to indicate you're only going to need to write new behavior where the path is Id based and it's already been written for googledrive, so that doesn't leave many providers left to change. You shouldn't have to change existing behavior for path_from_metadata.
As mentioned, this PR is replaced by #311. |
Added construct_path and construct_empty_path to base-
provider for individual providers to make paths for folder-file-ops
without needing to make API calls
Ticket
https://openscience.atlassian.net/browse/SVCS-552
Purpose
the Baseprovider class
folder_file_op
function was making calls to get metadata, and then making the same calls again when calling revalidate path. It is possible to create paths from this metadata and forgo making more API calls.Changes
Added 2 new functions to the Baseprovider.
construct_path
andconstruct_empty_path
.Construct_path takes the parent path and metadata. Each provider then has its own implementation on how to get a path from this. Some will simply use it as a proxy to
revalidate_path
since theirrevalidate_path
does not make API calls. If an individual providersrevalidate_path
does make api calls, it will instead callpath_from_metadata
or something similar.construct_empty_path
returns a math from metadata where the identifier is none( ie it doesnt actually exit at that location).Added tests to test these new functions.
construct_empty_path
is tested in core/test_provider since almost all providers use the base definition.construct_path
is tested individually in providers and is ALWAYS tested against the output ofrevalidate_path
because that is what it is replacing.Dataverse is the only provider that could still makes API calls here, however I am fairly certain that dataverse cannot perform
folder_file_ops
so it should never call those functions.Side effects
There could be some timing issues, however this seems unlikely
QA Notes
To test, move or copy a folder containing at least one file across 2 providers.
ie move folder1 from OSFstorage to GoogleDrive. It should work and make less API calls
Figshare may not work quite correctly. It is very hard to test with the many bugs figshare has.
Deployment Notes