Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SVCS-552] Remove unneeded API calls in folder_file_ops #287

Conversation

AddisonSchiller
Copy link
Contributor

Added construct_path and construct_empty_path to base-
provider for individual providers to make paths for folder-file-ops
without needing to make API calls

Ticket

https://openscience.atlassian.net/browse/SVCS-552

Purpose

the Baseprovider class folder_file_op function was making calls to get metadata, and then making the same calls again when calling revalidate path. It is possible to create paths from this metadata and forgo making more API calls.

Changes

Added 2 new functions to the Baseprovider. construct_path and construct_empty_path.
Construct_path takes the parent path and metadata. Each provider then has its own implementation on how to get a path from this. Some will simply use it as a proxy to revalidate_path since their revalidate_path does not make API calls. If an individual providers revalidate_path does make api calls, it will instead call path_from_metadata or something similar.

construct_empty_path returns a math from metadata where the identifier is none( ie it doesnt actually exit at that location).

Added tests to test these new functions. construct_empty_path is tested in core/test_provider since almost all providers use the base definition.

construct_path is tested individually in providers and is ALWAYS tested against the output of revalidate_path because that is what it is replacing.

Dataverse is the only provider that could still makes API calls here, however I am fairly certain that dataverse cannot perform folder_file_ops so it should never call those functions.

Side effects

There could be some timing issues, however this seems unlikely

QA Notes

To test, move or copy a folder containing at least one file across 2 providers.
ie move folder1 from OSFstorage to GoogleDrive. It should work and make less API calls

Figshare may not work quite correctly. It is very hard to test with the many bugs figshare has.

Deployment Notes

@coveralls
Copy link

coveralls commented Oct 18, 2017

Coverage Status

Coverage increased (+0.04%) to 89.057% when pulling 918743f on AddisonSchiller:feature/folder-file-ops-revamp into 3d9ac2f on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Oct 31, 2017

Coverage Status

Coverage increased (+0.04%) to 89.244% when pulling f1e59f8 on AddisonSchiller:feature/folder-file-ops-revamp into cc68aca on CenterForOpenScience:develop.

Added construct_path and construct_empty_path to base-
provider for individual providers to make paths for folder-file-ops
without needing to make API calls
@coveralls
Copy link

coveralls commented Oct 31, 2017

Coverage Status

Coverage increased (+0.04%) to 89.244% when pulling 1188a93 on AddisonSchiller:feature/folder-file-ops-revamp into cc68aca on CenterForOpenScience:develop.

def path_from_metadata(self, # type: ignore
parent_path: BitbucketPath,
metadata) -> BitbucketPath:
return parent_path.child(metadata.name, folder=metadata.is_folder)
meta_data) -> BitbucketPath:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata is one word

@@ -670,6 +670,39 @@ def can_duplicate_names(self) -> bool:
"""
raise NotImplementedError

async def construct_path(self,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally you should be able to use construct_path and revalidate_path interchangeably, so it seems more logical to just modify revalidate_path so it doesn't ever make unnecessary metadata calls, even on file moves, what prevents that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly specific case where we know we have metadata available before calling revalidate_path. Often times we will not have this information available. revalidate_path is also used extensively throughout providers, so to me it does not make sense to have to overhaul revalidate_path in core and in nearly every provider to make this change.

Copy link
Contributor

@Johnetordoff Johnetordoff Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely convinced by this answer, for example why add this new method to Bitbucket, when Bitbucket doesn't make any unnecessary metadata calls in the first place? Providers that use the BaseProvider revalidate method will be fine because the BaseProvider doesn't make a request. Could be beneficial to just pass it as a 'no_new_call` kwarg or something.

It's a lot of work either way, but fixing the revalidate in the provider has the added benefit of ensuring unnecessary calls arn't being made in other operators where revalidate_path is being used. Also it's far fewer new tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, and if Bitbucket ever becomes more than a RO provider, its already implemented. Also it is possible if you move a folder from Bitbucket, it will at least interact with one of the new methods (if you can, im not sure if RO providers support that, just an example).

The point of this ticket was to optimize _folder_file_op. While optimizing revalidate_path does have some added benefits, it is a much much much larger task.

It feels worse to me to muck up revalidate_path than to just make a new function that can then decide what to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but Bitbucket was just an example, the same is true of S3, Dropbox, about half of our providers.

What about a middle-ground where instead of construct_path we extend path_from_metadata to all providers and use that instead? That way we don't have to write a single-purpose method and it's already taken care of in terms of most unit testing. Construct_path is often just a wrapper for path_from_metadata anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes it can be valuable to call revalidate_path over path_from_metadata. If a provider calling revalidate_path is fine in this context, then I want that provider to still be able to call revalidate_path in _folder_file_ops. Why change the call to something else when the old functionality was just fine, or could introduce unintended behavior?

I did think of just taking over path_from_metadata, however it is already used in some places, and I don't want to just redirect it to revalidate_path when it shouldn't. Or take it over and then have to evaluate what to do with cases where it is already called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at BaseProvider.path_from_metadata seems to indicate you're only going to need to write new behavior where the path is Id based and it's already been written for googledrive, so that doesn't leave many providers left to change. You shouldn't have to change existing behavior for path_from_metadata.

@@ -670,6 +670,39 @@ def can_duplicate_names(self) -> bool:
"""
raise NotImplementedError

async def construct_path(self,
Copy link
Contributor

@Johnetordoff Johnetordoff Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely convinced by this answer, for example why add this new method to Bitbucket, when Bitbucket doesn't make any unnecessary metadata calls in the first place? Providers that use the BaseProvider revalidate method will be fine because the BaseProvider doesn't make a request. Could be beneficial to just pass it as a 'no_new_call` kwarg or something.

It's a lot of work either way, but fixing the revalidate in the provider has the added benefit of ensuring unnecessary calls arn't being made in other operators where revalidate_path is being used. Also it's far fewer new tests.

@@ -670,6 +670,39 @@ def can_duplicate_names(self) -> bool:
"""
raise NotImplementedError

async def construct_path(self,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at BaseProvider.path_from_metadata seems to indicate you're only going to need to write new behavior where the path is Id based and it's already been written for googledrive, so that doesn't leave many providers left to change. You shouldn't have to change existing behavior for path_from_metadata.

@cslzchen
Copy link
Contributor

cslzchen commented Jan 8, 2018

As mentioned, this PR is replaced by #311.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants