-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SVCS-486] Add Cloudfiles Provider #283
base: develop
Are you sure you want to change the base?
[SVCS-486] Add Cloudfiles Provider #283
Conversation
…erbutler into cloudfiles * 'develop' of https://github.com/CenterForOpenScience/waterbutler: Add PR template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a light pass, I will do a more thorough one once I've got my local docker set up. The thing about adding links to the API docs in the method docstrings should be done for all methods. It's a huge time-saver when trying to debug. I've started adding them as I go, and this would be a great chance to fill out RsCf.
@@ -32,7 +35,8 @@ def ensure_connection(func): | |||
class CloudFilesProvider(provider.BaseProvider): | |||
"""Provider for Rackspace CloudFiles. | |||
|
|||
API Docs: https://developer.rackspace.com/docs/cloud-files/v1/developer-guide/#document-developer-guide | |||
API Docs: https://developer.rackspace.com/docs/cloud-files/v1/developer-guide/ | |||
#document-developer-guide |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: you don't need to wrap urls in docstrings. If flake complains about them, we can append a # noqa: E501
to it.
|
||
@property | ||
def extra(self): | ||
return { | ||
'hashes': { | ||
'md5': self.raw['etag'].replace('"', ''), | ||
'md5': self.raw['ETAG'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did Cloudfiles stop wrapping their etags in double-quotes? If so, they need to be added for the etag
property. etag
should have them, hashes.md5
shouldn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should quotes be added? They don't appear naturally as part of the metadata response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The etag will only be quoted if it is a manifest object
over 5G
See response -> etag -> description in https://developer.rackspace.com/docs/cloud-files/v1/storage-api-reference/object-services-operations/
and
https://developer.rackspace.com/docs/cloud-files/v1/use-cases/additional-object-services-information/#creating-large-objects
@@ -211,18 +232,23 @@ def sign_url(self, path, method='GET', endpoint=None, seconds=settings.TEMP_URL_ | |||
:param int seconds: Time for the url to live | |||
:rtype str: | |||
""" | |||
from urllib.parse import unquote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to top of file.
|
||
@ensure_connection | ||
async def create_folder(self, path, folder_precheck=None): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a docstring with a link to the API docs for the endpoint this uses.
|
||
@ensure_connection | ||
async def revisions(self, path): | ||
version_location = await self._get_version_location() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a docstring w/ url to API docs.
"""Deletes the key at the specified path | ||
:param str path: The path of the key to delete | ||
:rtype ResponseStreamReader: | ||
:param bool confirm_delete: not used in this provider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the user tries to DELETE /resources/mst3k/providers/cloudfiles/
? The user should receive a warning that deleting everything in the root folder is only allowed if confirm_delete=1
is given. It also should not attempt to delete the actual root folder. I don't know how that would play out on S3, but on a dropbox-like provider, that would break the addon as the user would no-longer be connected to their storage root
) | ||
json_resp = await resp.json() | ||
current = (await self.metadata(path)).to_revision() | ||
return [current] + [CloudFilesRevisonMetadata(revision_data) for revision_data in json_resp] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this response not include the metadata for the latest revision? If so, that's worthy of a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the comment in the docstring.
import json | ||
import pytest | ||
import aiohttp | ||
import aiohttpretty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: pytest, aiohttp* are third-party and should be separated from the core libs. unittest is a core lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1st pass, will for sure need to do another large review after these changes.
Lot of style changes and other things, here goes:
- Lots of extra blank lines to be fixed, especially in fixtures/tests
- other style changes in a few places (import orders etc)
- unused variable
body
in a test - A few exception raising tests have no real tests beyond the exception raising. Suggestion: consider adding a few for robustness
5.Cloud files api link doesn't work correctly - Throughout the
Cloudfiles.provider
, update docstrings to reflect changes (I noted a few but there are probably more). Suggestion: Add default variable typing and default return typing where appropriate. - Questions: Why are you removing
**kwargs
all over the place? Most providers seem to just leave it even if it isn't used. delete
needs changes. It should default to 0, and look forconfirm_delete == 1
rather thannot confirm_delete
also deleting a root should follow special steps over regular deleting. See how other providers handle it. How you have it set up, the actual root will be deleted (I think). It should just delete the contents of the root.- in quite a few places, you call
_metadata_<helper_function>
over just callingmetadata
for style and consitency reasons, you should always be calling justmetadata
path_str
variable is only used once. Just replace it with its value,path.path
- Move functions around, specifically
revisions
andcreate_folder
. They are in the middle of the_<helper_function>
's and should be more nearmetadata
- conflict checking on
create_folder
. Other providers check for a409
and respond accordingly - the
_get_version_location
error has a link in it. Unsure if that is kosher. get_metadata_revision
should be named in the form of_metadata_revision
. Similar to the othermetadata()
helper functions
from waterbutler.providers.cloudfiles import CloudFilesProvider | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra blank line. Should be 2 instead of 3.
return CloudFilesProvider(auth, credentials, settings) | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra blank line. Should be 2 instead of 3.
return streams.FileStreamReader(file_like) | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra blank line. Should be 2 instead of 3.
@pytest.fixture | ||
def folder_root_empty(): | ||
return [] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs another blank line. Should be 2 instead of 1.
def container_header_metadata_without_verision_location(): | ||
with open(os.path.join(os.path.dirname(__file__), 'fixtures/fixtures.json'), 'r') as fp: | ||
return json.load(fp)['container_header_metadata_without_verision_location'] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs another blank line. Should be 2 instead of 1.
'versions follow the instructions here: ' | ||
'https://developer.rackspace.com/docs/cloud-files/v1/' | ||
'use-cases/additional-object-services-information/' | ||
'#object-versioning') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might not be the best to add links in error messages. They get outdated, etc.
@felliott ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spoke to @felliott about this. My understanding is that for now, we want to treat this more like other providers. If they don't have versioning set up, let's keep it simple. No error message, let's just not show versions, only current.
@ensure_connection | ||
async def revisions(self, path): | ||
"""Get past versions of the request file from special user designated version container, | ||
if the user hasn't designated a version_location container in raises an infomative error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in raises" -> "it raises"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also revisions should probably be moved next to metadata, and not in the middle of the _<helper_function>
's
the revision list after other revisions are returned. More info about versioning with Cloud | ||
Files here: | ||
|
||
https://developer.rackspace.com/docs/cloud-files/v1/use-cases/additional-object-services-information/#object-versioning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line length. Although idk how to fix line length in a docstring
return [current] + [CloudFilesRevisonMetadata(revision_data) for revision_data in json_resp] | ||
|
||
@ensure_connection | ||
async def get_metadata_revision(self, path, version=None, revision=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should follow the same naming scheme as the other metadata helper functions:
eg: _metadata_<for some thing>
so this should be called _metadata_revision
if data.get('subdir'): | ||
return CloudFilesFolderMetadata(data) | ||
elif data['content_type'] == 'application/directory': | ||
return CloudFilesFolderMetadata({'subdir': data['name'] + '/'}) | ||
return CloudFilesFileMetadata(data) | ||
|
||
@ensure_connection | ||
async def create_folder(self, path, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_folder should probably be moved so it is not in the middle of the _<helper_function>'s
…erbutler into cloudfiles * 'develop' of https://github.com/CenterForOpenScience/waterbutler: (77 commits) final cleanups for onedrive make provider readonly start cleaning up and reorganizing onedrive implement Microsoft OneDrive provider using dropbox as base added onedrive provider -- copied from Dropbox don't assume file metadata has a modified/created date add mypy type annotations add modified/created dates to file metadata expand docstrings add ruby serialization workaround to download clean up commit_sha vs. branch_name handling add tests for revisions and uninit-ed repos add artificial test for missing folder remove unused and obsolete code rewrite test suite for provider changes and coverage update some more docstrings on the provider GL provider is read-only: folder creation is not allowed add workaround for non-existent directories not being reported in GL document workarounds in download & _fetch_file_contents remove unneeded error wrapping from metadata ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things to look at here. My notes/questions about revisions probably aren't actionable. Made a few comments about _delete_folder_contents . In addition to those comments, I tested deleting root and it returned 204
success, but it didn't actually delete the contents of root. Shaping up nicely, back to you.
"""Deletes the key at the specified path | ||
:param str path: The path of the key to delete | ||
:rtype ResponseStreamReader: | ||
:param int confirm_delete: Must be 1 to confirm root folder delete, this deletes entire | ||
container object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this delete container object? Folder deletes against root folder should delete the contents of the folder/container, but not the folder/container itself.
|
||
if path.is_root and not confirm_delete: | ||
raise exceptions.DeleteError( | ||
'query argument confirm_delete=1 is required for deleting the entire container.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, delete root folder shouldn't actually delete folder, only contents.
) | ||
|
||
if path.is_dir: | ||
await self._delete_folder_contents(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently _delete_folder_contents
is being used only for the purpose of deleting the contents of the root folder, but not the folder itself. The if
should probably be if path.is_root
. Other (non-root) folders should be deleted along with their contents.
The thought behind not deleting root folder, even when specifically requested, is that it busts the osf.io addon config.
for item in metadata | ||
] | ||
|
||
delete_files.append(os.path.join('/', self.container, path.path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_delete_folder_contents
shouldn't delete the folder itself.
await resp.release() | ||
|
||
@ensure_connection | ||
async def metadata(self, path, recursive=False, version=None, revision=None, **kwargs): | ||
"""Get Metadata about the requested file or folder | ||
:param str path: The path to a key or folder | ||
:rtype dict: | ||
:rtype list: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update doc strings for new args or maybe switch to type hints.
@@ -339,11 +381,12 @@ def _extract_endpoints(self, data): | |||
) | |||
data = await resp.json() | |||
|
|||
# no data and the provider path is not root, we are left with either a file or a directory marker | |||
# no data and the provider path is not root, we are left with either a file or a directory | |||
# marker | |||
if not data and not path.is_root: | |||
# Convert the parent path into a directory marker (file) and check for an empty folder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r/(file)/(item)/
if not data and not path.is_root: | ||
# Convert the parent path into a directory marker (file) and check for an empty folder | ||
dir_marker = path.parent.child(path.name, folder=False) | ||
metadata = await self._metadata_file(dir_marker, is_folder=True, **kwargs) | ||
dir_marker = path.parent.child(path.name, folder=path.is_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is path.parent.child(path.name, folder=path.is_dir) different then path? Can't wrap my head around it. How am I not my father's son?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function child is to create a new child path with a new name, so you are your father's child, but your sibling is your father's child with a different name.
resp = await self.make_request( | ||
'PUT', | ||
functools.partial(self.sign_url, path, 'PUT'), | ||
expects=(200, 201), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are both these codes possible, I would think that the response would always be one or the other, hopefully 201, on success.
'versions follow the instructions here: ' | ||
'https://developer.rackspace.com/docs/cloud-files/v1/' | ||
'use-cases/additional-object-services-information/' | ||
'#object-versioning') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spoke to @felliott about this. My understanding is that for now, we want to treat this more like other providers. If they don't have versioning set up, let's keep it simple. No error message, let's just not show versions, only current.
|
||
@ensure_connection | ||
async def revisions(self, path): | ||
"""Get past versions of the request file from special user designated version container, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r/request/requested/
Ticket
https://openscience.atlassian.net/browse/SVCS-486
Purpose
Expand the cloudfiles provider to make more like a real provider,
Changes
Adds upload, download, fetch metadata, move, copy, folders creation etc to the Cloudfiles provider, also includes unit tests and moves fixtures to new file. Changes some methods to support unicode file names.
Side effects
None that I know of.
QA Notes
Should include testing for special characters in filenames.
Deployment Notes
This should be deployed be for the corrosponding osf.io PR, CenterForOpenScience/osf.io#7781