forked from CenterForOpenScience/waterbutler
Ft/dropbox chunked uploads #11
Open
NyanHelsing wants to merge 175 commits into Johnetordoff:dropbox-chunked-uploads from NyanHelsing:ft/dropbox-chunked-uploads
Conversation
* According to the RFC[1], a server may ignore a Range header and must ignore a Range header containing units it doesn't understand. WB was erroring under these conditions, but it is more appropriate to ignore the field; testing against external providers showed no consistent practice to follow. The parse_request_range docs have been updated to reflect the new behavior. Big thanks to @birdbrained for doing the legwork on researching this! [1] https://tools.ietf.org/html/rfc7233#section-3.1
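A minimal sketch of the tolerant behavior this describes. The function name echoes the `parse_request_range` mentioned in the commit, but the body is illustrative rather than WB's actual implementation: it returns `None` (serve the whole file) for anything it cannot parse instead of raising.

```python
import re
from typing import Optional, Tuple

BYTES_RANGE_RE = re.compile(r'bytes=(\d*)-(\d*)$')

def parse_request_range(range_header: str) -> Optional[Tuple[Optional[int], Optional[int]]]:
    """Return (start, end) for a single bytes range, or None when the
    header is malformed or uses an unknown unit, so the caller falls
    back to serving the full file instead of erroring (RFC 7233 3.1)."""
    match = BYTES_RANGE_RE.match(range_header.strip())
    if match is None:
        return None  # unknown unit or malformed value: ignore the header
    start, end = match.groups()
    if not start and not end:
        return None  # 'bytes=-' carries no usable information
    return (int(start) if start else None, int(end) if end else None)
```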
This is a rebased and squashed commit of a fully working implementation for the provider based on Google Cloud Storage's JSON API and OAuth 2.0 protocol. For detailed commits and messages, please see this PR: CenterForOpenScience#317
- Removed functionality that is not used by OSFStorage and does not have documented support
- Removed JSON API related docstrings/comments
- Added TODOs on what needs to be refactored
- Updated tests
- Updated settings to use the XML API
- Updated metadata.py to parse response headers and initialize the metadata object
- Added a helper function in utils.py to convert GCS's base64-encoded hash to a hex digest
- Refactored the structure of fixtures and updated them with real responses from Postman tests
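The base64-to-hex conversion amounts to a decode and re-encode; a sketch of such a helper (the name is hypothetical, not necessarily what utils.py calls it):

```python
import base64

def b64_to_hex(b64_digest: str) -> str:
    """Decode a base64 hash as reported by GCS (e.g. in x-goog-hash)
    and re-encode it as a hex digest."""
    return base64.b64decode(b64_digest).hex()

b64_to_hex('vq8lQw==')  # -> 'beaf2543' (a sample crc32c value)
```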
- Fully refactored all provider actions to use the XML API and signed requests, and implemented a minimal version:
  - Upload
  - Download
  - Metadata for file
  - Delete file
  - Intra-copy file
- Added TODO comments for Phases 1, 1.5 and 2:
  - Create folder
  - Metadata for folder
  - Delete folder
  - Intra-copy folder
- Rewrote the provider's `.build_url` as `.build_and_sign_req_url` to take care of URL building and request signing together
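For reference, a condensed sketch of the legacy (V2) GCS XML API signing that `.build_and_sign_req_url` wraps: RSA-SHA256 over a canonical string-to-sign, with the signature attached as a query parameter. The helper below is an assumption-laden illustration (no extension headers, single object resource), not the provider's code; it uses the `cryptography` package.

```python
import base64
import time
from urllib.parse import quote_plus

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def build_and_sign_req_url(verb: str, bucket: str, obj: str,
                           pem_key: bytes, client_email: str,
                           lifetime: int = 60) -> str:
    """Sketch only: sign a GCS XML API request URL with a service
    account private key (V2 signed-URL scheme)."""
    expires = int(time.time()) + lifetime
    resource = '/{}/{}'.format(bucket, obj)
    # string-to-sign: verb, content-md5, content-type, expiry, resource
    # (canonicalized extension headers omitted in this sketch)
    string_to_sign = '\n'.join([verb, '', '', str(expires), resource])
    key = serialization.load_pem_private_key(pem_key, password=None)
    signature = key.sign(string_to_sign.encode(), padding.PKCS1v15(), hashes.SHA256())
    return ('https://storage.googleapis.com{}?GoogleAccessId={}'
            '&Expires={}&Signature={}').format(
        resource, quote_plus(client_email), expires,
        quote_plus(base64.b64encode(signature).decode()))
```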
Discovered and fixed a few issues during OSF integration testing and updated comments and docstrings.
- Main issue: aiohttp parses `x-goog-hash` correctly and returns an `aiohttp.MultiDict` containing two entries with the same key, one for crc32c and one for md5. Fix: updated header parsing and tests.
- Minor fixes/updates:
  - Updated the HTTP method for delete and intra-copy
  - Added the bucket to `x-goog-copy-source` and no longer convert the value to lower case for request signing
  - Strip `"` from `ETag` when verifying the upload checksum
  - Removed `Content-Length` and only use `x-goog-stored-content-length` for file size
  - Prefixed `build_and_sign_req_url()` with `_`
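The duplicated `x-goog-hash` key is why plain `dict`-style access loses a value; `multidict`'s `getall()` reads both entries. A small illustration (the header values are invented):

```python
from multidict import CIMultiDict

headers = CIMultiDict([
    ('x-goog-hash', 'crc32c=vq8lQw=='),
    ('x-goog-hash', 'md5=3QW8/dYQHTEbx9rqLyTOZw=='),
])

hashes = {}
for value in headers.getall('x-goog-hash'):  # .get() would return only the first
    name, _, b64_digest = value.partition('=')
    hashes[name] = b64_digest
assert set(hashes) == {'crc32c', 'md5'}
```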
Note: environment variables are passed in as strings.
Some tests and refactors to come, but provider is ready for testing. [SVCS-617] Closes: CenterForOpenScience#322
- WB only needs to know the bucket name; OSF handles the region and selects the bucket with the expected region for WB.
- Removed the deprecated `fixture.py`, which is now replaced by `providers.py`, `folders.py` and `files.py` in the `fixtures/` directory
- Updated fixtures for CRUD operations and added back CRUD tests that were accidentally removed
- Fixed the expiration check in utility tests, which now uses `settings.SIGNATURE_EXPIRATION` to calculate the expected expiration time; removed redundant type casting for expiration
  - Type casting is now done in settings.py. I assume this piece of code was accidentally left here during the last merge.
- Fixed import order; `typing` is a standard lib
- Use `+=` for request segment concatenation
- Replace `return True if <condition> else False` with `return bool(<condition>)` where the condition is `None` or `not None`
- Updated both the metadata method and the utility function to use strict regex matching based on the RFC specification for base64-encoded crc32c and md5 hashes
- Google Cloud uses the standard alphabet for base64 encoding: `[A-Za-z0-9+/=]`
- RFC reference: http://www.rfc-editor.org/rfc/rfc4648.txt
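Given the RFC 4648 alphabet and the fixed digest sizes (crc32c is 4 bytes, md5 is 16), the strict patterns presumably look something like the following; the exact expressions in WB may differ:

```python
import re

# 4-byte crc32c -> 6 base64 chars + '==' padding; 16-byte md5 -> 22 + '=='
CRC32C_B64_RE = re.compile(r'^[A-Za-z0-9+/]{6}==$')
MD5_B64_RE = re.compile(r'^[A-Za-z0-9+/]{22}==$')

assert CRC32C_B64_RE.match('vq8lQw==')
assert MD5_B64_RE.match('3QW8/dYQHTEbx9rqLyTOZw==')
```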
- Added an alternative constructor for the base metadata class which takes either a standard Python dict or a pair of object name and multi-value dict during initialization
- Updated its usage in the provider
- Changed `@staticmethod` to `@classmethod` for `get_metadata_from_resp_headers()`
- Added metadata tests for both successful and failed initialization
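The `@classmethod` switch matters because file and folder metadata classes share this constructor; a stripped-down sketch of the pattern (class and attribute names are illustrative, not WB's):

```python
from multidict import CIMultiDict

class BaseGoogleCloudMetadata:

    def __init__(self, raw: dict) -> None:
        self.raw = raw

    @classmethod
    def get_metadata_from_resp_headers(cls, obj_name: str, resp_headers) -> 'BaseGoogleCloudMetadata':
        # cls, not a hard-coded class: file/folder subclasses each get
        # an instance of themselves from the same code path
        raw = dict(resp_headers)  # duplicate keys (x-goog-hash) need getall() in real code
        raw['object_name'] = obj_name
        return cls(raw)

class GoogleCloudFileMetadata(BaseGoogleCloudMetadata):
    pass

meta = GoogleCloudFileMetadata.get_metadata_from_resp_headers(
    'folder/file.txt', CIMultiDict({'ETag': '"abc"'}))
assert isinstance(meta, GoogleCloudFileMetadata)
```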
- Added `get_multi_dict_from_json()` (and a test for it) to utils so that all tests now use this helper method to build response headers
- Refactored the metadata test structure to test the three classes separately
- Removed import aliases such as `core_exception`, `pd_settings` and `pd_utils` since there are no shadowing issues any more
- Both GC file and folder metadata now use this dedicated method to initialize with an object name and aiohttp's `MultiDict` response headers
- Removed the alternative constructor for the GC base metadata and updated related code and tests
- Moved quirks from comments into docstrings so that they are available in the WB docs
- Use double backticks for inline code in docstrings
- Fixed Sphinx warnings
Side effect: modified the function signature of `get_multi_dict_from_python_dict()` to expect a dict instead of JSON; updated all related tests.
- GC's private members are now available in the WB docs by adding `:private-members:`
- Removed `:inherited-members:` and `:undoc-members:`
- Updated the .rst files to include GC, metadata and utils
- mypy doesn't handle class inheritance well
- mypy has a problem with `**{}` arguments
- mypy doesn't handle multiple options for a return type well
- Upload fails because `CIMultiDictProxy` inherits from `MultiDictProxy` but not from `MultiDict`. However, aiohttpretty returns `CIMultiDict` while aiohttp returns `CIMultiDictProxy`; the type check was strict and only recognized `CIMultiDict`.
- Updated the check to accept either `MultiDict` or `MultiDictProxy`:
  - WB code uses `CIMultiDictProxy` (a subclass of `MultiDictProxy`) since aiohttp has already parsed the hash headers.
  - WB test code uses `MultiDict` to modify the dictionary in `get_multi_dict_from_python_dict()`, which returns `MultiDictProxy`.
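In other words, the guard has to accept both the proxy and mutable flavors; something along these lines (the exact check is WB's, this is a sketch):

```python
from multidict import CIMultiDict, CIMultiDictProxy, MultiDict, MultiDictProxy

def check_resp_headers(resp_headers) -> None:
    # CIMultiDictProxy (aiohttp) subclasses MultiDictProxy, while
    # CIMultiDict (aiohttpretty / test helpers) subclasses MultiDict;
    # checking against only one base rejects the other flavor
    if not isinstance(resp_headers, (MultiDict, MultiDictProxy)):
        raise TypeError('expected a multidict of response headers')

check_resp_headers(CIMultiDict())                    # test-style headers
check_resp_headers(CIMultiDictProxy(CIMultiDict()))  # aiohttp-style headers
```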
* Minor updates to the formatting of method signatures and import order, and judicious application of De Morgan's laws to make conditionals more readable. * Updated the formatting of docstrings to make the Sphinx docs more cross-linked and browsable.
* Minor style fixes. * Updated the formatting of docstrings to make the Sphinx docs more cross-linked and browsable.
[SVCS-NO-TICKET] Improve The Command `invoke test` Closes: CenterForOpenScience#353
cslzchen force-pushed the ft/dropbox-chunked-uploads branch 10 times, most recently from e2718b1 to 4913bc4 on July 19, 2018 at 16:44
* Analytics fields will be left in and hardcoded to `None` to avoid changing the schema.
[SVCS-871]
- For chunked upload, added `upload_part()` to handle uploading a single chunk; `chunked_upload()` now calls `upload_part()`.
- For normal upload, moved the code into `contiguous_upload()`.
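The division of labor reads roughly like the loop below, shown as a standalone sketch with injected callables rather than WB's actual provider methods:

```python
async def chunked_upload(stream_size: int, read_chunk, upload_part,
                         chunk_size: int = 4 * 1024 * 1024) -> int:
    """Drive upload_part(), which handles exactly one chunk, until the
    stream is exhausted; contiguous_upload() (not shown) covers the
    single-request case for small files."""
    offset = 0
    while offset < stream_size:
        data = await read_chunk(chunk_size)
        await upload_part(offset, data)
        offset += len(data)
    return offset
```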
- According to the Dropbox API docs, files larger than 150 MB must use chunked upload. Chunks can be any size up to 150 MB; a typical size is 4 MB, which is what WB uses. The maximum file size that can be uploaded is 350 GB.
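For context, the Dropbox v2 upload-session flow behind those numbers: `upload_session/start` returns a `session_id`, `append_v2` takes subsequent chunks at increasing offsets, and `finish` commits the file. A hedged end-to-end sketch with aiohttp (error handling and retries omitted; the bare `['session_id']` lookup raises the `KeyError` the next commit settles on):

```python
import json
import aiohttp

CONTENT_URL = 'https://content.dropboxapi.com/2/files'

async def session_upload(token: str, path: str, chunks) -> dict:
    """chunks: non-empty iterable of byte strings, each <=150 MB (WB uses 4 MB)."""
    headers = {'Authorization': 'Bearer ' + token,
               'Content-Type': 'application/octet-stream'}
    async with aiohttp.ClientSession(headers=headers) as http:
        chunk_iter = iter(chunks)
        first = next(chunk_iter)
        async with http.post(CONTENT_URL + '/upload_session/start',
                             headers={'Dropbox-API-Arg': json.dumps({'close': False})},
                             data=first) as resp:
            session_id = (await resp.json())['session_id']  # KeyError if absent
        offset = len(first)
        for chunk in chunk_iter:
            cursor = {'session_id': session_id, 'offset': offset}
            async with http.post(CONTENT_URL + '/upload_session/append_v2',
                                 headers={'Dropbox-API-Arg': json.dumps({'cursor': cursor})},
                                 data=chunk):
                pass
            offset += len(chunk)
        arg = {'cursor': {'session_id': session_id, 'offset': offset},
               'commit': {'path': path, 'mode': 'add'}}
        async with http.post(CONTENT_URL + '/upload_session/finish',
                             headers={'Dropbox-API-Arg': json.dumps(arg)},
                             data=b'') as resp:
            return await resp.json()
```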
* The Dropbox provider should error if it can't get a session identifier. A `KeyError` should suffice until we see what an actual error condition looks like.
felliott force-pushed the ft/dropbox-chunked-uploads branch from 4913bc4 to edd2208 on July 26, 2018 at 05:23