Ft/dropbox chunked uploads #11

Open: wants to merge 175 commits into base: dropbox-chunked-uploads

Commits on Mar 7, 2018

  1. Ignore invalid range headers instead of erroring

     * According to the RFC[1], a server may ignore a Range header and
       should ignore a Range header containing units it doesn't
       understand.  WB was erroring under these conditions, but it seems
       more appropriate to ignore the field.  Testing against external
       providers showed no consistent practice to follow.  The
       parse_request_range docs have been updated to reflect the new
       behavior.  Big thanks to @birdbrained for doing the legwork on
       researching this!
    
     [1] https://tools.ietf.org/html/rfc7233#section-3.1
    felliott committed Mar 7, 2018
    fbc8399
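A minimal sketch of the "ignore instead of error" behavior described above. The function name `parse_range_header` and its return shape are hypothetical (this is not WaterButler's `parse_request_range`); it accepts only a single `bytes=` range and returns `None`, meaning "ignore the header and serve the whole file", for anything it does not understand.

```python
import re
from typing import Optional, Tuple

# A single byte range such as "bytes=0-499" or "bytes=500-".
# Suffix ranges ("bytes=-500") and multi-range values are not handled here,
# so they fall through to "ignore the header".
_BYTE_RANGE = re.compile(r'bytes=(\d+)-(\d*)')


def parse_range_header(value: Optional[str]) -> Optional[Tuple[int, Optional[int]]]:
    """Return (start, end) for a supported range, or None to ignore the header.

    ``end`` is None for an open-ended range like "bytes=500-".
    """
    if not value:
        return None
    match = _BYTE_RANGE.fullmatch(value.strip())
    if match is None:
        # Unknown unit (e.g. "items=0-5") or unparseable syntax: ignore it.
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        # RFC 7233 treats last-byte-pos < first-byte-pos as invalid: ignore it.
        return None
    return start, end
```

A caller would fall back to a normal full-content response whenever this returns `None`, rather than raising a 4xx error.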

Commits on Mar 8, 2018

  1. 265b47c
  2. Add GoogleCloudStorage Provider

    This is a rebased and squashed commit of a fully working
    implementation for the provider based on Google Cloud
    Storage's JSON API and OAuth 2.0 protocol.
    
    For detailed commits and messages, please see this PR:
    CenterForOpenScience#317
    cslzchen committed Mar 8, 2018
    ae2524e
  3. Prepare for XML-API refactor:

    - Removed functionalities that are not used by
      OSFStorage and that do not have documented
      support
    - Removed JSON API related docstrings/comments
    - Added TODOs on what needs to be refactored
    - Update tests
    cslzchen committed Mar 8, 2018
    d05ec48
  4. GCS XML API refactor - part 1: metadata

    - updated settings to use XML API
    - updated metadata.py to parse response headers
      and init metadata object
    - added a helper function in utils.py to convert
      GCS's base64-encoded hash to hex digest
    - refactored the structure of fixtures and updated
      them with real responses from Postman tests
    cslzchen committed Mar 8, 2018
    6be26bc
  5. GCS XML API refactor - part 2: a minimal provider

    - Fully refactored all provider actions to use
      XML API and signed request and implemented a
      minimal version:
        - Upload
        - Download
        - Metadata for file
        - Delete file
        - Intra-copy file
    - Added TODO comments for Phase 1, 1.5 and 2:
        - Create folder
        - Metadata for folder
        - Delete folder
        - Intra-copy folder
    - Rewrote the provider's `.build_url` as
      `.build_and_sign_req_url`, which takes care of
      URL building and request signing together
    cslzchen committed Mar 8, 2018
    7d681ab
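Part 1 mentions a utils.py helper that converts GCS's base64-encoded hashes to hex digests. A sketch of that conversion, assuming standard (not URL-safe) base64; the function name is hypothetical:

```python
import base64


def b64_digest_to_hex(b64_digest: str) -> str:
    """Convert a base64-encoded digest (as GCS reports it) to a hex string."""
    # validate=True raises binascii.Error on characters outside the
    # standard base64 alphabet instead of silently discarding them.
    return base64.b64decode(b64_digest, validate=True).hex()


# MD5 of an empty object, as GCS reports it:
print(b64_digest_to_hex('1B2M2Y8AsgTpgAmY7PhCfg=='))  # d41d8cd98f00b204e9800998ecf8427e
```

Hex is the form most other providers (and most checksum tools) use, which makes comparing digests across providers straightforward.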

Commits on Mar 9, 2018

  1. GCS XML API refactor - part 3: a working provider

    Discovered and fixed a few issues during OSF integration
    testing and updated comments and docstrings.
    
    - Main issue: aiohttp parses the `x-goog-hash` correctly
      and returns an `aiohttp.MultiDict` dictionary that
      contains two entries with the same key, one for crc32c
      and one for md5. Fix: updated headers parsing and tests
    - Minor fixes/updates
      - Updated HTTP method for delete and intra-copy
      - Added bucket to "x-goog-copy-source" and no longer
        convert value to lower case for request signing
      - Strip '"' from "ETag" for verifying upload checksum
      - Removed "Content-length" and only use
        "x-goog-stored-content-length" for file size
      - Prefixed `build_and_sign_req_url()` with `_`
    cslzchen committed Mar 9, 2018
    4c70e1c
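The main issue above is that `x-goog-hash` arrives as one header per algorithm. With aiohttp, all of them can be read with `resp.headers.getall('x-goog-hash', [])`; the helper names below are hypothetical and only sketch the parsing, plus the ETag quote-stripping from the minor fixes:

```python
from typing import Dict, Iterable


def parse_goog_hashes(values: Iterable[str]) -> Dict[str, str]:
    """Map each algorithm in "x-goog-hash" values to its base64 digest.

    GCS sends one header per algorithm, e.g. "crc32c=AAAAAA==" and
    "md5=1B2M2Y8AsgTpgAmY7PhCfg==". Comma-joined values are split too.
    """
    hashes = {}
    for value in values:
        for item in value.split(','):
            # partition() splits on the first "=" only, so base64 "="
            # padding stays part of the digest.
            algorithm, _, digest = item.strip().partition('=')
            if digest:
                hashes[algorithm.lower()] = digest
    return hashes


def strip_etag_quotes(etag: str) -> str:
    """Remove the surrounding double quotes from an ETag header value."""
    return etag.strip('"')
```

Splitting on the first `=` only is the important detail: a naive `split('=')` would cut off the base64 padding.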

Commits on Mar 12, 2018

  1. 75add86
  2. b00dfb2

Commits on Mar 13, 2018

  1. a3cc17f
  2. cb07ea6
  3. Explicitly parse SIGNATURE_EXPIRATION to integer

    Note: environment variables are passed in as strings
    cslzchen authored and felliott committed Mar 13, 2018
    14388eb
  4. Merge branch 'feature/gcloud-provider' into develop

     Some tests and refactors to come, but provider is ready for testing.
    
     [SVCS-617]
     Closes: CenterForOpenScience#322
    felliott committed Mar 13, 2018
    4863533
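Because environment variables are always strings, a setting like `SIGNATURE_EXPIRATION` has to be cast before it is used in arithmetic. A minimal sketch of that pattern; the helper name and the 60-second default are placeholders, not WaterButler's actual settings code:

```python
import os


def int_from_env(name: str, default: int) -> int:
    """Read an integer setting from the environment.

    os.environ values are always str, so the value is cast explicitly;
    a non-numeric value raises ValueError at startup rather than later.
    """
    raw = os.environ.get(name)
    return default if raw is None else int(raw)


# The 60-second default is a placeholder, not WaterButler's actual value.
SIGNATURE_EXPIRATION = int_from_env('SIGNATURE_EXPIRATION', 60)
```

Casting once at load time means every consumer sees an `int` and no call site has to repeat the conversion.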

Commits on Mar 23, 2018

  1. Remove region from Google Cloud

    - WB only needs to know the bucket name; OSF
      handles the region and selects the bucket with
      the expected region for WB.
    cslzchen committed Mar 23, 2018
    625ebc4
  2. 4b5f5d7

Commits on Mar 25, 2018

  1. 9e551d1
  2. Further improve tests for GoogleCloud

    - Removed deprecated `fixture.py`, which is now
      replaced by `providers.py`, `folders.py` and
      `files.py` in the `fixtures/` directory.
    - Updated fixtures for CRUD operations and added
      back CRUD tests that were accidentally removed
    - Fixed the expiration check in utility tests so that
      it now uses `settings.SIGNATURE_EXPIRATION` to
      calculate the expected expiration time.

    Remove redundant type casting for expiration

    - Type casting is now done in settings.py. I
      assume that this piece of code was accidentally
      left here during the last merge.
    cslzchen authored and felliott committed Mar 25, 2018
    c270be0
  3. Add InvalidProviderConfigError to tests

    cslzchen authored and felliott committed Mar 25, 2018
    36fbf3f
  4. Fix minor code style issues:

    - Fix import order: `typing` is a standard library module
    - Use `+=` for request segment concatenation
    - Replace `return True if <condition> else False`
      with `return bool(<condition>)` where the
      condition is `None` or `not None`
    cslzchen authored and felliott committed Mar 25, 2018
    376969e
  5. Use a strict regex for crc32c and md5 matching

    - Update both the metadata method and the utility
      function to use the strict regex matching based
      on the RFC specification for Base 64 encoded
      crc32c and md5 hashes
    - Google Cloud uses the standard alphabet for
      Base 64 encoding: [A-Za-z0-9+/=]
    - RFC reference:
      http://www.rfc-editor.org/rfc/rfc4648.txt
    cslzchen authored and felliott committed Mar 25, 2018
    800a7ea
  6. BaseGCMetadata now handles resp headers in init

    - Added an alternative constructor for the base
      metadata which takes either a standard Python
      dict or a pair of object name and multi-value
      dict during initialization
    - Updated its usage in the provider
    - Changed @staticmethod to @classmethod for
      `get_metadata_from_resp_headers()`
    - Added metadata tests for both successful and
      failed initialization.
    cslzchen authored and felliott committed Mar 25, 2018
    49ab45f
  7. Add a helper for parsing headers and update tests

    - Added `get_multi_dict_from_json()` (and a test
      for it) to utils so that all tests now use this
      helper method to build resp headers
    - Refactored the metadata structure to test three
      classes separately
    - Removed import aliases such as `core_exception`,
      `pd_settings` and `pd_utils` since there are no
      shadowing issues anymore
    cslzchen authored and felliott committed Mar 25, 2018
    58abe70
  8. Add .new_from_resp_headers() to init GC metadata

    - Both GC file and folder metadata now use this
      dedicated method to initialize with the object name
      and aiohttp's "MultiDict" response headers
    - Removed the alternative constructor for GC base
      metadata and updated related code and tests
    cslzchen authored and felliott committed Mar 25, 2018
    43153d3
  9. Update DocStr and add PyDoc for metadata

    - Moved quirks in comments into DocStr so that
      they are available in WB Docs
    - Use double backticks (``) for code in DocStr
    - Fixed Sphinx warnings
    cslzchen authored and felliott committed Mar 25, 2018
    b0d385d
  10. Update DocStr and PyDoc for utils

    Side effect: modified the function signature for
    get_multi_dict_from_python_dict() to expect dict
    instead of json; updated all related tests.
    cslzchen authored and felliott committed Mar 25, 2018
    2550678
  11. Add DocStr and PyDoc for the provider

    - GC's private members are now available in the
      WB Docs by adding :private-members:
    - Remove :inherited-members: and :undoc-members:
    - Updated the .rst files to include GC, metadata
      and utils
    cslzchen authored and felliott committed Mar 25, 2018
    6ebaa51
  12. Fix mypy with # type: ignore

    - mypy doesn't handle class inheritance well
    - mypy has a problem with **{} arguments
    - mypy doesn't handle multiple options of
      return type well
    cslzchen authored and felliott committed Mar 25, 2018
    2982f54
  13. Fix aiohttp's MultiDict and MultiDictProxy issue

    - Upload failed because CIMultiDictProxy inherits
      from MultiDictProxy but not from MultiDict.
      aiohttpretty returns a CIMultiDict while aiohttp
      returns a CIMultiDictProxy, and the type check was
      strict and only recognized CIMultiDict
    - Updated the check to accept either MultiDict or
      MultiDictProxy:
      - WB code uses CIMultiDictProxy (a subclass of
      MultiDictProxy) since aiohttp parses the hash
      headers already.
      - WB test code uses MultiDict to modify the
      dictionary in get_multi_dict_from_python_dict(),
      which returns a MultiDictProxy.
    cslzchen authored and felliott committed Mar 25, 2018
    6031e56
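One way to write strict patterns for base64-encoded CRC32C (4 bytes) and MD5 (16 bytes) digests, sketched under the RFC 4648 rules cited above. The names are hypothetical, and the restriction on the final significant character goes slightly beyond a plain alphabet check:

```python
import re

# Standard base64 alphabet (RFC 4648, section 4). A 4-byte CRC32C encodes to
# 6 significant characters plus "==", and a 16-byte MD5 to 22 plus "==".
# The last significant character carries only 2 data bits (the other 4 are
# zero padding), so it can only be "A", "Q", "g" or "w".
CRC32C_B64 = re.compile(r'[A-Za-z0-9+/]{5}[AQgw]==')
MD5_B64 = re.compile(r'[A-Za-z0-9+/]{21}[AQgw]==')


def is_valid_b64_hash(kind: str, value: str) -> bool:
    """Return True if value is a well-formed base64 crc32c or md5 digest.

    An unknown kind raises KeyError.
    """
    pattern = {'crc32c': CRC32C_B64, 'md5': MD5_B64}[kind]
    # fullmatch() anchors both ends, so no trailing characters slip through.
    return pattern.fullmatch(value) is not None
```

Using `fullmatch` instead of `match` with `^...$` also avoids the quirk where `$` accepts a trailing newline.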

Commits on Mar 28, 2018

  1. code style & docstring updates for metadata.py

     * Minor updates to formatting of method signatures, import order, and
       judicious application of DeMorgan's Law to make conditionals more
       readable.
    
     * Update formatting of docstrings to make Sphinx docs more
       cross-linked and browsable.
    felliott committed Mar 28, 2018
    ca26ed8
  2. code style & docstring updates for utils.py

     * Minor style fixes.
    
     * Update formatting of docstrings to make Sphinx docs more
       cross-linked and browsable.
    felliott committed Mar 28, 2018
    2f6a4cc
  3. code style & docstring updates for provider.py

     * Minor updates to formatting of method signatures.
    
     * Update formatting of docstrings to make Sphinx docs more
       cross-linked and browsable.
    felliott committed Mar 28, 2018
    ee6d669
  4. style fixes for tests; remove unneeded export

     * Minor style fixes for import order and signature formatting.
    
     * Remove unused fixtures and imports from tests.
    
     * __init__.py doesn't need to export the Metadata classes.  Remove
       that and update the test files that were using it.
    felliott committed Mar 28, 2018
    c57b537
  5. Merge branch 'feature/gcloud-updates' into develop

     Code improvements, tests, minor fixes for the limited Google Cloud
     Storage provider.
    
     [SVCS-617]
     Closes: CenterForOpenScience#327
    felliott committed Mar 28, 2018
    691cc97
  6. 7f80253
  7. 998f857
  8. 9abd9f6

Commits on Apr 5, 2018

  1. 39454ea
  2. Fix typing and update import for box

    - The type fix also uncovered a bug in our code where
      `._intra_move_copy_metadata()` calls a buggy
      `._get_folder_meta()` that in turn calls
      `._serialize_item()` with invalid arguments.
    cslzchen committed Apr 5, 2018
    e183bff
  3. f5c4267
  4. 20756c1
  5. 6672fb8
  6. 8a3218e

Commits on Apr 6, 2018

  1. 95fa944
  2. Update mypy: 0.560 -> 0.580

    cslzchen committed Apr 6, 2018
    3eee8db
  3. 154806f
  4. switch back to non-conda based rtd config

     * ReadTheDocs has updated their base python image, meaning the
       anaconda-based config is no longer needed.
    
       The conda config was out-of-date and failing to build anyway since
       the setuptools dependency version bump.
    felliott committed Apr 6, 2018
    23aa355
  5. 0df8fd6

Commits on Apr 10, 2018

  1. don't send logging callbacks for partial requests

     * Stop sending download callbacks for 206 Partial responses.  These
       should not be counted as full downloads.  WB does not directly
       support Range requests on direct-from-provider downloads (signed
       urls), but at least curl and Postman appear to propagate Range
       headers from the original request to the follow-up redirection
       request.  For now, log 302 responses with Range headers, but
       continue to send download callbacks as normal.  The logs will be
       used to determine the correct behavior in the future.
    felliott committed Apr 10, 2018
    df75986
  2. 539672c
  3. 75ba817
  4. Merge branch 'hotfix/0.38.1'

    felliott committed Apr 10, 2018
    11b6f54
  5. e6a82e8
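The callback rule from the first Apr 10 commit, sketched as a small decision function. The function name is hypothetical, and the headers are assumed to be a plain dict (a real server's header object would normally be case-insensitive):

```python
import logging
from typing import Mapping

logger = logging.getLogger(__name__)


def should_send_download_callback(status: int, request_headers: Mapping[str, str]) -> bool:
    """Decide whether a download response should be reported as a full download."""
    if status == 206:
        # Partial Content: a byte-range fragment, not a complete download.
        return False
    if status == 302 and 'Range' in request_headers:
        # Redirect to a signed provider URL; the client may forward the Range
        # header to the provider. Log it for now, but still count the download.
        logger.info('Range header on redirected download: %s', request_headers['Range'])
    return True
```

The 302 branch only adds logging, matching the "log now, decide later" approach described in the commit.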

Commits on Apr 12, 2018

  1. e72a69e
  2. 1a44c49

Commits on Apr 13, 2018

  1. don't log revisions metadata requests to callback

     * Turn off revisions metadata logging.  When file download logging
       was added, regular metadata requests were excluded, but revisions
       were overlooked.
    Johnetordoff authored and felliott committed Apr 13, 2018
    3c5638a
  2. return metadata about request in logging callback

     * Update WB to return the request method, url, user agent, and
       referrer url in the logging callback payload.  Intended to help the
       callback listener provide more specific download metrics.
    Johnetordoff authored and felliott committed Apr 13, 2018
    666217d
  3. 81bec1a
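A sketch combining both Apr 13 changes: skipping metadata-style actions (now including revisions) and attaching request details to the callback payload. The action names and payload layout are hypothetical illustrations, not WaterButler's actual schema:

```python
from typing import Any, Dict, Optional

# Hypothetical action names; metadata-style requests are not logged.
UNLOGGED_ACTIONS = {'metadata', 'revisions'}


def build_callback_payload(action: str, method: str, url: str,
                           user_agent: Optional[str],
                           referrer: Optional[str]) -> Optional[Dict[str, Any]]:
    """Return the logging-callback payload, or None if the action is not logged."""
    if action in UNLOGGED_ACTIONS:
        return None
    return {
        'action': action,
        'request': {
            'method': method,
            'url': url,
            'headers': {
                'User-Agent': user_agent,
                # "Referer" is the HTTP header's historical spelling.
                'Referer': referrer,
            },
        },
    }
```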

Commits on Apr 20, 2018

  1. release metadata update response in osfstorage tasks

     * Otherwise an "unclosed response" error will appear when the next
       celery task is scheduled.
    felliott committed Apr 20, 2018
    0169a89

Commits on Apr 23, 2018

  1. add post-task cleanup for osfstorage tasks

     * The osfstorage provider kicks off two tasks after upload: one to
       back up the file to Amazon Glacier and one to generate parity files
       that are sent to a bucket on the storage backend provider.  Since
       both tasks run in parallel and need a copy of the uploaded file to
       work, neither could be responsible for deleting it when done.
       Instead this work would need to be done periodically by an admin to
       keep the disk from filling up.
    
       Both tasks are now included in a Celery chord.  When all the tasks
       in a chord are done, it runs another task.  In this case, the chord
       will run a cleanup task after both other tasks finish.  To make
       this simpler, each upload is moved to a temporary directory where
       it and its generated parity files live.  This temporary directory
       is removed by the cleanup task.
    
       The parity and archive tasks tests have been commented out rather
       than updated, since a simpler approach may be implemented soon.
    felliott committed Apr 23, 2018
    c67aa5c
  2. 7bdc4aa
  3. Delay URL build/sign for GoogleCloud

    - Use `functools.partial()` to delay building and
      signing the URL until the request is actually made.
    - Now `make_request()` gets a brand new URL every
      time it retries a failed request.
    cslzchen committed Apr 23, 2018
    d552440
  4. e4e2aca
  5. 43235a3
  6. Merge branch 'hotfix/0.38.2'

    felliott committed Apr 23, 2018
    19c1749
  7. 2395939
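The chord arrangement described in the cleanup commit (two parallel tasks, then one cleanup step over a shared temporary directory) can be illustrated without Celery. This sketch swaps in the standard library's `ThreadPoolExecutor`, and the task bodies are placeholders. One deliberate difference: a Celery chord's callback does not run if a header task fails, whereas this sketch cleans up in a `finally` block so the directory is removed either way.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def archive_task(path: str) -> str:
    # Placeholder for the Glacier backup; just reads the uploaded file.
    with open(path, 'rb') as f:
        return 'archived {} bytes'.format(len(f.read()))


def parity_task(path: str) -> str:
    # Placeholder for parity generation; writes output next to the upload.
    parity_path = path + '.par2'
    with open(parity_path, 'wb') as f:
        f.write(b'parity')
    return 'wrote ' + os.path.basename(parity_path)


def process_upload(data: bytes) -> Tuple[List[str], str]:
    """Run both tasks in parallel, then delete their shared temp directory."""
    tmpdir = tempfile.mkdtemp()
    upload_path = os.path.join(tmpdir, 'upload.bin')
    with open(upload_path, 'wb') as f:
        f.write(data)
    try:
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(archive_task, upload_path),
                       pool.submit(parity_task, upload_path)]
            results = [future.result() for future in futures]
    finally:
        # The "chord callback": the executor has already waited for both
        # tasks, so nothing is still using the directory at this point.
        shutil.rmtree(tmpdir)
    return results, tmpdir
```

Keeping every artifact of one upload under a single directory is what makes the cleanup step a single `rmtree` call.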

Commits on Apr 24, 2018

  1. a28b0cd
  2. 8e5e9f4
  3. release metadata update response in osfstorage tasks

     * Otherwise an "unclosed response" error will appear when the next
       celery task is scheduled.
    felliott committed Apr 24, 2018
    97d508a
  4. add post-task cleanup for osfstorage tasks

     * The osfstorage provider kicks off two tasks after upload: one to
       back up the file to Amazon Glacier and one to generate parity files
       that are sent to a bucket on the storage backend provider.  Since
       both tasks run in parallel and need a copy of the uploaded file to
       work, neither could be responsible for deleting it when done.
       Instead this work would need to be done periodically by an admin to
       keep the disk from filling up.
    
       Both tasks are now included in a Celery chord.  When all the tasks
       in a chord are done, it runs another task.  In this case, the chord
       will run a cleanup task after both other tasks finish.  To make
       this simpler, each upload is moved to a temporary directory where
       it and its generated parity files live.  This temporary directory
       is removed by the cleanup task.
    
       The parity and archive tasks tests have been commented out rather
       than updated, since a simpler approach may be implemented soon.
    felliott committed Apr 24, 2018
    391bb6d
  5. 2886a77
  6. Merge branch 'hotfix/cleanup-after-osfstorage-tasks'

     * These commits were originally merged to develop, but are being
       hotfixed into master to solve issues with unbounded storage
       consumption.
    felliott committed Apr 24, 2018
    882d3d8
  7. 5ba3767
  8. Merge branch 'hotfix/0.38.3'

    felliott committed Apr 24, 2018
    a7eb2f7
  9. ce2514c
  10. move url invocation inside the retry loop

     * In 0.38.2, the googlecloud provider was updated to provide a
       function as the url parameter to `BaseProvider.make_request`.
       Since googlecloud urls are signed and can expire, it must delay
       generation until right before issuing.  Otherwise, if the first
       request fails, the second may not get issued until after the
       signature has expired.
    
       Unfortunately, `.make_request` was invoking the url function outside
       of the retry loop.  This resulted in the same signed url being used
       for each retry.  Moving this inside the retry loop will cause new
       urls to be generated for each retry request.
    felliott committed Apr 24, 2018
    5959ff7
  11. 25507ec
  12. d61d6bc
  13. Merge branch 'hotfix/0.38.4'

    felliott committed Apr 24, 2018
    ac6f192
  14. e9fdd80
  15. 3d9d429
  16. cb61a52
  17. d4b0f3e
  18. Merge branch 'hotfix/0.38.5'

    felliott committed Apr 24, 2018
    cff38c6
  19. 33baec0
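The fix in "move url invocation inside the retry loop" comes down to where the URL factory is called. A simplified, synchronous sketch: `make_request` here is hypothetical (WaterButler's is async and raises on failure), and a counter stands in for the clock so each signature is visibly different.

```python
import functools
import itertools
from typing import Callable, List, Union

# Stand-in clock so each signature differs; a real signer would use time.time().
_fake_clock = itertools.count(1000)


def build_and_sign_url(path: str, expires_in: int) -> str:
    """Hypothetical signer: embeds an expiration a fixed interval from "now"."""
    expires = next(_fake_clock) + expires_in
    return 'https://storage.example.com/{}?Expires={}'.format(path, expires)


def make_request(url: Union[str, Callable[[], str]],
                 send: Callable[[str], bool],
                 retries: int = 2) -> List[str]:
    """Try a request up to retries + 1 times; return every URL attempted.

    If url is callable it is invoked inside the loop, so every retry gets a
    freshly signed URL instead of reusing one that may have expired.
    """
    attempted = []
    for _ in range(retries + 1):
        current = url() if callable(url) else url
        attempted.append(current)
        if send(current):
            break
    return attempted


# functools.partial delays signing until make_request calls it.
signed_url = functools.partial(build_and_sign_url, 'bucket/object.txt', 60)
```

Had `url()` been evaluated once before the `for` loop, all three attempts would reuse the same signature, which is exactly the bug described above.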

Commits on Apr 25, 2018

  1. 31af967
  2. d968683
  3. 8765321
  4. Merge branch 'hotfix/0.38.6'

    felliott committed Apr 25, 2018
    8849baa
  5. e2e6266

Commits on May 1, 2018

  1. Send along user ID when asking for children metadata from the osf

     [#SVCS-689]
    
     * osfstorage is being updated to include a flag in file metadata that
       will indicate if the requesting user has seen the most recent
       version of the file.  To help the OSF properly determine this,
       update WB to send along the requesting user's id when asking for
       the metadata of all files in a given directory.
    
       The requesting user is not the same as the authorizing user.  If
       Barbara asks for the contents of an osfstorage directory created by
       Alice, Barbara is the *requesting* user, while Alice is the
       *authorizing* user.  WB first verifies that Barbara has the
       necessary access to the file, but uses a shared secret to retrieve
       metadata about the file.  This change is necessary to inform the
       OSF who is behind the request.
    
     * The osfstorage folder children response is updated to include a new
       flag, `latestVersionSeen`.  If this flag is `null`, the requesting
       user has never seen *any* version of the file.  If it is `true`,
       the user has seen the latest version of the file.  If it is
       `false`, the user has seen a previous version of the file, but not
       the most recent one.  This flag will be exposed through the
       `extra.latestVersionSeen` flag in OsfStorageFileMetadata.
    
     * Due to a quirk in WB and the OSF, the latestVersionSeen flag will
       only be correctly set on the responses from folder metadata list
       requests.  Neither service correctly handles requests for
       previous-version metadata for individual files.
    
     * Update tests to include latestVersionSeen in children response.
    erinspace authored and felliott committed May 1, 2018
    6843bf5
  2. bfbd106
  3. 9b40f14
  4. 0aa3e1f
  5. 07cb5b1
  6. Skip parsing response body for HEAD requests

    - This only applies to `exception_from_response()`
    - Side effect: fixes a response that was never released
    cslzchen authored and felliott committed May 1, 2018
    56c99ad
  7. ee86ba3
  8. 98b057c

Commits on May 8, 2018

  1. 2be4a0e
  2. 001c495
  3. 1d6435d

Commits on May 23, 2018

  1. Remove extra parens in core exception tests

    They do nothing syntactically and are confusing.
    NyanHelsing authored and cslzchen committed May 23, 2018
    aa1fdbd
  2. 140ceed

Commits on May 24, 2018

  1. 5ba0d35
  2. Fix referrer domain calculation

    - Build referrer domain from scheme, host and port
    - Only build the referrer domain if a referrer exists
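A minimal sketch of building a referrer domain from scheme, host and port, using only the standard library's `urllib.parse.urlsplit`. The function name and the choice to return `None` for a missing or unparsable referrer are assumptions, not WB's actual code.

```python
from typing import Optional
from urllib.parse import urlsplit


def referrer_domain(referrer: Optional[str]) -> Optional[str]:
    """Return 'scheme://host[:port]' for a referrer URL, or None if there isn't one."""
    if not referrer:
        return None  # only build the domain when a referrer exists
    parts = urlsplit(referrer)
    if not parts.scheme or not parts.hostname:
        return None
    domain = '{}://{}'.format(parts.scheme, parts.hostname)
    if parts.port is not None:
        domain += ':{}'.format(parts.port)  # keep explicit ports, e.g. localhost:5000
    return domain
```

Note that `hostname` is lowercased by `urlsplit`, so the result is normalized for comparison.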
    cslzchen committed May 24, 2018
    5052e63
  3. 2f45e0b

Commits on Jun 1, 2018

  1. eb5a8ac
  2. OSFStorage: intra move/copy only for same region

     * The googlecloud backend does not support intra move/copy if the
       buckets are in different regions.  Add the relevant check to
       can_intra_* and update tests.
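The same-region check described above could be sketched as follows. `BackendSettings` and its fields are hypothetical stand-ins for however OSFStorage exposes its backend and region; they are not WB's real settings structure.

```python
from dataclasses import dataclass


@dataclass
class BackendSettings:
    """Hypothetical stand-in for an OSFStorage provider's backend settings."""
    storage_provider: str  # e.g. 'googlecloud'
    region: str            # bucket region identifier


def can_intra_move_or_copy(src: BackendSettings, dest: BackendSettings) -> bool:
    # An intra move/copy stays inside one storage backend, so both sides must
    # use the same backend and, per this commit, the same region.
    return (src.storage_provider == dest.storage_provider
            and src.region == dest.region)
```

When this returns `False`, the move/copy would fall back to the generic cross-provider path instead of a backend-side copy.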
    cslzchen authored and felliott committed Jun 1, 2018
    dd9b536
  3. 00949cf
  4. Add size-cast-as-int property to file metadata.

     * Owncloud has the unfortunate habit of returning file size as a
       string instead of an int.  WB never enforced a cast to int on the
       property, so there may be clients in the wild that expect it to be
       a string.  To avoid breaking these, add a new property, `sizeInt`,
       to WB metadata responses.  This is guaranteed to be either an `int`
       or `None` if the size is unknown.
    
     * The JSON-API-style responses for folder metadata include a `size`
       field that is always `None`.  Add a similar `sizeInt` field for
       parity with files.
    
     * Update explicit metadata tests for all providers.
    
     * Update type annotation for BaseFileMetadata.size to reflect its
       regrettable potential for stringiness.
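A minimal sketch of keeping the raw `size` while adding an integer-or-`None` `sizeInt`. The class and the `size_as_int` property name are hypothetical; only the `size`/`sizeInt` keys come from the commit.

```python
from typing import Optional, Union


def _coerce_size(size: Union[int, str, None]) -> Optional[int]:
    """Cast a provider-reported size to int; None when unknown or unparsable."""
    if size is None:
        return None
    try:
        return int(size)
    except (TypeError, ValueError):
        return None


class FileMetadataSketch:
    """Hypothetical metadata object showing `size` next to `sizeInt`."""

    def __init__(self, raw_size: Union[int, str, None]):
        self._raw_size = raw_size

    @property
    def size(self) -> Union[int, str, None]:
        return self._raw_size  # left untouched for clients that expect a string

    @property
    def size_as_int(self) -> Optional[int]:
        return _coerce_size(self._raw_size)

    def serialized(self) -> dict:
        return {'size': self.size, 'sizeInt': self.size_as_int}
```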
    AddisonSchiller authored and felliott committed Jun 1, 2018
    f45e48a
  5. Minor logic change

    SVCS-499
    TomBaxter authored and felliott committed Jun 1, 2018
    2f2f8bc
  6. 641ccae

Commits on Jun 4, 2018

  1. fill out tests for v1 server API

     * Count the number of times a mock coroutine has been awaited.
    
     * Expand handler tests, port them to pytest and reorganize fixtures
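Counting awaits separately from calls can be sketched as below. `CountingCoroutineMock` is a hypothetical helper, not WB's test utility; on Python 3.8+ the standard `unittest.mock.AsyncMock` exposes `await_count` directly.

```python
import asyncio
from unittest import mock


class CountingCoroutineMock:
    """Callable that records calls via a Mock and counts how often it is awaited.

    Calling it returns a fresh coroutine; only awaiting that coroutine bumps
    `await_count`, so tests can tell "called" apart from "awaited".
    """

    def __init__(self, return_value=None):
        self.mock = mock.Mock(return_value=return_value)
        self.await_count = 0

    def __call__(self, *args, **kwargs):
        result = self.mock(*args, **kwargs)  # records args in mock.call_args_list

        async def _awaited():
            self.await_count += 1
            return result

        return _awaited()
```

A handler test would await the mock as usual, then assert on both `m.mock.call_count` and `m.await_count`.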
    Johnetordoff authored and felliott committed Jun 4, 2018
    15e7457
  2. 5c637a2

Commits on Jun 5, 2018

  1. pin Dockerfile to use jessie-based python

     * The python:3.5-slim docker tag was recently repointed to a debian
       stretch-based image.  Until WB has been verified to work on
       stretch, pin to the jessie-based image it has been using.
    felliott committed Jun 5, 2018
    61da346
  2. depend on gpg; try other keyservers

     * Following the OSF's lead, explicitly depend on gnupg2 and specify
       fallback gpg keyservers.
    felliott committed Jun 5, 2018
    1f42af3
  3. 8be25cf
  4. f46d301
  5. 2732383
  6. 08ddacd
  7. 43dc14e
  8. 38760a7
  9. 6766721

Commits on Jun 7, 2018

  1. Disable intra move/copy region check for filesystem

    - Region only applies when googlecloud is the storage provider
    cslzchen committed Jun 7, 2018
    c545615
  2. 01b0733

Commits on Jun 12, 2018

  1. signal MFR render/export requests to the OSF

     * MFR now includes a header when requesting metadata from WB.  This
       header indicates if the MFR request is a render or export action.
       If WB sees this header, it should relay it to the OSF by changing
       the action from 'metadata' to either 'render' or 'export'.
    
       The OSF will be updated to treat these actions as metadata
       requests and to use them to keep metrics on MFR usage.
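The action rewrite described above might look like this sketch. The commit does not name the header, so `X-Mfr-Action` and its values are hypothetical placeholders; a plain `dict` is used for brevity, whereas real HTTP header containers are case-insensitive.

```python
from typing import Mapping

# Hypothetical header name; the commit does not say what MFR actually sends.
MFR_ACTION_HEADER = 'X-Mfr-Action'
MFR_ACTIONS = frozenset({'render', 'export'})


def action_for_osf(action: str, headers: Mapping[str, str]) -> str:
    """Relay an MFR render/export signal to the OSF by rewriting 'metadata'."""
    if action != 'metadata':
        return action
    mfr_action = headers.get(MFR_ACTION_HEADER, '').strip().lower()
    return mfr_action if mfr_action in MFR_ACTIONS else action
```

Unknown header values fall back to `metadata`, so a malformed header cannot trigger an unexpected action.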
    felliott committed Jun 12, 2018
    ce8ddb4
  2. 0e53579

Commits on Jun 22, 2018

  1. c507c36
  2. 19c9461
  3. 782231f
  4. 6e5935b
  5. 6dc3b0d
  6. 2197194

Commits on Jul 5, 2018

  1. Add CutoffStream class to read subset of existing stream

     * Most providers have limits on how big a file can be uploaded in a
       single request.  Some providers support uploading larger files by
       "chunking" uploads: breaking a file into multiple pieces, uploading
       them individually, then reassembling them on the provider's side.
       Each provider sets its own limit on the max size of a single chunk,
       but usually they are multi-megabyte chunks.
    
       WB receives a single stream during an upload.  To chunk this
       without downloading and manually partitioning the file requires a
       stream reader class that can read up to `n` bytes, then stop
       without closing the original stream.  The WB stream classes inherit
       from `asyncio.StreamReader`, whose `readexactly(n)` and `read(n)`
       methods appear to support this use case.  They do, sort of: they
       attempt to read all `n` bytes into a chunk in memory before sending
       it off to the provider.  This means that 1) uploading a 10 MB
       chunk requires 10 MB of memory, and 2) all 10 MB must be fetched
       from the uploader before being sent.  Point 1 could quickly lead to
       memory exhaustion in WB if multiple uploads happen at the same
       time.  Point 2 can cause uploads to fail: if the uploader is slow to
       send data to WB and fill the chunk, the receiving provider may close
       the connection as inactive.
    
       The solution is to continuously send smaller subchunks of data to
       the provider, terminating after the overall chunk size is reached.
       This is actually how the `asyncio.StreamReader.read()` method is
       intended to function, but confusion between what `read()` calls a
       chunk size and what the provider calls a chunk size led to failures
       in cross-provider move/copies into Figshare, the only provider at
       the time of this commit that supports chunked uploads.
    
       The new CutoffStream class takes an existing stream object and the
       provider-given chunk size (superchunk) and continuously reads and
       feeds subchunks.  After each subchunk read, a bytes-thus-far
       counter is updated.  The CutoffStream stops reading when
       bytes-thus-far equals the superchunk size.  Only subchunk bytes of
       data are stored in memory at a time.
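The superchunk/subchunk idea can be sketched on top of `asyncio.StreamReader`. `CutoffStreamSketch` and `split_into_parts` are hypothetical names; this is a simplified standalone approximation, not WB's `CutoffStream`, which builds on WB's own stream classes.

```python
import asyncio


class CutoffStreamSketch:
    """Read at most `cutoff` bytes from `stream`, one small subchunk at a time.

    The wrapped stream is never closed, so the next part can continue from it.
    """

    def __init__(self, stream: asyncio.StreamReader, cutoff: int,
                 subchunk_size: int = 10 * 1024):
        self.stream = stream
        self.cutoff = cutoff              # provider-given chunk size (superchunk)
        self.subchunk_size = subchunk_size
        self.bytes_read = 0               # bytes-thus-far counter

    async def read(self, n: int = -1) -> bytes:
        """Return up to n bytes (one subchunk if n < 0), never passing the cutoff."""
        remaining = self.cutoff - self.bytes_read
        if remaining <= 0:
            return b''
        size = self.subchunk_size if n < 0 else n
        data = await self.stream.read(min(size, remaining))
        self.bytes_read += len(data)
        return data

    def __aiter__(self):
        return self

    async def __anext__(self) -> bytes:
        data = await self.read()
        if not data:
            raise StopAsyncIteration
        return data


async def split_into_parts(source: asyncio.StreamReader, part_size: int,
                           subchunk_size: int):
    # Subchunks are collected here only so the result can be inspected; a real
    # upload would send each subchunk to the provider as soon as it is read.
    parts = []
    while True:
        subchunks = [c async for c in CutoffStreamSketch(source, part_size, subchunk_size)]
        if not subchunks:
            break
        parts.append(subchunks)
    return parts


async def demo():
    source = asyncio.StreamReader()
    source.feed_data(b'a' * 25)
    source.feed_eof()
    return await split_into_parts(source, part_size=10, subchunk_size=4)
```

With 25 bytes, a 10-byte superchunk and 4-byte subchunks, `asyncio.run(demo())` yields three parts whose subchunk lengths are `[4, 4, 2]`, `[4, 4, 2]` and `[4, 1]`.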
    felliott committed Jul 5, 2018
    cc27c32
  2. update Figshare to use CutoffStream for multipart uploads

     * Fix cross-provider uploads to Figshare by using the new
       CutoffStream class.  The previous approach was buffering the entire
       chunk into memory before sending.  If the source provider was slow,
       Figshare would close the connection as inactive while waiting.
    
     * Update Figshare tests to no longer fake the stream md5sum.  This
       was incorrectly diagnosed as an issue with aiohttpretty, instead of
       the issue above.  Now that CutoffStream is being used, the Figshare
       tests can calculate an actual hash.
    felliott committed Jul 5, 2018
    195d47c
  3. gdrive: cast size to int when building ResponseStreamReader

     * When creating a download stream for Google Drive files, make sure
       to pass the size as an integer, to meet the expectations of others
       who consume the stream.
    
       Google Drive reports file size as a string instead of an integer.
       When a file is copied from GDrive to Figshare, WB initiates an
       upload by telling Figshare how big a file it should expect.
       Figshare will return a 400 if WB passes the size as a string instead
       of an integer.
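The difference the cast makes shows up in the serialized JSON. The body below is a hypothetical minimal payload for illustration, not the exact request WB sends to Figshare.

```python
import json


def figshare_upload_init_body(name: str, reported_size) -> str:
    """Build a hypothetical JSON body announcing an upload's size.

    Google Drive reports sizes as strings (e.g. "1048576"); casting here keeps
    the JSON value a number rather than a quoted string.
    """
    return json.dumps({'name': name, 'size': int(reported_size)})
```

Without the cast, `json.dumps({'size': '1048576'})` produces `{"size": "1048576"}`, a string that a strict API can reject.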
    felliott committed Jul 5, 2018
    faab3cb
  4. 58674d9
  5. Add chunked uploads for S3

    - A chunked upload consists primarily of three
      methods, which (1) create a session, (2) upload
      parts of the stream and (3) close the session.
    - Add/update tests
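The three-step flow can be sketched against an in-memory fake. `FakeMultipartBackend` and its method names are hypothetical, and the sketch slices an in-memory `bytes` object for brevity, whereas WB streams each part. Real S3 also requires every part except the last to be at least 5 MiB.

```python
import asyncio


class FakeMultipartBackend:
    """In-memory stand-in for an S3-style multipart API (illustration only)."""

    def __init__(self):
        self.sessions = {}  # upload_id -> (key, {part_number: bytes})
        self.objects = {}   # key -> assembled bytes

    async def create_upload_session(self, key):
        upload_id = 'upload-{}'.format(len(self.sessions) + 1)
        self.sessions[upload_id] = (key, {})
        return upload_id

    async def upload_part(self, upload_id, part_number, data):
        self.sessions[upload_id][1][part_number] = data
        return 'etag-{}'.format(part_number)

    async def complete_upload(self, upload_id, parts):
        key, stored = self.sessions.pop(upload_id)
        self.objects[key] = b''.join(stored[number] for number, _ in sorted(parts))


async def chunked_upload(backend, key, data, chunk_size):
    # (1) Create the session and keep only its upload id.
    upload_id = await backend.create_upload_session(key)
    # (2) Upload each part in order; S3 part numbers start at 1.
    parts = []
    for offset in range(0, len(data), chunk_size):
        number = len(parts) + 1
        etag = await backend.upload_part(upload_id, number, data[offset:offset + chunk_size])
        parts.append((number, etag))
    # (3) Close the session by listing every (part number, ETag) pair.
    await backend.complete_upload(upload_id, parts)
    return parts
```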
    Johnetordoff authored and felliott committed Jul 5, 2018
    f1d9b90
  6. 8a567a3
  7. e9994a7
  8. 59a1630
  9. 6f326c3
  10. 967da8b
  11. f3fcc30
  12. Fix chunked uploads for S3

    - Fix one S3 test by disabling server-side encryption
    - Use sync loop instead of async for parts uploading
    
    [skip ci] [ci skip]
    cslzchen authored and felliott committed Jul 5, 2018
    42c7dd1

Commits on Jul 10, 2018

  1. Several minor updates/reversions

    - Use a trailing comma for one-element tuples passed as `expects=` in `make_request()`
    - Use `functools.partial()` instead of lambda expressions to build requests
    - Reorder multi-part upload methods
    - Update docstrings for multi-part upload methods
    - Remove inconsistent typing, which will be added later
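Both conventions are illustrated below. `make_request` here is a hypothetical stand-in, not WB's method. The commit does not say why `partial()` was preferred; one commonly cited reason is that a lambda looks up its free variables only when called.

```python
import functools


def make_request(method, url, expects=None):
    """Hypothetical stand-in for a provider's make_request()."""
    return method, url, expects


# A single value in parentheses is not a tuple; the trailing comma is what
# makes a one-element tuple suitable for `expects=`.
not_a_tuple = (200)
one_tuple = (200,)

# partial() binds its arguments immediately ...
bound = [functools.partial(make_request, 'PUT', 'part-{}'.format(n), expects=(200,))
         for n in (1, 2)]

# ... while a lambda looks `n` up only when called, so every lambda built in
# this loop sees the final value of `n`.
late = [lambda: make_request('PUT', 'part-{}'.format(n)) for n in (1, 2)]
```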
    cslzchen authored and felliott committed Jul 10, 2018
    738bde3
  2. Pass upload id instead of full session to methods

    - Main change: session upload id is the only info that each
      method needs to make requests to S3. Pass the string to
      the methods instead of the full object/dictionary
    - Side effect: (1) improve returns (2) fix docstr (3) use
      "CONTIGUOUS_UPLOAD_SIZE_LIMIT"
    cslzchen authored and felliott committed Jul 10, 2018
    53514e3
  3. Rewrite multi-part upload abort action:

    - Add max retries for the while loop
    - Use list length == 0 as the break condition
    - Instead of raising exceptions, return True if successful,
      False otherwise. Add debug and error logs respectively.
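A retry loop of this shape can be sketched as follows. AWS's documentation for AbortMultipartUpload notes that parts still uploading during an abort may succeed, so aborting more than once and confirming with ListParts can be necessary. The function name and the synchronous callables are simplifications; WB's code is async.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def abort_with_retries(send_abort: Callable[[], None],
                       list_parts: Callable[[], Sequence[str]],
                       max_retries: int = 3) -> bool:
    """Abort a multipart upload, re-checking until no parts remain.

    Returns True on success and False once max_retries is exhausted, logging
    instead of raising so callers can report the abort status.
    """
    for attempt in range(1, max_retries + 1):
        send_abort()
        if len(list_parts()) == 0:
            logger.debug('multipart upload aborted after %d attempt(s)', attempt)
            return True
    logger.error('multipart upload still has parts after %d aborts', max_retries)
    return False
```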
    cslzchen authored and felliott committed Jul 10, 2018
    5025ea8
  4. Improve upload logic and fix encryption header

    - Abort the upload if (1) uploading parts or (2) completing the upload fails
    - Add abort status to the errors and logs for both user and devs
    - Fix encryption headers
    cslzchen authored and felliott committed Jul 10, 2018
    f98e63d
  5. Four minor fixes:

    - Make `CHUNK_SIZE` and `CONTIGUOUS_UPLOAD_SIZE_LIMIT` class
      properties set from settings. This allows unit tests
      to have their own settings (with small size limits)
    - Use a default empty list for remaining uploaded part list
      to avoid using try/except to check list length == 0
    - Remove server-side encryption from `_upload_parts` since
      (1) the action does not support the header (2) it is set
      in `_create_upload_session` where the multi-part upload is
      initiated
    - Fix style for building xml payload when completing upload
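One way to build the completion payload is with the standard library's `xml.etree.ElementTree`, which handles escaping that manual string formatting would not. The element names follow S3's CompleteMultipartUpload request body; the function itself is a hypothetical sketch, not WB's code.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def build_complete_payload(parts: Iterable[Tuple[int, str]]) -> bytes:
    """Build an S3 CompleteMultipartUpload body from (part number, ETag) pairs."""
    root = ET.Element('CompleteMultipartUpload')
    for number, etag in sorted(parts):  # S3 expects ascending part numbers
        part = ET.SubElement(root, 'Part')
        ET.SubElement(part, 'PartNumber').text = str(number)
        ET.SubElement(part, 'ETag').text = etag
    return ET.tostring(root)
```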
    cslzchen authored and felliott committed Jul 10, 2018
    293ba0b
  6. Fix the issue where successful ABORT deletes the session

    - If the ABORT request is successful, the multi-part upload session
      may have already been deleted when the LIST PARTS request is made.
    - Update the criteria for a successful abort: the LIST PARTS request
      either returns 404 or returns 200 with an empty parts list.
    cslzchen authored and felliott committed Jul 10, 2018
    da60939
  7. 9f78ec6
  8. make _create_upload_session return only session id

     * Return only the specific data needed instead of a structure.
    
     * Update tests to match.
    felliott committed Jul 10, 2018
    ab23271
  9. release response before error handling

     * Avoid triggering an unclosed response warning by closing before
       testing for the error case.  The headers are still readable after
       the response has been closed.
    felliott committed Jul 10, 2018
    fd426be
  10. use CutoffStream to segment stream into parts

     * Avoid reading CHUNK_SIZE bytes into memory by wrapping the upload
       stream with a CutoffStream.  CutoffStream allows continuous reading
       and sending of small subchunks (~10k) until CHUNK_SIZE bytes have
       been read in total.
    
     * Add a test for the case where the final superchunk is less than
       CHUNK_SIZE bytes.
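The sizes at which each part's CutoffStream stops follow from simple division. This hypothetical helper shows why the final superchunk can be shorter than `CHUNK_SIZE`, the case the new test covers.

```python
from typing import List


def part_sizes(total_size: int, chunk_size: int) -> List[int]:
    """Sizes of the consecutive parts a stream of total_size bytes is cut into."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    full_parts, remainder = divmod(total_size, chunk_size)
    # Every part is chunk_size bytes except possibly a shorter final one.
    return [chunk_size] * full_parts + ([remainder] if remainder else [])
```

For example, 25 bytes with a 10-byte chunk size gives parts of 10, 10 and 5 bytes.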
    felliott committed Jul 10, 2018
    200632b

Commits on Jul 11, 2018

  1. 9ec4aab

Commits on Jul 18, 2018

  1. Improve the command invoke test

    - Add an option `--provider=` to test a specific provider only
    - Add an option `--path=` to test a specific file or folder only
    - Add an option `--nocov=` to disable coverage
    cslzchen committed Jul 18, 2018
    25fac97
  2. Merge branch 'feature/improve-inv-test-cmd' into develop

    [SVCS-NO-TICKET] Improve The Command `invoke test`
    Closes: CenterForOpenScience#353
    cslzchen committed Jul 18, 2018
    3984900

Commits on Jul 25, 2018

  1. d1b8505
  2. b0d717b
  3. eaf1729
  4. remove never-implemented geolocation code

     * Analytics fields will be left in and hardcoded to `None` to avoid
       changing the schema.
    felliott committed Jul 25, 2018
    4a33463
  5. Merge branch 'hotfix/anonymize-keen-ip'

     [SVCS-871]
    felliott committed Jul 25, 2018
    9d9ef9b
  6. b329c30
  7. Merge branch 'hotfix/0.40.1'

    felliott committed Jul 25, 2018
    4ac4668
  8. 3627806

Commits on Jul 26, 2018

  1. 785a25f
  2. reformat complicated `any()` expression

    NyanHelsing authored and felliott committed Jul 26, 2018
    4033bd3
  3. f9cb9e2
  4. Use separate methods for cleaner and simpler code

     - For chunked uploads, add `upload_part()` to handle the upload of a
       single chunk; the chunked upload path now calls `upload_part()` for
       each chunk.
    
     - For normal upload, move the code into `contiguous_upload()`
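The resulting split between the two paths could look like this sketch. Only `contiguous_upload()` and `upload_part()` come from the commit; the class, `chunked_upload()` and the tiny limits are hypothetical. The limits are class attributes so that tests can substitute small values.

```python
import asyncio


class UploadDispatchSketch:
    """Hypothetical provider showing the contiguous/chunked split."""

    CONTIGUOUS_UPLOAD_SIZE_LIMIT = 10  # tiny illustrative values, in bytes
    CHUNK_SIZE = 4

    def __init__(self):
        self.calls = []

    async def upload(self, data: bytes):
        if len(data) <= self.CONTIGUOUS_UPLOAD_SIZE_LIMIT:
            await self.contiguous_upload(data)
        else:
            await self.chunked_upload(data)

    async def contiguous_upload(self, data: bytes):
        self.calls.append(('contiguous', len(data)))  # one request for the whole file

    async def chunked_upload(self, data: bytes):
        # Hypothetical name for the chunked path; it delegates each chunk.
        for offset in range(0, len(data), self.CHUNK_SIZE):
            await self.upload_part(data[offset:offset + self.CHUNK_SIZE], offset)

    async def upload_part(self, chunk: bytes, offset: int):
        self.calls.append(('part', offset, len(chunk)))
```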
    NyanHelsing authored and felliott committed Jul 26, 2018
    c2af470
  5. 9da695e
  6. Obtain chunked upload sizes from Dropbox provider settings

    - According to Dropbox API docs, files larger than 150 MB must
      use chunked upload. Chunks can be any size up to 150 MB. A
      typical size is 4 MB, which is what WB uses. The maximum file
      size that can be uploaded is 350 GB.
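The limits above translate into an upload plan as in this sketch. The function name is hypothetical, and the binary units (MiB/GiB) are an assumption; WB's actual setting values are not shown in this commit.

```python
from typing import Tuple

MiB = 1024 * 1024
GiB = 1024 * MiB

# Values from the commit message; binary units are an assumption.
CONTIGUOUS_UPLOAD_SIZE_LIMIT = 150 * MiB  # larger files must use chunked upload
CHUNK_SIZE = 4 * MiB                      # WB's chunk size
MAX_UPLOAD_SIZE = 350 * GiB               # largest file an upload session accepts


def plan_dropbox_upload(size: int) -> Tuple[str, int]:
    """Return ('contiguous', 1) or ('chunked', number_of_chunks) for a file size."""
    if size > MAX_UPLOAD_SIZE:
        raise ValueError('file is larger than Dropbox allows')
    if size <= CONTIGUOUS_UPLOAD_SIZE_LIMIT:
        return 'contiguous', 1
    return 'chunked', -(-size // CHUNK_SIZE)  # ceiling division
```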
    cslzchen authored and felliott committed Jul 26, 2018
    ddcc5a3
  7. 5d74f99
  8. e89a750
  9. update docstrings

    felliott committed Jul 26, 2018
    370ffb3
  10. permit error if no session_id is available

     * The dropbox provider should error if it can't get a session
       identifier.  A `KeyError` should suffice until we see what an
       actual error condition looks like.
    felliott committed Jul 26, 2018
    edd2208