Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache HTTP requests (GSI-1235) #107

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Conversation

TheByronHimes
Copy link
Member

@TheByronHimes TheByronHimes commented Dec 20, 2024

Don't release this until both the DCS and the WPS sends back caching headers (e.g. "cache-control: max-age=30")

What's New

Download Path now has HTTP caching (and garbage collection, courtesy of @mephenor). The result is that the Connector won't send loads of requests to the WPS or DCS, but rather cache the responses in memory.

  • async_client now uses hishel's in-memory cache transport, which wraps any regular httpx.AsyncBaseTransport
    • The cache transport is created in a new function, creatively named get_cache_transport. This facilitates testing.
  • The most recent PR updated the Downloader to only get the S3 download url and WOT once, but this PR reinstates the per-file-part calls to WPS and DCS with the caveat that nearly all calls will leverage the cached responses until just before they expire. At that point, a new call will be made for a fresh WOT and URL. The overlap time is governed by the constant CACHE_MIN_FRESH, set to 3 seconds.

Test Changes

Mock API:

  • The MockRouter from ghga-service-commons is no longer used, and pytest-httpx is used much less. The new mock api uses a FastAPI app that is linked to the Connector's async_client via a patched transport.
  • Mock API endpoint references to requests and responses now use FastAPI versions of those classes instead of httpx.
  • Exception handler is gone, I don't think it's needed (not 100% sure)
  • The DCS drs object endpoint used to return the s3 download url stored in the env vars, which was basically hardcoded in the test function. Now, the endpoint calls a patchable function that provides the S3 download url. This way, we can mimic how the DCS behaves in reality (kind of).

CLI Integration Tests:

  • Refactored a lot of the common test setup into fixtures which remain in the module for specificity.
  • Split up some of the tests that were overly complex, e.g. test_download.
  • set_presigned_url_update_endpoint is used to make sure S3 download urls are dynamically created

Other:

  • Upload path hasn't been looked at except to remove a module that was seemingly duplicated.
  • Security of caching WOTs? Alternatively could manually calculate token freshness client-side.
  • Tests could be more comprehensive, granular, and refactored in another PR.

@coveralls
Copy link

coveralls commented Dec 20, 2024

Pull Request Test Coverage Report for Build 12654407872

Details

  • 30 of 33 (90.91%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+8.9%) to 80.339%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ghga_connector/core/downloading/downloader.py 11 14 78.57%
Totals Coverage Status
Change from base Build 12347905859: 8.9%
Covered Lines: 1042
Relevant Lines: 1297

💛 - Coveralls

@TheByronHimes TheByronHimes marked this pull request as ready for review January 7, 2025 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants