Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-45978: Add webdav support to ResourcePath.to_fsspec #94

Merged
merged 6 commits into from
Oct 1, 2024

Conversation

airnandez
Copy link
Contributor

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 76.49123% with 67 lines in your changes missing coverage. Please review.

Project coverage is 86.18%. Comparing base (0f26220) to head (729aa57).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
python/lsst/resources/http.py 67.44% 51 Missing and 5 partials ⚠️
tests/test_http.py 90.26% 7 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
- Coverage   87.01%   86.18%   -0.84%     
==========================================
  Files          27       27              
  Lines        4460     4646     +186     
  Branches      905      956      +51     
==========================================
+ Hits         3881     4004     +123     
- Misses        431      487      +56     
- Partials      148      155       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

from urllib.parse import parse_qs

import aiohttp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will likely need an RFC to formally add it to rubinenv conda environment. I think it comes in through webdav4 at the moment so things will work. Also need to list it in requirements.txt and pyproject.toml (the latter for the http option) -- or else put protections around the import if this is only used with fsspec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(regardless of import protections we still formally need rubinenv to list it since we want to ensure that this code works in production).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think aiohttp is already in the environment, since I could find it. In fact, fsspec is built on top of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ source /cvmfs/sw.lsst.eu/almalinux-x86_64/lsst_distrib/w_2024_38/loadLSST.sh

$ command -v python
/cvmfs/sw.lsst.eu/almalinux-x86_64/lsst_distrib/w_2024_38/conda/envs/lsst-scipipe-9.0.0-exact/bin/python

$ python -c "import aiohttp"

$ echo $?
0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be precise, I am not using webdav4. I preferred not to since after 4 weeks I got no response to an issue I submitted to them, so I don't want to depend on that package (see skshetry/webdav4#196)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Fabio -- I don't think the RFC is needed. We only need aiohttp if we have fsspec, and having fsspec implies aiohttp. The interface Fabio is using requires him to pass an aiohttp client to fsspec, so if fsspec stops using aiohttp internally this is already going to break.

However, I think this import should be part of the try block for fsspec, or lazy-import it in the function below where it is used. We have some services that install this library via pip. fsspec isn't listed as a package dependency so we don't get it by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Done: import guarded by same try block used for fsspec

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm trying to say is that if we explicitly need aiohttp for this code to work the way we want (ie fsspec working on webdav in the production environment) then we need to make sure that rubin-env comes with aiohttp even if internally fsspec drops that dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tim -- this code is guaranteed to not work if fsspec drops that dependency in any case. The fsspec constructor signature we are calling is:

HTTPFileSystem(get_client: aiohttp.ClientSession)

So if fsspec stops using aiohttp internally, it's going to break no matter what we do with the conda environment.

Copy link
Contributor

@dhirving dhirving left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me other than that it breaks to_fsspec for non-webdav HTTP URLs. Should be an easy fix though.

Really good to know that dcache and xrootd support a way to generate pre-signed URLs. That means we can use them as a backend for the Butler server, which might be useful at some point.

ssl_context = ssl.create_default_context()
if self._config.ca_bundle is not None:
ssl_context.load_verify_locations(capath=self._config.ca_bundle)
return aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=ssl_context), **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the TCPConnector should be cached to allow connection pooling, similar to what is done with the requests Session objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be good to get the call to load_verify_locations out of the async function, since it does blocking I/O. Probably fast enough it's not a huge deal though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Done: TCPconnector cached and load_verify_locations called out of the async function.

# Retrieve a signed URL for a duration of 1 hour. We assume the URL
# is to be used only for downloading, to avoid the risk the caller
# modify its contents inadvertently.
url = self.generate_presigned_get_url(expiration_time_seconds=3_600)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break to_fsspec for non-webdav HTTP/HTTPS URLs, since fetching a macaroon will fail. Previously it was inheriting the base implementation from ResourcePath.to_fsspec, which just passes the URL unchanged to fsspec.url_to_fs.

Maybe add a conditional at the top delegating to the base class for non-webdav URLs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Done: conditional added. ResourcePath.to_fsspec called for plain HTTP paths.

except json.JSONDecodeError:
raise ValueError(f"could not deserialize response to POST request for URL {self}")
except Exception as e:
raise e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it just re-raises, the behavior would be the same here without the explicit except Exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Done

from urllib.parse import parse_qs

import aiohttp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Fabio -- I don't think the RFC is needed. We only need aiohttp if we have fsspec, and having fsspec implies aiohttp. The interface Fabio is using requires him to pass an aiohttp client to fsspec, so if fsspec stops using aiohttp internally this is already going to break.

However, I think this import should be part of the try block for fsspec, or lazy-import it in the function below where it is used. We have some services that install this library via pip. fsspec isn't listed as a package dependency so we don't get it by default.

case ActivityCaveat.UPLOAD:
activity_caveat = "UPLOAD,LIST"

# Request a macaroon for the requested activities and duration duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to include a link to some documentation explaining this macaroon stuff...

I eventually found https://www.dcache.org/manuals/workshop-2017-05-29-Umea/000-Final/anupam_macaroons_v02.pdf which helped me figure out what was going on but it took a minute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Done: pointer to official documentation added.

- cache the aiohttp.TCPConnector to pool network connections by all
  instances of class HttpResourcePath.
- add reference documentation to dCache macaroons.
- add specific test for HttpResourcePath.to_fsspec().
- add several type annotations and docstrings.
@airnandez airnandez merged commit 754fa0f into main Oct 1, 2024
14 of 17 checks passed
@airnandez airnandez deleted the tickets/DM-45978 branch October 1, 2024 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants