-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-45978: Add webdav support to ResourcePath.to_fsspec #94
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #94 +/- ##
==========================================
- Coverage 87.01% 86.18% -0.84%
==========================================
Files 27 27
Lines 4460 4646 +186
Branches 905 956 +51
==========================================
+ Hits 3881 4004 +123
- Misses 431 487 +56
- Partials 148 155 +7 ☔ View full report in Codecov by Sentry. |
python/lsst/resources/http.py
Outdated
from urllib.parse import parse_qs | ||
|
||
import aiohttp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will likely need an RFC to formally add it to rubinenv conda environment. I think it comes in through webdav4 at the moment so things will work. Also need to list it in requirements.txt and pyproject.toml (the latter for the http option) -- or else put protections around the import if this is only used with fsspec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(regardless of import protections we still formally need rubinenv to list it since we want to ensure that this code works in production).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think aiohttp
is already in the environment, since I could find it. In fact, fsspec
is built on top of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ source /cvmfs/sw.lsst.eu/almalinux-x86_64/lsst_distrib/w_2024_38/loadLSST.sh
$ command -v python
/cvmfs/sw.lsst.eu/almalinux-x86_64/lsst_distrib/w_2024_38/conda/envs/lsst-scipipe-9.0.0-exact/bin/python
$ python -c "import aiohttp"
$ echo $?
0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be precise, I am not using webdav4
. I preferred not to since after 4 weeks I got no response to an issue I submitted to them, so I don't want to depend on that package (see skshetry/webdav4#196)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Fabio -- I don't think the RFC is needed. We only need aiohttp
if we have fsspec
, and having fsspec
implies aiohttp
. The interface Fabio is using requires him to pass an aiohttp
client to fsspec
, so if fsspec
stops using aiohttp
internally this is already going to break.
However, I think this import should be part of the try
block for fsspec
, or lazy-import it in the function below where it is used. We have some services that install this library via pip
. fsspec
isn't listed as a package dependency so we don't get it by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Done: import guarded by same try
block used for fsspec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm trying to say is that if we explicitly need aiohttp for this code to work the way we want (ie fsspec working on webdav in the production environment) then we need to make sure that rubin-env comes with aiohttp even if internally fsspec drops that dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tim -- this code is guaranteed to not work if fsspec drops that dependency in any case. The fsspec constructor signature we are calling is:
HTTPFileSystem(get_client: aiohttp.ClientSession)
So if fsspec stops using aiohttp internally, it's going to break no matter what we do with the conda environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me other than that it breaks to_fsspec
for non-webdav HTTP URLs. Should be an easy fix though.
Really good to know that dcache
and xrootd
support a way to generate pre-signed URLs. That means we can use them as a backend for the Butler server, which might be useful at some point.
python/lsst/resources/http.py
Outdated
ssl_context = ssl.create_default_context() | ||
if self._config.ca_bundle is not None: | ||
ssl_context.load_verify_locations(capath=self._config.ca_bundle) | ||
return aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=ssl_context), **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the TCPConnector
should be cached to allow connection pooling, similar to what is done with the requests
Session
objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be good to get the call to load_verify_locations
out of the async function, since it does blocking I/O. Probably fast enough it's not a huge deal though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Done: TCPconnector
cached and load_verify_locations
called out of the async function.
# Retrieve a signed URL for a duration of 1 hour. We assume the URL | ||
# is to be used only for downloading, to avoid the risk the caller | ||
# modify its contents inadvertently. | ||
url = self.generate_presigned_get_url(expiration_time_seconds=3_600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break to_fsspec
for non-webdav HTTP/HTTPS URLs, since fetching a macaroon will fail. Previously it was inheriting the base implementation from ResourcePath.to_fsspec
, which just passes the URL unchanged to fsspec.url_to_fs
.
Maybe add a conditional at the top delegating to the base class for non-webdav URLs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Done: conditional added. ResourcePath.to_fsspec
called for plain HTTP paths.
python/lsst/resources/http.py
Outdated
except json.JSONDecodeError: | ||
raise ValueError(f"could not deserialize response to POST request for URL {self}") | ||
except Exception as e: | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it just re-raises, the behavior would be the same here without the explicit except Exception
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Done
python/lsst/resources/http.py
Outdated
from urllib.parse import parse_qs | ||
|
||
import aiohttp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Fabio -- I don't think the RFC is needed. We only need aiohttp
if we have fsspec
, and having fsspec
implies aiohttp
. The interface Fabio is using requires him to pass an aiohttp
client to fsspec
, so if fsspec
stops using aiohttp
internally this is already going to break.
However, I think this import should be part of the try
block for fsspec
, or lazy-import it in the function below where it is used. We have some services that install this library via pip
. fsspec
isn't listed as a package dependency so we don't get it by default.
python/lsst/resources/http.py
Outdated
case ActivityCaveat.UPLOAD: | ||
activity_caveat = "UPLOAD,LIST" | ||
|
||
# Request a macaroon for the requested activities and duration duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be helpful to include a link to some documentation explaining this macaroon stuff...
I eventually found https://www.dcache.org/manuals/workshop-2017-05-29-Umea/000-Final/anupam_macaroons_v02.pdf which helped me figure out what was going on but it took a minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Done: pointer to official documentation added.
…in class HttpResourcePathConfig
…o implement to_fsspec
…g a single OPTIONS request
…ow to handle redirection on upload
- cache the aiohttp.TCPConnector to pool network connections by all instances of class HttpResourcePath. - add reference documentation to dCache macaroons. - add specific test for HttpResourcePath.to_fsspec(). - add several type annotations and docstrings.
3aee8d8
to
729aa57
Compare
Checklist
doc/changes