-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-46776: Propagate fragments with ResourcePath.join #98
Conversation
Previously fragments were either dropped completely (if a ResourcePath was given that had a fragment) or included in the file name (if a plain string was given). Now fragments are always propagated if the ResourcePath parsing finds a fragment. This means that if a filename has a "#" in it that "#" will now always be treated as a fragment and not part of the filename.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #98 +/- ##
==========================================
+ Coverage 86.82% 86.89% +0.07%
==========================================
Files 27 27
Lines 4630 4656 +26
Branches 563 564 +1
==========================================
+ Hits 4020 4046 +26
Misses 466 466
Partials 144 144 ☔ View full report in Codecov by Sentry. |
5744d58
to
7b0a0e0
Compare
The fragment can be encoded but some users need the unquoted form.
7b0a0e0
to
1e556d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me.
# We want to propagate fragments to the joined path and we rely on | ||
# the ResourcePath parser to find these fragments for us even in plain | ||
# strings. Must assume there are no `#` characters in filenames. | ||
if not isinstance(path, str) or path_uri.fragment: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little sketched out by this -- it seems surprising that if you give join()
a string filename it doesn't use that verbatim as the filename. It works within the specific context of how the datastore is set up, but breaks more general usage. I also wonder if there are some (probably slight) security implications here, where someone could smuggle in a ..#something
or similar
It seems to me that callers could use .replace(fragment=...)
if they want to add a fragment, or explicitly convert the string to a ResourcePath
before calling join()
if their intention is to treat it as a URI and not a path fragment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the difficulty here is that before this change:
>>> from lsst.resources import ResourcePath
>>> ResourcePath("abc#def.txt")._uri
ParseResult(scheme='file', netloc='', path='/Users/timj/work/lsstsw/build/resources/abc', params='', query='', fragment='def.txt')
>>> ResourcePath("").join("abc#def.txt")._uri
ParseResult(scheme='file', netloc='', path='/Users/timj/work/lsstsw/build/resources/abc%23def.txt', params='', query='', fragment='')
and with this change:
>>> ResourcePath("abc#def.txt")._uri
ParseResult(scheme='file', netloc='', path='/Users/timj/work/lsstsw/build/resources/abc', params='', query='', fragment='def.txt')
>>> ResourcePath("").join("abc#def.txt")._uri
ParseResult(scheme='file', netloc='', path='/Users/timj/work/lsstsw/build/resources/abc', params='', query='', fragment='def.txt')
You get a consistent answer. These relative file paths can come from users specifying subsets of pipelines or from relative path entries in datastore.
We are assuming in other places that a #
in the file part of a schemeless URI means fragment and that no-one using this is going to be putting a #
in the filename itself. The question is whether we break that assumption elsewhere or try to make that assumption consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, I think there is one other potential problem with "# in filename" at the moment.
>>> from lsst.resources import ResourcePath
>>> x = ResourcePath("a/bc#de.txt", forceAbsolute=False)
>>> y = ResourcePath("a", forceAbsolute=False)
>>> str(x)
'a/bc#de.txt'
>>> str(y.join("bc#de.txt"))
'a/bc#de.txt'
And for before and after this branch they stringify the same even though before one of those was a filename in the internal _uri representation and the other was a fragment. With this branch they are both fragments.
Previously fragments were either dropped completely (if a ResourcePath was given that had a fragment) or included in the file name (if a plain string was given).
Now fragments are always propagated if the ResourcePath parsing finds a fragment. This means that if a filename has a "#" in it that "#" will now always be treated as a fragment and not part of the filename.
Checklist
doc/changes