Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-46776: Propagate fragments with ResourcePath.join #98

Merged
merged 4 commits into from
Oct 30, 2024
Merged

Conversation

timj
Copy link
Member

@timj timj commented Oct 18, 2024

Previously fragments were either dropped completely (if a ResourcePath was given that had a fragment) or included in the file name (if a plain string was given).

Now fragments are always propagated if the ResourcePath parsing finds a fragment. This means that if a filename has a "#" in it that "#" will now always be treated as a fragment and not part of the filename.

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Previously fragments were either dropped completely (if a
ResourcePath was given that had a fragment) or included in the
file name (if a plain string was given).

Now fragments are always propagated if the ResourcePath parsing
finds a fragment. This means that if a filename has a "#" in it
that "#" will now always be treated as a fragment and not part
of the filename.
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 96.77419% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.89%. Comparing base (8ddb459) to head (1e556d8).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
python/lsst/resources/http.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #98      +/-   ##
==========================================
+ Coverage   86.82%   86.89%   +0.07%     
==========================================
  Files          27       27              
  Lines        4630     4656      +26     
  Branches      563      564       +1     
==========================================
+ Hits         4020     4046      +26     
  Misses        466      466              
  Partials      144      144              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@timj timj force-pushed the tickets/DM-46776 branch from 7b0a0e0 to 1e556d8 Compare October 22, 2024 20:27
Copy link
Contributor

@dhirving dhirving left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

# We want to propagate fragments to the joined path and we rely on
# the ResourcePath parser to find these fragments for us even in plain
# strings. Must assume there are no `#` characters in filenames.
if not isinstance(path, str) or path_uri.fragment:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little sketched out by this -- it seems surprising that if you give join() a string filename it doesn't use that verbatim as the filename. It works within the specific context of how the datastore is set up, but breaks more general usage. I also wonder if there are some (probably slight) security implications here, where someone could smuggle in a ..#something or similar

It seems to me that callers could use .replace(fragment=...) if they want to add a fragment, or explicitly convert the string to a ResourcePath before calling join() if their intention is to treat it as a URI and not a path fragment.

Copy link
Member Author

@timj timj Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the difficulty here is that before this change:

>>> from lsst.resources import ResourcePath
>>> ResourcePath("abc#def.txt")._uri
ParseResult(scheme='file', netloc='', path='/Users/timj/work/lsstsw/build/resources/abc', params='', query='', fragment='def.txt')
>>> ResourcePath("").join("abc#def.txt")._uri
ParseResult(scheme='file', netloc='', path='/Users/timj/work/lsstsw/build/resources/abc%23def.txt', params='', query='', fragment='')

and with this change:

>>> ResourcePath("abc#def.txt")._uri
ParseResult(scheme='file', netloc='', path='/Users/timj/work/lsstsw/build/resources/abc', params='', query='', fragment='def.txt')
>>> ResourcePath("").join("abc#def.txt")._uri
ParseResult(scheme='file', netloc='', path='/Users/timj/work/lsstsw/build/resources/abc', params='', query='', fragment='def.txt')

You get a consistent answer. These relative file paths can come from users specifying subsets of pipelines or from relative path entries in datastore.

We are assuming in other places that a # in the file part of a schemeless URI means fragment and that no-one using this is going to be putting a # in the filename itself. The question is whether we break that assumption elsewhere or try to make that assumption consistent.

Copy link
Member Author

@timj timj Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, I think there is one other potential problem with "# in filename" at the moment.

>>> from lsst.resources import ResourcePath
>>> x = ResourcePath("a/bc#de.txt", forceAbsolute=False)
>>> y = ResourcePath("a", forceAbsolute=False)
>>> str(x)
'a/bc#de.txt'
>>> str(y.join("bc#de.txt"))
'a/bc#de.txt'

And for before and after this branch they stringify the same even though before one of those was a filename in the internal _uri representation and the other was a fragment. With this branch they are both fragments.

@timj timj merged commit d73b774 into main Oct 30, 2024
17 checks passed
@timj timj deleted the tickets/DM-46776 branch October 30, 2024 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants