Handling URIs with a path component ending with the same segment term #107

Closed
csarven opened this issue Nov 1, 2019 · 27 comments

Comments

@csarven
Member

csarven commented Nov 1, 2019

Can there be more than one URI with a path component ending with the same segment term? No.

Path segments separated by slash (/) entail hierarchical containment. Path segments:

  • ending with a slash correspond to an LDPC;
  • not ending with a slash correspond to an LDPR that's not an LDPC.

Examples:

/foo/ exists =>

  • /foo cannot exist.

/foo/bar/baz exists =>

  • /foo/, /foo/bar/ exist.
  • /foo, /foo/bar, /foo/bar/baz/, /foo/bar/baz/qux, ... cannot exist.
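The examples above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not code from any Solid server; the helper names `ancestor_containers`, `slash_twin` and `excluded_twins` are hypothetical, and the root path `/` is deliberately left out of scope.

```python
from typing import List


def ancestor_containers(path: str) -> List[str]:
    """Return the container URIs implied by a path, outermost first."""
    segments = [s for s in path.split("/") if s]
    # Every segment except the last one names an enclosing container.
    return ["/" + "/".join(segments[:i]) + "/" for i in range(1, len(segments))]


def slash_twin(path: str) -> str:
    """Return the URI that differs from `path` only in the trailing slash."""
    return path[:-1] if path.endswith("/") else path + "/"


def excluded_twins(path: str) -> List[str]:
    """URIs that cannot exist alongside `path` and its implied containers."""
    return [slash_twin(uri) for uri in ancestor_containers(path) + [path]]


print(ancestor_containers("/foo/bar/baz"))
print(excluded_twins("/foo/bar/baz"))
```

/foo/bar/baz/qux is not listed by `excluded_twins` directly, but it is ruled out indirectly: it would need the container /foo/bar/baz/, which is one of the excluded URIs.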

Normally a 404 status code is used when there is no representation for a resource. When a segment term is used as part of the last path segment of a resource, should there instead be a redirect (301) to it when the same segment term is used for a resource without a representation? For example:

  • If /foo/ exists, should request to /foo redirect to /foo/?
  • If /foo exists, should request to /foo/ redirect to /foo ?
@acoburn
Member

acoburn commented Nov 1, 2019

/foo/ exists =>

/foo cannot exist.

Is there a particular reason for this? Developer/implementation simplicity?

Is it therefore not possible to handle the alternative case, where, for example, a trailing slash is not present? That is,

/foo/bar exists and /foo is a container. (/foo/ would redirect to /foo)

@csarven
Member Author

csarven commented Nov 1, 2019

@acoburn Re "cannot exist" was in context of attempting to create /foo when /foo/ already exists, and vice-versa.

It is an open question as to what happens when one exists and an attempt is made to resolve (GET) the other eg:

If /foo/ exists, should request to /foo redirect to /foo/?
If /foo exists, should request to /foo/ redirect to /foo ?

AFAIK, some HTTP servers do the former but I haven't seen the latter. Should both redirect? Only one? Neither? Why?

@acoburn
Member

acoburn commented Nov 1, 2019

Two points, both of which support the proposal outlined by @csarven above:

  1. The "LDP best practices" document explicitly recommends having slashes at the end of container URIs
  2. While this would involve a little bit of code on the Trellis side to make work (currently all resources with a final slash are redirected to a resource without a final slash in the URL), it won't be much work at all (I think).

It is an open question as to what happens when one exists and an attempt is made to resolve (GET) the other eg:

If /foo/ exists, should request to /foo redirect to /foo/?
If /foo exists, should request to /foo/ redirect to /foo ?

In the Trellis implementation, the answer is (currently) "no" to the first and "yes" to the second. But as part of implementing support for this proposal, I would be aiming for "yes" as the answer for each case.

@TallTed
Contributor

TallTed commented Nov 1, 2019

If /foo/ exists, should request to /foo redirect to /foo/?
If /foo exists, should request to /foo/ redirect to /foo?

Given that 404 is to be returned when ACL does not permit access, I think the 404 should be returned before (i.e., instead of) the redirect in these cases, as doing otherwise exposes the existence information the 404 is meant to protect.

@csarven
Member Author

csarven commented Nov 1, 2019

Allow me to further clarify what I've intended with the independent scenarios:

If /foo/ exists, should request to /foo redirect to /foo/?

/foo/ was created as an LDPC, then /foo is requested.

If /foo exists, should request to /foo/ redirect to /foo?

/foo was created as an LDPR that's not an LDPC, then /foo/ is requested.

Re:

The "LDP best practices" document explicitly recommends having slashes at the end of container URIs

Right, that'd be fine for the first example /foo/. It doesn't concern the second example.

In the Trellis implementation, the answer is (currently) "no" to the first and "yes" to the second.

I find that interesting. How did that come about? I've seen 1. yes, 2. no (eg. Apache).

But as part of implementing support for this proposal, I would be aiming for "yes" as the answer for each case.

I like the consistency. 1. helps to add a missing slash - moving forward, whereas 2. removes it - moving backward.

@csarven
Member Author

csarven commented Nov 1, 2019

Given that 404 is to be returned when ACL does not permit access, I think the 404 should be returned before (i.e., instead of) the redirect in these cases, as doing otherwise exposes the existence information the 404 is meant to protect.

That's a good point.

(I arrived at 404 basically because there is no matching resource.)

@mikeadams1

Not sure if this matters but, in the data browser both /foo/ and /foo exist. /foo gives you a view of the file but when you try to access it, it throws a 500, /foo/ allows access to the files if they are public.

@kjetilk
Member

kjetilk commented Nov 27, 2019

I have nothing much to add, as I think the discussion is good, I just want to poke the participants: Are we able to formulate a rough consensus here?

@kjetilk
Member

kjetilk commented Dec 18, 2019

@timbl mentioned in the F2F that he expected a 301 on

If /foo/ exists, should request to /foo redirect to /foo/?

and indeed, there is a lot of precedent to support that. @TallTed's point is also a good catch, we should have that.

I have no strong opinions on the other case, but I note @acoburn 's statement that it is in Trellis.

Do we have a rough consensus that this is how we do it?

@csarven
Member Author

csarven commented Dec 19, 2019

Rough consensus capturing unique information:

  • Server MUST NOT have a URI with a path component ending with a segment term that is already in use by another URI.
  • Server MAY redirect a request with 301 from a non-existing URI to an existing URI with a matching path component ending with the same segment term. Behaviour pertaining to authorization MUST precede this optional redirect.

Note: we agree on 404 for privacy reasons ie. not redirecting if not authorized, but I suggest that should be first addressed generally by resolving #14 . It is still useful to mention expected behaviour in the redirect case with that in mind.
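A minimal sketch of that ordering in Python, assuming the server answers 404 whenever the requester is not authorized (which is what this thread assumes, pending #14). The function `resolve`, the `existing` set and the `can_read` callback are hypothetical names for illustration; they are not taken from NSS or Trellis.

```python
from typing import Callable, Optional, Set, Tuple


def slash_twin(path: str) -> str:
    """Return the URI that differs from `path` only in the trailing slash."""
    return path[:-1] if path.endswith("/") else path + "/"


def resolve(
    path: str,
    existing: Set[str],
    can_read: Callable[[str], bool],
) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a GET on `path`."""
    if path in existing:
        # Assumption: an unauthorized request sees 404, not 403.
        return (200, None) if can_read(path) else (404, None)
    twin = slash_twin(path)
    # Authorization is evaluated before the optional redirect, so a 301
    # never reveals a resource the requester is not allowed to see.
    if twin in existing and can_read(twin):
        return (301, twin)
    return (404, None)


existing = {"/foo/", "/bar", "/secret/"}
can_read = lambda uri: uri != "/secret/"
print(resolve("/foo", existing, can_read))     # redirect towards /foo/
print(resolve("/secret", existing, can_read))  # indistinguishable from a miss
```

With MAY rather than SHOULD, a server could equally skip the `twin` branch and return 404 in both directions.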

@kjetilk
Member

kjetilk commented Dec 19, 2019

Nitpicking/devils advocacy:

Server MUST NOT have a URI with a path component ending with a segment term that is already in use by another URI.

Wouldn't that disallow /foo/baz if /foo/bar/baz exists?

(I mean, I get your point, and agree with it, it is just that people could actually write code to conform, and be surprised... :-) )

Server MAY redirect a request with 301 from a non-existing URI to an existing URI with a matching path component ending with the same segment term.

SHOULD...? I think that expectation is pretty strong... :-)

@csarven
Member Author

csarven commented Dec 19, 2019

Yes, just meant to capture the consensus but not exactly what goes into the spec.

This criterion is actually similar to or a variation on the SHOULD NOT re-use URI with a different identity for the resource in that it prohibits creating a URI with the same authority and path component differing only in the trailing slash. How about:

Server MUST NOT create a URI differing only in the trailing slash with that of an existing URI.

If for example /foo/bar/baz exists, then /foo/, /foo/bar/ exist. Attempting to create /foo/bar or /foo/bar/baz/qux (which attempts to create /foo/, /foo/bar/, /foo/bar/baz/) will fail because of existing /foo/bar/ and /foo/bar/baz.
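To make the creation rule concrete, here is a rough in-memory sketch in Python. `ResourceStore` and `SlashConflict` are hypothetical names, and two behaviours are assumptions of the sketch rather than anything agreed in this thread: intermediate containers are created implicitly, and a failed request creates nothing.

```python
from typing import Set


class SlashConflict(Exception):
    """A URI would differ from an existing one only in the trailing slash."""


class ResourceStore:
    def __init__(self) -> None:
        self.uris: Set[str] = set()

    def create(self, uri: str) -> None:
        segments = [s for s in uri.split("/") if s]
        # Containers implied by the path, followed by the requested URI itself.
        needed = ["/" + "/".join(segments[:i]) + "/" for i in range(1, len(segments))]
        needed.append(uri)
        # Check everything first so that a rejected request leaves no trace.
        for item in needed:
            twin = item[:-1] if item.endswith("/") else item + "/"
            if twin in self.uris:
                raise SlashConflict(f"{item} conflicts with existing {twin}")
        self.uris.update(needed)


store = ResourceStore()
store.create("/foo/bar/baz")  # also creates /foo/ and /foo/bar/
store.create("/foo/baz")      # allowed: no URI differs only in the slash
for attempt in ("/foo/bar", "/foo/bar/baz/qux"):
    try:
        store.create(attempt)
    except SlashConflict as err:
        print("rejected:", err)
```

Under this formulation /foo/baz is accepted even though /foo/bar/baz exists, which addresses the concern raised about the earlier "same segment term" wording.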

Re redirect, no slash to slash is a common behaviour, but I'm not sure if slash to no slash is common. I figured MAY would keep it relaxed (and return 404), but I have no objection to using SHOULD for the redirect.

@kjetilk
Member

kjetilk commented Dec 19, 2019

👍 to the new formulation!

I think we can keep the SHOULD vs MAY open until the drafting phase. With that, I suppose we can move this to rough consensus?

@elf-pavlik
Member

Thinking about 'website' use cases, related to which I posted my recent comment in #69 (comment)

In that use case we may find common paths like /team and /team/alice. As I understand the rough consensus of this issue, such a website would need to use /team/ (with a trailing slash, which could get removed using the History API) and /team/alice. It again brings up the requirement of publishing HTML content denoted by the URI of an LDP container (already discussed in #69).

@csarven
Member Author

csarven commented Dec 20, 2019

The rough consensus of this issue is that if /team exists, then /team/ can't be created. If /team/ exists, then /team can't be created. That is all. *nix based filesystems have a similar behaviour. This issue has to do with the Identification component (AWWW). Data formats and History API are on the wrong layer to discuss this issue - they have no bearing.

@elf-pavlik
Member

The rough consensus of this issue is that if /team exists, then /team/ can't be created. If /team/ exists, then /team can't be created. That is all. *nix based filesystems have a similar behavior.

Yes, this makes sense. All the issues related to html websites and vanity urls can get addressed separately.

@TallTed
Contributor

TallTed commented Dec 20, 2019

If /team/ exists, then /team can't be created

... but team.html, team.ttl, team.any could be! And (with a little word order changing for clarity) --

/team/ can't be created if /team exists

... but /team/ can be created if team.html, team.ttl, team.any exist! The challenges only come in when the filename extension is dropped/ignored.


I think a lot of the current challenge is that we're trying to combine the filesystem functionality and the webserver functionality into one tool -- and also saying that all functionality must be available at all times.

Apache, nginx, etc., are configurable to redirect from (or deliver without actual redirection) / to /index.html or /index.php or /arbitrary.handler or whatever. (It's not just about index.*!) This is neither default, nor universal, behavior -- it is the choice of the server (or website, or directory) admin -- but the discussion I'm seeing seems to be trying to make the decision once and for all, and appears to be largely debating between people who would make different choices on their apache/nginx/whatever instance ... which would be OK if it's actually turned into a configuration option, but will remain a problem if we make it hard-and-fast as the way Solid works.

@csarven
Member Author

csarven commented Dec 20, 2019

Ted, #109

@csarven
Member Author

csarven commented Dec 20, 2019

#109 (comment) suggested that "there is no need to specify index.* as a special case". Are we now in agreement?

@TallTed
Contributor

TallTed commented Dec 20, 2019

I should have said in my last here (which I've just edited to include this) -- "to /index.html or /index.php or /arbitrary.handler or whatever" because it's not just about index.*.

I don't think we are in agreement.

I don't think you're fully understanding my comments, and part of my relative quiet the past few days has been in an effort to let my concerns gel into a better presentation (which has not yet happened).

I may not be fully understanding your comments, either, try as I might.

I think there are several issues (including at least #109, #105, #119, #69, and #107, and maybe more) which are fairly inextricably and nearly indecipherably crosslinked. At the moment, I am not sure whether they can be more clearly separated, or would be better unified.

@kjetilk
Member

kjetilk commented Dec 20, 2019

I think there are several issues (including at least #109, #105, #119, #69, and #107, and maybe more) which are fairly inextricably and nearly indecipherably crosslinked. At the moment, I am not sure whether they can be more clearly separated, or would be better unified.

This suggests to me that we should start drafting spec text, and work from there instead, it would make it more concrete, and may thus be a better discussion item. OK?

@TallTed
Contributor

TallTed commented Dec 20, 2019

This suggests to me that we should start drafting spec text, and work from there instead

Perhaps.

First question in that direction is whether the idea is to describe current behavior of NSS (or Inrupt's successor) or other existing implementation, or to describe desired/intended behavior of future implementation (including updated versions of those existing implementations).

Second question is whether the Solid spec needs to restate everything from first principles -- or is (or can be, or should be) based on (and thus inherit) existing specs, particularly concerning HTTP and/or LDP server behaviors as defined in those specs.

My understanding was that Solid was based on both HTTP and LDP, but it's clear that this understanding is not entirely shared by others, so some of us seem to have been mistaken. It's also clear that current interpretation of those specs varies, and those variances need to be harmonized somehow.

@kjetilk
Member

kjetilk commented Dec 20, 2019

Right, I see the point, @TallTed. The spec work is not intended to be a departure from Solid as currently working, and NSS in many ways encodes the understanding that was built around Solid in the years before the current crowd started working on it. So, even though it can't be the source of sustained development because it is internally far from good enough, on the surface, it very nearly defines Solid, the main exception being in authentication, not in resource access.

I started out believing that Solid was essentially an application on the top of LDP, but through this fall we've had some discussions with Tim, and I have understood that while LDP is a starting point, the vision is much closer to a webization of the UNIX file system, but it borrows much from LDP, in particular the containment mechanism, which is used to model directories. That said, I still think that it is an important design goal to make Solid trivially implementable on the top of existing LDP servers. There are some balances to strike, but I think we're in a pretty good position.

@csarven
Member Author

csarven commented Dec 26, 2019

It's not just about index.*!

No one claimed it was strictly about index.*. #109 literally exists because it acknowledges the general case for #69 . And in #107 (comment) I'm literally pointing you to the comment that concludes (or at the very least proposes) that "index.*" need not be treated as a special case. I fail to understand what you disagree with, other than the fact that you're telling me that you disagree while saying the exact same thing. I'll refrain from further judgement on that - perhaps we can better resolve this point in chat or spec call.

This issue was generally intended to focus on resource URIs and generally excluded representation URLs. That was because there was some agreement on using the resource URI (as opposed to the representation URL) when interacting with the primary resource. I'm happy to factor in representation URLs if we are indeed not excluding writing representation URLs. This is what you seem to be highlighting as the issue with the statement that we arrived at through consensus. Are we closer to being on the same page?

@kjetilk
Member

kjetilk commented Dec 27, 2019

Is the problem that if you have a /team/ container and a /team.ttl resource, then you would commonly dereference /team.ttl by using the URL /team, in which case we'd have a URI collision?

I never thought about allowing /team.ttl to be dereferenced as anything but that, but I see that in the general case with conneg, that's what you do. So, do we need text to ensure that we avoid URI collisions in this case?

@elf-pavlik
Member

How does ResourceMapper work in NSS? Would it allow having both /team/ and /team by mapping the latter in the underlying filesystem to something like team$.ttl or team$.jsonld?

@csarven
Member Author

csarven commented Jan 7, 2020

Is the problem that if you have a /team/ container and a /team.ttl resource, then you would commonly dereference /team.ttl by using the URL /team, in which case we'd have a URI collision?

If /team/ exists:

  • /team.ttl can be created as a regular resource (as opposed to a representation of /team, so right re "anything but that").
  • an attempt to create /team should be rejected (and so the implementation detail with /team.ttl as one of its representations would not come to exist.)

There is no URI collision as I see it with that case but there may be other scenarios?

@Mitzi-Laszlo Mitzi-Laszlo modified the milestones: December 19th, February 19th Jan 14, 2020
@csarven csarven modified the milestones: February 19th, ~First Public Working Draft Jan 24, 2020
@csarven csarven closed this as completed Aug 25, 2020
@csarven csarven moved this to Done in Specification Sep 25, 2022