Implemented :imports and Trig response. See Solid auth issue 210
bblfish committed Jun 11, 2021
1 parent df84bee commit 7771d58
Showing 19 changed files with 996 additions and 312 deletions.
124 changes: 124 additions & 0 deletions notes/ContainerManagedResources.md
@@ -0,0 +1,124 @@

Every `ldp:Resource` has at least one managed resource: the acl for that resource. So an `ldp:Container` and each of its `ldp:contains` resources has an associated acl.

### Difference between Client and Server Managed Resources

The ACR associated with a container acts like an ldp:Resource, not like an ldp:Container, so the difference there is clearly big enough that the actors implementing them need to be distinguished.

What are the differences between an ACR and a normal RDF Resource, a.k.a. an `ldp:RDFSource`?
* ACRs are not ldp:Resources (so no `Link: <...ldp#Resource>; rel="type"` header).
  On the other hand, there may be a special `Link: <.acl>; rel="acl"` header linking back to itself.
* Life cycle:
  * ACRs are not created with a POST on a container. They are created when another resource gets created.
    (Could this be modelled as a POST on the resource thought of as a special type of SM LDPC? But beware confusion with POST as append.)
  * DELETEing the resource created with a POST will delete the associated SM ACR too.
  * There is no special archive for the acl; it goes into an archive with all the other content.
* ACLs need only allow a restricted set of RDF content, as they are so essential to the functioning of the system.
  Such restrictions could be imposed generically by ShEx or SHACL shapes, in which case this difference could be part of a generic feature set.
* An ACR should be able to return an NQuads representation containing all the `:imports` as per [issue 210](https://github.com/solid/authorization-panel/issues/210#issuecomment-838747077).
  This means that it needs some extra behavior in addition to a simple resource actor for GET requests.

Finally: should the ACR Actor play the role of the Guard? That would constitute a reasonably serious difference. On the other hand the Guard is perhaps just a Script that transforms a WannaDo into a Do.

### Is a reduction possible?

Two Questions:
1. Can one extract a core resource behavior?
2. Could one build all behavior on top of this using scripts?

* For POST:
  Could one see the creation of SMRs (Server Managed Resources) as the server (via Scripts) doing a POST on the created resource (seen as a directory, hence the `.`) to create those resources? (Only ACLs for the moment.)
* For DELETE:
  Only the server is allowed to use that method on an SMR.
  This still leaves the difference in how archiving is treated.
  But perhaps that is because we used a directory archiving strategy.
  We could also add an attribute "Deleted" to each resource, and that could be enough to make it count as deleted!
  Then there would be no difference here - the general principle being that if an intermediary resource is deleted, all resources downstream of the path become inaccessible.
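
The "Deleted" attribute idea can be sketched as a pure function. This is only an illustration under stated assumptions: paths are modelled as lists of segments, and the names `DeletedFlag`, `Path` and `accessible` are hypothetical, not part of the code base.

```scala
object DeletedFlag:
  // A path such as /people/henry/card, modelled as its segments (assumption).
  type Path = List[String]

  /** A resource is accessible only if neither it nor any
    * resource on the way to it has been marked as deleted. */
  def accessible(path: Path, deleted: Set[Path]): Boolean =
    // `inits` yields the path itself, then every shorter prefix, down to the root
    !path.inits.exists(deleted.contains)
```

With this reading, marking `/people/henry/` as deleted hides `/people/henry/card` without touching the file for `card` itself.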

Could we state the difference then simply in terms of who owns the resource, and so who controls them?
* Client managed resources are controlled by the agent that created them
* Server created resources are controlled by the server - hence clients cannot use POST or DELETE methods

That seems like it could be expressed in terms of (potentially implicit) Access Control Rules?
Perhaps it is that the server has an access control rule that allows it to create and delete any resource.
If the client can set access control rules on access control rules, what would stop it giving itself that ability too? Perhaps the server just takes it away immediately, or the ShEx shapes limit what the client can do.

Headers? Perhaps here is a minor difference that needs to be hardcoded.
(Such headers could also go into a .meta file)

ACLs are RdfSources and may be restricted by shapes.

This would not account for
* ?

### File Naming Convention

Because we want to determine what a URL refers to with at most one request to the file system, we need to build up our agents from file system attributes that don't require searching the file system or directory. This works nicely for LDPCs and LDPRs:
* LDP Containers: these are encoded as directories
* LDP Resources: these are encoded as symbolic links pointing to the principal representation (e.g. `card.ttl`) or the latest version (e.g. `card.0.ttl`)?

Can we extend this to server managed resources?
A reasonable idea is to have server managed resources such as acls be encoded as symbolic links with a `.` in them, the dot distinguishing them from LDPRs.
They have to be symbolic links for the same reason that normal LDPRs have: so that they can point to the default representation or the latest version.
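
As a rough sketch of the "one request to the file system" idea, the standard `java.nio.file` API can read the attributes of a name without following the link. The object `FsKind` and its `Kind` cases are hypothetical names chosen for this illustration; the actual actors may classify entries differently.

```scala
import java.nio.file.{Files, LinkOption, NoSuchFileException, Path}
import java.nio.file.attribute.BasicFileAttributes

object FsKind:
  enum Kind:
    case Container      // a directory, i.e. an LDP Container
    case LinkedResource // a symbolic link: an LDPR, or a server managed resource if the name has a dot
    case PlainFile      // a regular file, e.g. one stored version such as card.0.ttl
    case Missing        // nothing with that name exists

  /** Classify `path` with a single attribute lookup, without following symbolic links. */
  def kindOf(path: Path): Kind =
    try
      val attrs = Files.readAttributes(path, classOf[BasicFileAttributes], LinkOption.NOFOLLOW_LINKS)
      if attrs.isDirectory then Kind.Container
      else if attrs.isSymbolicLink then Kind.LinkedResource
      else Kind.PlainFile
    catch
      case _: NoSuchFileException => Kind.Missing
```

The `NOFOLLOW_LINKS` option matters here: without it the attributes returned would be those of the link target, and the distinction between a resource and one of its versions would be lost.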

An intuition is to think of `.` s as delimiting a certain form of container. Indeed syntactically a `.` separates namespaces the way the `/` does.
Nevertheless, on all file systems I know of, there is a semantic difference: the `/` maps to a directory which groups other files, making it easy to step through the names and attributes. It would be feasible to implement `.` as directories too, to help keep server managed resources together, though such dirs would need a special convention to distinguish them from dirs for Basic Containers: perhaps a symlink `card -> card.r/`. We don't implement this yet.

So symbolic links with a `.` identify server managed resources.
But we notice that there are two quite different server managed resources:
1. the different versions and representations of a resource
2. The ACL for a resource - and perhaps other metadata for a resource, including acls on acls?

How different these and other things are still needs to be worked out, but it is clear that there are quite strong differences, and quite a lot of them. So we have all the extensions for mime types (over 100), the languages, the version extensions, the metadata extensions...

So when a request comes in for a resource containing one or more dots, it is not clear from the existence of a dot what type of actor should take care of it.

* `.acl` as an acl for a Container, should be an acl actor.
* `card.acl` is also an acl actor, but only the `card` actor can really tell the difference between `card.ttl` and `card.acl`, ...

So the algorithm has to be that a container, receiving a message whose target name
1. does not contain a dot and
   1. maps to a directory, is routed to the child directory
   2. maps to a link, is routed to the LDPR actor for the link
2. starts with a dot, is routed to a special actor for that resource managed by the container
3. does contain a dot other than in the first position, then
   1. it will be routed to a file based resource, whose name is the first part up to the dot
   2. that resource may route it on to another actor specialised for that extension.

So when arriving at the last stage of routing - when `name` can only refer to the next container or a resource in this container - then we have something like the following

```scala
val parts = name.split('.')
if parts.isEmpty || parts(0) == "" then
  // the name is empty or starts with a dot
  if parts.size <= 1 then ??? // the target of the request is this container itself
  else ??? // we have a special resource for the container, e.g. ".acl"
else
  // send the message on to the parts(0) child resource;
  // on receiving the message, that actor will continue as if the name started
  // from `parts(1)`.
  // If parts(1) == "acl" then that would be equivalent to the directory
  // receiving the ".acl" request.
  ???

```

This is combined with looking at the attributes on the file system. If the name points to a directory and there is a dot, then this returns an error.
If it points to a symbolic link, then the message is passed on to the child resource actor.
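
The name-based part of this routing can also be written as a small total function. The following is a sketch only: `NameRouting`, `Target` and its cases are hypothetical names, and the file system check described above is left out. It uses `indexOf` rather than `split`, which sidesteps the edge cases of splitting names such as `.` or the empty string.

```scala
object NameRouting:
  enum Target:
    case ThisContainer                             // empty name: the container itself
    case ContainerManaged(ext: String)             // ".acl": a resource managed by the container
    case Child(name: String, rest: Option[String]) // "card" or "card.acl"

  /** Decide where a container should send a message for `name`,
    * looking only at the position of the first dot. */
  def route(name: String): Target =
    name.indexOf('.') match
      case -1 if name.isEmpty => Target.ThisContainer
      case -1                 => Target.Child(name, None)
      case 0                  => Target.ContainerManaged(name.drop(1))
      case i                  => Target.Child(name.take(i), Some(name.drop(i + 1)))
```

For `card.acl` this gives `Child("card", Some("acl"))`, so the `card` actor receives the remainder `acl` and can decide itself which specialised actor should handle it.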


### Scripts interleaving Cmds from Server and Client

Could one create server managed resources using scripts, built on Plain HTTP?
For example, would it be possible to enhance a client POST command into a script, so that once the resource is created, its ACL resource can then be created by the server?

Note: this would require us to have scripts that can thread commands from the server with commands from the client.
But currently the Commands don't contain information about who is executing it.
That information is in the `CmdMessage` object that wraps the command and so it covers the whole script.
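
One way to picture such interleaving is to tag each step of a script with the party executing it. The sketch below is purely illustrative: `Actor`, `Cmd`, `Step` and `onPost` are hypothetical names and do not correspond to the existing `CmdMessage` or `SolidCmd` types.

```scala
object InterleavedScript:
  enum Actor:
    case Client, Server

  enum Cmd:
    case Post(container: String, slug: String) // create a resource in a container
    case CreateAcl(resource: String)           // create the ACL of a resource

  /** One step of a script, together with who executes it. */
  final case class Step(by: Actor, cmd: Cmd)

  /** Expand a client POST into a script whose second step is run by the server. */
  def onPost(container: String, slug: String): List[Step] =
    List(
      Step(Actor.Client, Cmd.Post(container, slug)),
      Step(Actor.Server, Cmd.CreateAcl(container + slug))
    )
```

The point of the sketch is only that the executing party becomes part of each step, rather than a property of the whole script.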

Possible advantages:
* it could allow one to make new types of containers with their own properties just by developing such extra scripts showing what needs to happen on creation, deletion, ...
* could it help with content negotiation features?
* ... ?

Could this work by having two file-system name-spaces: one for user managed and one for server managed resources? We do seem to be pointing that way with anything involving a `.` being server managed.

This looks like something worth exploring once the basic commands have been tested out...
93 changes: 93 additions & 0 deletions notes/Versioning.md
@@ -0,0 +1,93 @@
# Variants and Versioning

These may be very different topics, which should be kept apart.

## Versioning Standards

* [RFC 7089: Memento](https://tools.ietf.org/html/rfc7089).
* See work on a [Memento Ontology](http://timetravel.mementoweb.org/guide/api/); see also
  * [2017 discussion](https://groups.google.com/g/memento-dev/c/xNHUXDxPxaQ/m/DHG26vzpBAAJ) and
  * [github issue 245](https://github.com/fcrepo/fcrepo-specification/issues/245#issuecomment-338482569).

How much of memento can we implement quickly to get going, in order to test some other aspects such as access control? [RFC 5829 Link Relation Types for Simple Version Navigation between Web Resources](https://tools.ietf.org/html/rfc5829) looks like a good starting point.
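
As a minimal sketch of what RFC 5829 style navigation could look like, the function below renders `Link` header values using the `latest-version` and `predecessor-version` relation types defined by that RFC. The `card.N.ttl` naming is the numbering scheme used in these notes; `VersionLinks` and `headersFor` are hypothetical names.

```scala
object VersionLinks:
  /** Render one HTTP Link header value. */
  def link(target: String, rel: String): String =
    "<" + target + ">; rel=\"" + rel + "\""

  /** Link header values for version `version` of the resource named `base`,
    * assuming versions are stored as base.0.ttl, base.1.ttl, ... */
  def headersFor(base: String, version: Int): List[String] =
    val latest = link(base, "latest-version")
    val previous =
      if version > 0 then List(link(s"$base.${version - 1}.ttl", "predecessor-version"))
      else Nil // the first version has no predecessor
    latest :: previous
```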

## First attempt

Currently the versioning is based on a simple numbering scheme: Each version of a resource gets a number, and a symbolic link points to the last version.

We start by having one root variant for a resource, which can give rise to other variants built from it automatically (e.g. a Turtle variant can give rise to RDF/XML, JSON-LD, etc.), but we keep the history of the principal variant only. For now, we assume that we keep the same mime types throughout the life of the resource.

As things get more complex, it feels like all variants of a resource should end up in a container like structure, as it would allow one to get all the resources together and reduce the search space for the variants.

### LDP Resources

1. Container receives a POST for a new resource
   * `<card> -> <card>` create loop symlink (or we would need to point `<card>` to something that does not exist, or that is only partially uploaded, ...)
   * create file `<card.0.ttl>` and pour in the content from the POST or PUT
   * when finished relink `<card> -> <card.0.ttl>`
2. Updating with PUT/PATCH follows a similar procedure
   * `<card> -> <card.0.ttl>` starting point
   * create `<card.1.ttl>` and save new data to that file
   * when finished relink `<card> -> <card.1.ttl>`
   * `<card.1.ttl>` can have a `Link: ` header pointing to previous version `<card.0.ttl>`.
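
The write-then-relink step can be sketched with `java.nio.file`. This is an illustration under assumptions, not the server's implementation: `Relink`, `saveVersion` and the temporary name `card.tmp` are hypothetical, content is taken as a string rather than a stream, and the initial loop symlink is skipped.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path, StandardCopyOption}

object Relink:
  /** Store `content` as version `version` of `name` in `dir`,
    * then point the symbolic link `name` at the new file. */
  def saveVersion(dir: Path, name: String, version: Int, content: String): Path =
    val fileName = s"$name.$version.ttl"
    Files.write(dir.resolve(fileName), content.getBytes(UTF_8))
    // Build the new link under a temporary name, then rename it over the old link,
    // so that readers see either the old version or the new one.
    val tmp = dir.resolve(s"$name.tmp")
    Files.deleteIfExists(tmp)
    Files.createSymbolicLink(tmp, Path.of(fileName)) // relative target, same directory
    // Note: whether ATOMIC_MOVE replaces an existing target is implementation specific;
    // on POSIX systems it maps to rename(2), which does replace it.
    Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE)
```

Older version files are left in place, so `card.0.ttl` remains available after `card` has been relinked to `card.1.ttl`.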

### Server Managed Resources

Our convention for Server managed resources is that they consist of the root for a resource plus a specific extension. So the ACL for a container `/container/` will be `/container/.acl`.
The ACL for a resource such as a personal profile document `card` will be `card.acl`

We think of ACLs as having a default content that links them to the parent ACL.
There is thus no need to have a representation on the file system for these defaults.
The resource has an "acl" link relation to a resource whose actor returns the default triple, but there need be no file info on the FS.
When an initial PUT is made, then the `card.acl` is created as a symbolic link pointing to `card.acl.0.ttl`
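
The naming convention, and the parent ACL that a default ACL would link to, can be sketched on plain path strings. The assumption made here, which the notes do not spell out, is that the parent ACL of a resource is the ACL of its enclosing container; paths are assumed to be absolute, and `AclNames` and its functions are hypothetical names.

```scala
object AclNames:
  /** ACL of a container ("/container/" gives "/container/.acl")
    * or of a resource ("/people/card" gives "/people/card.acl"). */
  def aclOf(path: String): String = path + ".acl"

  /** The container enclosing `path`, or None for the root "/". */
  def parent(path: String): Option[String] =
    if path == "/" then None
    else
      val trimmed = path.stripSuffix("/")
      Some(trimmed.substring(0, trimmed.lastIndexOf('/') + 1))

  /** The ACL that a default ACL for `path` would refer to (assumption: the container's ACL). */
  def parentAcl(path: String): Option[String] = parent(path).map(aclOf)
```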

So ACLs only accept PUT and GET as methods (PATCH later).
1. When `card.acl` receives a PUT, the resource `card.acl.0.ttl` is created after the graph content of the PUT is parsed and verified, and `card.acl` is linked to `card.acl.0.ttl`.
2. If another edit is made, the new version is placed in `card.acl.1.ttl` and the symbolic link is changed to `<card.acl> -> <card.acl.1.ttl>`.

### Variants

Later we may want to allow upload of, say, humanly translated variants of some content. At that point, as the handling of variants becomes more complicated, it looks like it may not be a bad idea to have a resource be a directory of its variants.

## Other methods to get at variants

Using file extension conventions as suggested allows for wide implementations on most file systems. The only requirement there is symbolic links.


* one can place metadata about files (what mime types they encode, where the variants are, ...) in file attributes (Java supports that)
* one can place metadata into a conventionally named RDF file with info on how the different versions are connected
* place variants in a directory to speed up search for variants
* specialised versioning file systems
* implement this over CVS or git, ...


Yet, it is not ideal.

## Coherence problem

When a resource is versioned, would one not want the links in the versioned version to point to the precise versions that they were pointing to, rather than to the latest versions?

Advantages
* This would help give a real coherent historical overview of the data, helping one to understand what was really pointed to at a given time, and so understand the state of a conversation
* ...

But,
* does it require relinking all documents before they get archived?
  In the current implementation, this would require finding, for all links in a given document, all the resource versions those links are pointing to, and rewriting that document to point to those versions!
* how would it work for links to resources on the open web?

### Versioning the whole Server

At least for all local data, this indicates that a system versioning the whole web server would be locally more correct. I am thinking of a setup where a change to `/people/henry/card` places the old version - and all other resources - in the container `/2021/06/01/10/` so that the above `card` resource can be found at `/2021/06/01/people/henry/card` with all its inter-relations there.
Note: that would work well if all resources were written exclusively with relative URLs when linking to local resources, as those files would form the same tree in the archive directory.
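
The remark about relative URLs can be checked with a small sketch. It assumes a day-level archive prefix of the form `/yyyy/MM/dd`, as in the `/2021/06/01/people/henry/card` example; `ArchivePaths`, `archived` and `resolve` are hypothetical names.

```scala
import java.net.URI
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object ArchivePaths:
  private val dayFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")

  /** Where the snapshot of the absolute path `path` taken on `day` would live. */
  def archived(path: String, day: LocalDate): String =
    "/" + day.format(dayFormat) + path

  /** Resolve a (possibly relative) link found in the document at `base`. */
  def resolve(base: String, link: String): String =
    URI.create(base).resolve(link).toString
```

Under these assumptions, following the relative link `../anna/card` from the archived copy of `/people/henry/card` leads to the archived copy of `/people/anna/card`, which is the coherence property being asked for. An absolute link such as `/people/anna/card` would instead escape to the latest version.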


This may work quite well actually on a versioned file system. So it would
be worth finding out which filesystems are best suited for this. On such a filesystem making a change to a resource would involve:
1. making a snapshot of the FS and naming it,
2. creating a read-only archive of snapshots.

### Access Control on Archives

Those archive dirs should be access controlled, so as not to "pollute" the open web with too many versioned links, in a different way perhaps to the latest version.
This may indicate that archived versions have both the archived ACRs, but also effective ones that can override the archived ones.
2 changes: 1 addition & 1 deletion project/Dependencies.scala
@@ -82,7 +82,7 @@ object Dependencies {
// "eu.timepit" %% "refined-cats" % refinedVersion // optional
// ).map(_.exclude("org.scala-lang.modules","scala-xml_2.13"))

- def allCompatibleLibs = (Seq(scalaz, alpakka) ++ akka ++ akkaTest ++ banana)
+ def allCompatibleLibs = (Seq(alpakka) ++ akka ++ akkaTest ++ banana)
.map(o => o cross CrossVersion.for3Use2_13)
}

12 changes: 12 additions & 0 deletions src/main/scala/run/cosy/RDF.scala
@@ -20,6 +20,18 @@ object RDF {

extension (rdfUri: Rdf#URI)
def toAkka: Uri = Uri(ops.getString(rdfUri))

import org.w3.banana.binder.ToNode
import org.w3.banana.binder.ToNode.{given,_}

// todo: add this to banana
//see https://github.com/lampepfl/dotty/discussions/12527
implicit def URIToNode: ToNode[Rdf,Rdf#URI] = new ToNode[Rdf, Rdf#URI] {
def toNode(t: Rdf#URI): Rdf#Node = t
}
implicit def BNodeToNode: ToNode[Rdf,Rdf#BNode] = new ToNode[Rdf, Rdf#BNode] {
def toNode(t: Rdf#BNode): Rdf#Node = t
}

}

15 changes: 7 additions & 8 deletions src/main/scala/run/cosy/Solid.scala
@@ -20,6 +20,7 @@ import akka.util.Timeout
import com.typesafe.config.{Config, ConfigFactory}
import org.w3.banana.PointedGraph
import run.cosy.http.{IResponse, RDFMediaTypes, RdfParser}
+ import run.cosy.http.util._
import run.cosy.http.auth.{Agent, Anonymous, SignatureVerifier, KeyIdAgent, WebServerAgent}
import run.cosy.ldp.ResourceRegistry
import run.cosy.ldp.{Messages => LDP}
@@ -48,8 +49,7 @@ object Solid {
import run.cosy.ldp.fs.BasicContainer
given system: ActorSystem[Nothing] = ctx.system
given reg : ResourceRegistry = ResourceRegistry(ctx.system)
- val withoutSlash = uri.withPath(uri.path.reverse.dropChars(1).reverse)
- val rootRef: ActorRef[LDP.Cmd] = ctx.spawn(BasicContainer(withoutSlash, fpath), "solid")
+ val rootRef: ActorRef[LDP.Cmd] = ctx.spawn(BasicContainer(uri.withoutSlash, fpath), "solid")
val registry = ResourceRegistry(system)
val solid = new Solid(uri, fpath, registry, rootRef)
given timeout: Scheduler = system.scheduler
@@ -199,15 +199,14 @@ class Solid(
reqc.log.info("routing req " + reqc.request.uri)
val (remaining, actor): (List[String], ActorRef[LDP.Cmd]) = registry.getActorRef(path)
.getOrElse((List[String](), rootRef))
+ reqc.log.info(s"($remaining, $actor) = registry.getActorRef($path)")

- def cmdFn(replyTo: ActorRef[HttpResponse]): LDP.Cmd = remaining match
-   case Nil => LDP.WannaDo(LDP.CmdMessage(SolidCmd.plain2(reqc.request), agent, replyTo))
-   case head :: tail => LDP.RouteMsg(
-     NonEmptyList.fromSeq(head,tail.toSeq),
+ def routeWith(replyTo: ActorRef[HttpResponse]): LDP.Cmd = LDP.RouteMsg(
+     NonEmptyList.fromSeq("/",remaining.toSeq),
      LDP.CmdMessage(SolidCmd.plain2(reqc.request), agent, replyTo)
-   )
+   ).nextRoute

- actor.ask[HttpResponse](cmdFn).map(RouteResult.Complete(_))
+ actor.ask[HttpResponse](routeWith).map(RouteResult.Complete(_))
}


4 changes: 4 additions & 0 deletions src/main/scala/run/cosy/http/RDFMediaTypes.scala
@@ -121,5 +121,9 @@ object RDFMediaTypes {
`application/sparql-results+json`, `application/sparql-results+xml`, `application/trix`
)

object RDFData {
def unapply(mr: MediaRange): Option[MediaType] = rdfData.find(mt => mr.matches(mt))
}


}