Rethinking datasets and graphs?

Looking at datasets with a fresh eye...

TL;DR: I found the current usage of @graph and the handling of RDF datasets fairly confusing. My proposal is to start from scratch, i.e., deprecate @graph and replace its functionality with something cleaner.

Note that this is not a fully thought-through proposal; it is more a matter of jotting down ideas for further discussion.

Problems I found

`"@graph"` as a top-level term

Looking at the Named Graph section of the syntax document, if @graph appears as a top-level term, then:

if there are no other statements at the top level, except possibly a @context, the result of an RDF conversion is a bush (i.e., an RDF graph that decomposes into a set of tree-like structures, each a hierarchy of RDF triples starting from a specific subject; thanks to Gregg for that term :-);
otherwise, the result is a dataset with a named graph whose name is a blank node (or a real URL, if an @id is provided), and the other top-level statements are made about that graph name.

Taking, essentially, the example from the spec:

{
  "@context": {...},
  "@graph": [
    {
      "@id": "http://manu.sporny.org/about#manu",
      ...
    }, {
      "@id": "http://greggkellogg.net/foaf#me",
      ...
    }
  ]
}

This simply generates (in Turtle):

<http://manu.sporny.org/about#manu>
   ...
.
<http://greggkellogg.net/foaf#me>
   ...
.

whereas

{
  "@context": {...},
  "generatedAt": "2012-04-09",
  "@id": "http://www.example.org/",
  "@graph": [
    {
      "@id": "http://manu.sporny.org/about#manu",
      ...
    }, {
      "@id": "http://greggkellogg.net/foaf#me",
      ...
    }
  ]
}

yields (in TriG):

<http://www.example.org/> generatedAt "2012-04-09" .
<http://www.example.org/> {
    <http://manu.sporny.org/about#manu>
       ...
    .
    <http://greggkellogg.net/foaf#me>
       ...
    .
}

In other words, the behavior of @graph depends on the presence (or absence) of other top-level terms or keywords. In one case, it is "just" the encoding of a good old RDF Graph; in the other case, it creates an RDF Dataset, including an explicit default graph.
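The two cases can be made concrete with a small Python sketch of the decision rule. It is a toy model, not the JSON-LD-to-RDF algorithm: it assumes property keys are already full IRIs (no context processing) and values are plain strings (treated as literals) or {"@id": ...} references. The names top_level_quads, node_quads and fresh_bnode, and the generatedAt IRI, are hypothetical. The branch mirrors what I understand to be the expansion rule: an object left with only @graph once @context is set aside is replaced by the content of @graph.

```python
import itertools
from typing import Any, Dict, List, Optional, Tuple

# (subject, predicate, object, graph name); graph None means the default graph.
Quad = Tuple[str, str, str, Optional[str]]

_counter = itertools.count()


def fresh_bnode() -> str:
    """Return a new blank node label such as _:b0 (hypothetical labelling scheme)."""
    return f"_:b{next(_counter)}"


def node_quads(node: Dict[str, Any], graph: Optional[str]) -> List[Quad]:
    """Emit one quad per non-keyword property value of a simplified node object."""
    subject = node.get("@id") or fresh_bnode()
    quads: List[Quad] = []
    for prop, value in node.items():
        if prop.startswith("@"):
            continue  # keywords such as @id and @graph produce no triples here
        for v in value if isinstance(value, list) else [value]:
            obj = v["@id"] if isinstance(v, dict) else f'"{v}"'
            quads.append((subject, prop, obj, graph))
    return quads


def top_level_quads(doc: Dict[str, Any]) -> List[Quad]:
    body = {k: v for k, v in doc.items() if k != "@context"}
    if set(body) == {"@graph"}:
        # Only @graph at the top level: its nodes form a plain bush in the default graph.
        return [q for n in body["@graph"] for q in node_quads(n, None)]
    # Otherwise the top-level object is itself a node: @graph becomes a named graph
    # called after its @id (or a fresh blank node), and the remaining top-level
    # properties become default-graph statements about that graph name.
    name = body.get("@id") or fresh_bnode()
    quads = node_quads({**body, "@id": name}, None)
    for n in body.get("@graph", []):
        quads.extend(node_quads(n, name))
    return quads


manu = {"@id": "http://manu.sporny.org/about#manu",
        "http://xmlns.com/foaf/0.1/name": "Manu Sporny"}

# Bush: only @context and @graph at the top level.
print(top_level_quads({"@context": {}, "@graph": [manu]}))
# [('http://manu.sporny.org/about#manu', 'http://xmlns.com/foaf/0.1/name', '"Manu Sporny"', None)]

# Dataset: one extra top-level property turns @graph into a named graph.
print(top_level_quads({"@id": "http://www.example.org/",
                       "http://example.org/generatedAt": "2012-04-09",
                       "@graph": [manu]}))
```

The only thing that differs between the two calls is the extra top-level entry, yet the second one moves every statement from the @graph array out of the default graph.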

@gkellogg responds: This was considered in the 1.0 timeframe, and the thought at the time was that it was better to "overload" this usage than to introduce a new keyword. There's actually a fair amount of JSON-LD in the field which uses the top-level @graph pattern to create a "bush" of objects.

I find this confusing, and borderline wrong. Conceptually, an RDF Graph and an RDF Dataset are not the same. Treating the default graph of a Dataset as a graph at large, i.e., considering it all right to represent any RDF graph as the default graph of an imaginary Dataset (essentially: everything is a Dataset...), is conceptually misleading IMHO. Recall, for example, that the RDF WG could never agree on a single semantics for RDF Datasets; see the "RDF 1.1: On Semantics of RDF Datasets" Note. We should not ignore this problem in a format that is, at the end of the day, a serialization of RDF.

@gkellogg responds: @graph introduces a graph; a series of them creates a dataset. The map structure you introduce below does complicate this, though. Otherwise, I don't quite follow the reasoning that leads you to conclude that this is misleading.

Bottom line: in my view, the current usage of @graph conflates two very different concepts, and this also makes JSON-LD difficult to conceptualize.

`"@graph"` containers

I found the (new) graph container feature confusing, too, partially for the same reasons. Using "@container": "@graph" fundamentally changes the behavior of a term, insofar as it generates an RDF Dataset as opposed to plain RDF statements. This makes it fundamentally different from all the other container options; again, notions are conflated.

Example 62 in the document says:

{
  "@context": {
    "claim": {
      "@id": "https://w3id.org/credentials#claim",
      "@container": "@graph"
    }
  },
  "generatedAt": "2012-04-09",
  "@id": "http://www.example.org/",
  "claim": [
    {
      "@id": "http://manu.sporny.org/about#manu",
      ...
    }, {
      "@id": "http://greggkellogg.net/foaf#me",
      ...
    }
  ]
}

which yields:

<http://www.example.org/> generatedAt "2012-04-09" .
<http://www.example.org/> <https://w3id.org/credentials#claim> _:b .
_:b {
    <http://manu.sporny.org/about#manu>
       ...
    .
    <http://greggkellogg.net/foaf#me>
       ...
    .
}

I.e., in this case, two named graphs are generated out of the blue, one with a blank node. Note that the specification restricts the "second" named graph to a blank node identifier, which, IMHO, makes it of very limited use.
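The graph container behaviour can be sketched in the same toy style. The assumptions are full IRIs instead of terms, string literals, every node carrying an @id, and hypothetical function names. Following the output shown for Example 62, the sketch puts all values of the term into a single graph named by a fresh blank node and leaves only a linking triple in the default graph.

```python
import itertools
from typing import Any, Dict, List, Optional, Tuple

# (subject, predicate, object, graph name); graph None means the default graph.
Quad = Tuple[str, str, str, Optional[str]]
_counter = itertools.count()


def node_quads(node: Dict[str, Any], graph: Optional[str]) -> List[Quad]:
    """One quad per non-keyword property; every node is assumed to carry an @id."""
    quads: List[Quad] = []
    for prop, value in node.items():
        if prop.startswith("@"):
            continue
        for v in value if isinstance(value, list) else [value]:
            obj = v["@id"] if isinstance(v, dict) else f'"{v}"'
            quads.append((node["@id"], prop, obj, graph))
    return quads


def graph_container_quads(subject: str, term_iri: str, value: Any) -> List[Quad]:
    """Expand a term declared with "@container": "@graph" (toy model)."""
    graph_name = f"_:b{next(_counter)}"  # always a blank node, never an IRI
    # In the default graph, the term only links the subject to the new graph name...
    quads: List[Quad] = [(subject, term_iri, graph_name, None)]
    # ...while the actual statements end up inside that blank-node graph.
    for node in value if isinstance(value, list) else [value]:
        quads.extend(node_quads(node, graph_name))
    return quads


claim = [
    {"@id": "http://manu.sporny.org/about#manu",
     "http://xmlns.com/foaf/0.1/name": "Manu Sporny"},
    {"@id": "http://greggkellogg.net/foaf#me",
     "http://xmlns.com/foaf/0.1/name": "Gregg Kellogg"},
]
for quad in graph_container_quads("http://www.example.org/",
                                  "https://w3id.org/credentials#claim", claim):
    print(quad)
```

Because the graph name is minted inside the function, nothing in the JSON can make it a real IRI, which is the restriction complained about above.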

Encoding a simple dataset

I am not sure how I can encode, in JSON-LD, the following RDF Dataset:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.org/1> {
    <http://manu.sporny.org/about#manu> a foaf:Person;
        foaf:name "Manu Sporny";
        foaf:knows <http://greggkellogg.net/foaf#me> .
}
<http://example.org/2> {
    <http://greggkellogg.net/foaf#me> a foaf:Person;
        foaf:name "Gregg Kellogg";
        foaf:knows <http://manu.sporny.org/about#manu> .
}

The closest I found is the Named Graph Indexing feature. However, that is bound to the same "top-level" behavior as in Example 62, i.e., it would generate the dataset above, but also some extra triples in the default graph.

Start from scratch?

I would like to consider strictly separating bushes from datasets. These are two disjoint notions and we should treat them as such.

Bushes

Of course, we can already define a bush thusly:

[
   {
     "@id" : "http://www.example.org/1",
     "http://a.b.c" : "something"
   },{
     "@id" : "http://www.example.org/2",
     "http://d.e.f" : "something"
   }
]

The problem is that we cannot add a @context globally. However, JSON-LD 1.1 is much more flexible in handling contexts, so why can't we do something like this:

A context may have an "@id".
It is then possible to refer to the context by its @id from elsewhere. I.e., we could say:

[
   {
       "@context" : {
           "@id" : "_:a",
           ...
       }
   },
   {
     "@context" : "_:a",
     "@id" : "http://www.example.org/1",
     "http://a.b.c" : "something"
   },{
     "@context" : "_:a",
     "@id" : "http://www.example.org/2",
     "http://d.e.f" : "something"
   }
]

It is still a bit of a burden on the author, but I think it is much cleaner. We could go one step further and consider a top-level context to apply to every element of an array, too.
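The proposed context referencing could be prototyped as a pre-processing step before the array reaches an ordinary JSON-LD processor. The sketch below is purely hypothetical. An "@id" inside a context is not part of JSON-LD 1.1. The sketch also assumes that a definition is an array element containing nothing but such a @context. The function name inline_context_refs is illustrative.

```python
import copy
from typing import Any, Dict, List


def inline_context_refs(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Resolve hypothetical references of the form "@context": "_:a" in a top-level array."""
    # 1. Collect definitions: elements whose only key is a @context map carrying an @id.
    defs: Dict[str, Dict[str, Any]] = {}
    nodes: List[Dict[str, Any]] = []
    for item in items:
        ctx = item.get("@context")
        if set(item) == {"@context"} and isinstance(ctx, dict) and "@id" in ctx:
            defs[ctx["@id"]] = {k: v for k, v in ctx.items() if k != "@id"}
        else:
            nodes.append(item)
    # 2. Replace string references with a copy of the referenced definition.
    #    Strings that match no definition (e.g. remote context URLs) are left alone.
    resolved: List[Dict[str, Any]] = []
    for node in nodes:
        node = dict(node)  # shallow copy so the input is not modified
        ref = node.get("@context")
        if isinstance(ref, str) and ref in defs:
            node["@context"] = copy.deepcopy(defs[ref])
        resolved.append(node)
    return resolved


doc = [
    {"@context": {"@id": "_:a", "name": "http://xmlns.com/foaf/0.1/name"}},
    {"@context": "_:a", "@id": "http://www.example.org/1", "name": "something"},
    {"@context": "_:a", "@id": "http://www.example.org/2", "name": "something else"},
]
for node in inline_context_refs(doc):
    print(node)
```

The output is a plain array of node objects, each with an inline @context, i.e., the bush form that JSON-LD already accepts.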

@gkellogg responds: Definitely something to consider, but there is a fair amount of use in the wild using the existing pattern.

Datasets

We would have a @dataset keyword that would be used only for, well, datasets (we should not reuse @graph, to avoid backward-compatibility issues, although we should deprecate it).

"@dataset" : {
    "URL1" : {
        Some RDF statements here
    },
    "URL2" : [
        {
            We could also define a bush just like above
        },
        {

        }
    ],
    "@none" : [{
        Default graph statements here
    }]
}

That is it. The object in a @dataset MUST have URIs (compact URIs, blank nodes, etc.) or @none as keys; values can be single objects, for a simple graph with one subject, or a full bush.
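A short Python sketch can check that such a map is enough to express the dataset from "Encoding a simple dataset" without stray default-graph triples. It converts a @dataset value into quads. @dataset is the keyword proposed here, not an existing one. The sketch further assumes that keys are already expanded (full IRIs or _: labels; compact URIs are not handled) and that values are plain strings (literals) or {"@id": ...} references. dataset_quads is an illustrative name.

```python
import itertools
from typing import Any, Dict, List, Optional, Tuple

# (subject, predicate, object, graph name); graph None means the default graph.
Quad = Tuple[str, str, str, Optional[str]]
_counter = itertools.count()

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def node_quads(node: Dict[str, Any], graph: Optional[str]) -> List[Quad]:
    """One quad per non-keyword property; nodes without @id get a fresh blank node."""
    subject = node.get("@id") or f"_:b{next(_counter)}"
    quads: List[Quad] = []
    for prop, value in node.items():
        if prop.startswith("@"):
            continue
        for v in value if isinstance(value, list) else [value]:
            obj = v["@id"] if isinstance(v, dict) else f'"{v}"'
            quads.append((subject, prop, obj, graph))
    return quads


def dataset_quads(dataset: Dict[str, Any]) -> List[Quad]:
    """Convert the value of the proposed (hypothetical) @dataset keyword into quads."""
    quads: List[Quad] = []
    for key, value in dataset.items():
        if key.startswith("@") and key != "@none":
            raise ValueError(f"keys must be graph names or @none, got {key}")
        graph = None if key == "@none" else key  # @none selects the default graph
        # A single object is a one-subject graph; an array is a full bush.
        for node in value if isinstance(value, list) else [value]:
            quads.extend(node_quads(node, graph))
    return quads


manu, gregg = "http://manu.sporny.org/about#manu", "http://greggkellogg.net/foaf#me"
dataset = {
    "http://example.org/1": {"@id": manu, RDF_TYPE: {"@id": FOAF + "Person"},
                             FOAF + "name": "Manu Sporny", FOAF + "knows": {"@id": gregg}},
    "http://example.org/2": {"@id": gregg, RDF_TYPE: {"@id": FOAF + "Person"},
                             FOAF + "name": "Gregg Kellogg", FOAF + "knows": {"@id": manu}},
}
for quad in dataset_quads(dataset):
    print(quad)  # every quad carries a graph name; nothing lands in the default graph
```

Keeping graph names as map keys means a graph only ever appears where the author explicitly asked for one, which is the separation between bushes and datasets argued for above.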

@gkellogg responds: I like the idea of being able to do this using a top-level map. However, I think we could handle this in a backwards-compatible way, using something like the following:

{
  "@context": {
    ...
    "dataset": {"@id": "@graph", "@container": ["@graph", "@id"]}
  },
  "dataset" : {
    "URL1" : {
        Some RDF statements here
    },
    "URL2" : [
        {
            We could also define a bush just like above
        },
        {

        }
    ],
    "@none" : [{
        Default graph statements here
    }]
  }
}

The flexible way of handling and referring to contexts via an @id would make this even easier.

(I have not considered the graph indexes yet)
