Fix explicit dataset (`FROM` and `FROM NAMED` clauses) #2794

apicouSP · 2024-05-29T09:45:14Z

Summary of changes

WHAT

`FROM` or `FROM NAMED` clause: redefine entirely the query's RDF dataset

When a query (SELECT, CONSTRUCT, ASK, DESCRIBE) uses a `FROM` or `FROM NAMED` clause, we entirely redefine the query's RDF dataset:
- Include only the graphs in `FROM` clauses in the query's default graph
- Include only the graphs in `FROM NAMED` clauses in the query's named graphs

As a consequence, when a user defines a `FROM` clause in their query but no `FROM NAMED` clause, the set of named graphs is considered empty.
And vice versa: if a user defines a `FROM NAMED` clause but no `FROM` clause, the default graph is considered empty.
I added tests specifically for that.
That's my interpretation of this.
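
The dataset rule described above can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not RDFLib's implementation: the store is a hypothetical dict mapping graph names to sets of triples, and `build_query_dataset` is an illustrative name that does not exist in RDFLib.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Store = Dict[str, Set[Triple]]


def build_query_dataset(
    store: Store,
    from_graphs: Iterable[str] = (),
    from_named: Iterable[str] = (),
) -> Tuple[Set[Triple], Store]:
    """Return (default_graph, named_graphs) for a query with dataset clauses."""
    default_graph: Set[Triple] = set()
    for name in from_graphs:
        # The default graph is the merge of every FROM graph.
        default_graph |= store.get(name, set())
    # Only graphs listed in FROM NAMED are visible as named graphs.
    named_graphs = {name: set(store.get(name, set())) for name in from_named}
    return default_graph, named_graphs


store: Store = {
    "urn:g1": {("urn:a", "urn:p", "urn:b")},
    "urn:g2": {("urn:c", "urn:p", "urn:d")},
}

# FROM only: the set of named graphs is empty.
default_only, named_only = build_query_dataset(store, from_graphs=["urn:g1"])
# FROM NAMED only: the default graph is empty.
default_empty, named_g2 = build_query_dataset(store, from_named=["urn:g2"])
```

Here `named_only` is an empty dict and `default_empty` is an empty set, which mirrors the two "vice versa" cases covered by the new tests.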

Also, since the RDF dataset is entirely redefined when using `FROM` or `FROM NAMED` clauses, the parameter `SPARQL_DEFAULT_GRAPH_UNION` is ignored for those queries.
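
To make the interaction with `SPARQL_DEFAULT_GRAPH_UNION` concrete, here is a small model of the decision, assuming graphs are plain sets of tuples. The function name and parameters are hypothetical; the real logic lives in `rdflib/plugins/sparql/sparql.py` and may differ in detail.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def default_graph_for_query(
    default_graph: Set[Triple],
    named_graphs: Dict[str, Set[Triple]],
    from_graphs: List[str],
    from_named: List[str],
    default_union: bool,
) -> Set[Triple]:
    """Pick the triples matched outside GRAPH patterns (illustrative model)."""
    if from_graphs or from_named:
        # Explicit dataset: the union flag plays no role in this branch.
        merged: Set[Triple] = set()
        for name in from_graphs:
            merged |= named_graphs.get(name, set())
        return merged
    if default_union:
        # No dataset clause: the flag decides whether named graphs are included.
        return default_graph.union(*named_graphs.values())
    return set(default_graph)


t0 = ("urn:x", "urn:p", "urn:y")
t1 = ("urn:a", "urn:p", "urn:b")
t2 = ("urn:c", "urn:p", "urn:d")
named = {"urn:g1": {t1}, "urn:g2": {t2}}

with_clause_union_on = default_graph_for_query({t0}, named, ["urn:g1"], [], True)
with_clause_union_off = default_graph_for_query({t0}, named, ["urn:g1"], [], False)
no_clause_union_on = default_graph_for_query({t0}, named, [], [], True)
```

With a `FROM` clause the result is the same whether the flag is on or off; without any dataset clause the flag still applies.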

Load external graphs only if they don't already exist

Try to load external graphs only if they don't already exist in the given ConjunctiveGraph.

As a consequence, you don't have to set `SPARQL_DEFAULT_GRAPH_UNION` to False if all the graphs mentioned in `FROM` and `FROM NAMED` clauses already exist in the given ConjunctiveGraph.
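
The "load only if missing" behaviour can be sketched as follows. Everything here is an assumption made for illustration: the store is a dict, `fetch` stands in for dereferencing an external IRI, and the membership test is a simplification, since this thread does not show how RDFLib decides that a graph already exists.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def get_or_load(
    store: Dict[str, Set[Triple]],
    iri: str,
    fetch: Callable[[str], Set[Triple]],
) -> Set[Triple]:
    """Return the graph named `iri`, fetching it only when the store lacks it."""
    if iri not in store:
        # Hypothetical loader: stands in for retrieving the external graph.
        store[iri] = fetch(iri)
    return store[iri]


fetched: List[str] = []


def fake_fetch(iri: str) -> Set[Triple]:
    fetched.append(iri)  # record every external load attempt
    return {(iri, "urn:p", "urn:o")}


store = {"urn:local": {("urn:a", "urn:p", "urn:b")}}
local = get_or_load(store, "urn:local", fake_fetch)    # already present, no fetch
remote = get_or_load(store, "urn:remote", fake_fetch)  # missing, fetched once
```

After both calls, `fetched` records a single load attempt, for the graph that was not yet in the store.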

WHAT it is not

I didn't change the behavior of triples duplicated in different named graphs when those named graphs are merged into the query's default graph. That would need further discussion.
I didn't change the behavior of the updates (DELETE and INSERT). Since that code is independent, I will probably make another PR for that.

WHY:

So queries behave closer to W3C spec

Issues:

This PR fixes this issue (confirmed)
And also that issue as far as I understand
This also solves partially this discussion

Checklist

- Checked that there aren't other open pull requests for the same change.
- Checked that all tests pass.
- [] Type checking: black and flake8 do not agree on how to format some lines, so I chose to take black's suggestions.
- If the change has a potential impact on users of this project:
  - Added or updated tests that fail without the change.
  - Updated relevant documentation to avoid inaccuracies. Just following the W3C spec, whose link is already in the docs.
  - Considered adding additional documentation.
- Considered granting push permissions to the PR branch, so maintainers can fix minor issues and keep your PR up to date.

apicouSP · 2024-06-18T14:02:48Z

I now use Dataset instead of ConjunctiveGraph, as the latter is now deprecated.
I also fixed style and type errors in the new tests. Now black, ruff and mypy should pass.
Let me know if you need anything else to merge this!
Sorry I made too many commits.

coveralls · 2024-06-21T09:11:48Z

coverage: 91.058% (+0.03%) from 91.03%
when pulling 8ed095c on apicouSP:fix-explicit-dataset
into 0ecc400 on RDFLib:main.

coveralls · 2024-07-24T10:09:50Z

coverage: 90.646% (+0.02%) from 90.627%
when pulling 3c034c9 on apicouSP:fix-explicit-dataset
into d7b2d25 on RDFLib:main.

ashleysommer · 2024-07-24T10:27:59Z

Thank you @apicouSP for this Pull Request.
This looks like a good change to bring the RDFLib SPARQL executor more in line with the SPARQL 1.1 specs.
Note, I've fixed all the issues that were preventing our CI pipelines from working properly, so I can now see that your tests are passing.

I'm not super familiar with the RDFLib SPARQL executor (I don't know if any current maintainers are), so this will take a bit longer than normal to review, but we'll try to get it merged before our upcoming RDFLib v7.1 release.

nicholascar · 2024-07-26T02:59:20Z

@recalcitrantsupplant can you please review this?

ashleysommer · 2024-07-30T23:48:22Z

I'm happy to merge this as-is now, and remove references to ConjunctiveGraph across the whole codebase in a separate PR.

ashleysommer · 2024-07-31T00:38:04Z

rdflib/plugins/sparql/sparql.py

+                self.graph = Graph()
+                for d in datasetClause:
+                    if d.default:
+                        self.graph += graph.get_context(d.default)


Seems like it's pretty costly to copy every triple from each FROM graph into a new Graph. Could it be done by adding the existing graph to a new Dataset (or ConjunctiveGraph), and setting default_union on that store?

apicouSP · 2024-07-31

If I understand your suggestion correctly, this was previously done in the load function:
self.graph += self.dataset.get_context(source) # type: ignore[operator]
in exactly the same way.
So that's just something that I didn't change.
I will see if I can optimize that.

ashleysommer · 2024-07-31

Thanks for the explanation. I see now that this of course isn't something you added; it was just moved to this codepath.
There is no need to include an optimization for that in this PR, because it could introduce an unintended change in behaviour to SPARQL processing.

I am planning on doing a large sprint of work on optimising the SPARQL processor after RDFLib v7.1.0 is released.
I suspect efficiency problems (exactly like this one) are behind the odd extremely slow execution users are seeing in issues like #2147, #1775 and #787.
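
The cost discussed in this thread comes from copying every triple into a fresh graph. The difference between an eager copy and a lazy union can be sketched in plain Python, with sets of tuples standing in for RDFLib graphs. The names are hypothetical, and this is not the optimisation planned for the SPARQL processor, only an illustration of the trade-off.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]


def merged_copy(graphs: Iterable[Set[Triple]]) -> Set[Triple]:
    """Eager merge: every triple of every graph is copied into a new set."""
    merged: Set[Triple] = set()
    for g in graphs:
        merged |= g
    return merged


class UnionView:
    """Lazy merge: keeps references to the graphs and copies nothing up front."""

    def __init__(self, graphs: List[Set[Triple]]) -> None:
        self._graphs = graphs

    def __contains__(self, triple: object) -> bool:
        return any(triple in g for g in self._graphs)

    def __iter__(self) -> Iterator[Triple]:
        seen: Set[Triple] = set()
        for triple in chain.from_iterable(self._graphs):
            if triple not in seen:  # a triple present in two graphs is yielded once
                seen.add(triple)
                yield triple


t1 = ("urn:a", "urn:p", "urn:b")
t2 = ("urn:c", "urn:p", "urn:d")
t3 = ("urn:e", "urn:p", "urn:f")
g1, g2 = {t1, t2}, {t2, t3}

copied = merged_copy([g1, g2])
view = UnionView([g1, g2])
```

Both give the same set of triples; the view defers the work to lookup and iteration time instead of paying for a full copy when the query context is created.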

ashleysommer · 2024-07-31T11:51:50Z

@apicouSP
Tests are currently failing after the latest update:

py39-docs: commands[3]> poetry run python -m mypy --show-error-context --show-error-codes --junit-xml=test_reports/3.9-ubuntu-latest-mypy-junit.xml
rdflib/plugins/sparql/sparql.py: note: In member "__init__" of class "QueryContext":
rdflib/plugins/sparql/sparql.py:286: error: Incompatible types in assignment (expression has type "ConjunctiveGraph", variable has type "Optional[Dataset]") [assignment]
Found 1 error in 1 file (checked 398 source files)
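
The error above has the shape of assigning a parent-class instance to a variable annotated with a subclass: in RDFLib, `Dataset` is a subclass of `ConjunctiveGraph`, so the assignment goes the wrong way for the type checker. A minimal sketch with stand-in classes (not the real RDFLib classes, and not necessarily the fix applied in this PR):

```python
from typing import Optional


class ConjunctiveGraph:  # stand-in for the parent class
    pass


class Dataset(ConjunctiveGraph):  # stand-in for the subclass
    pass


class QueryContext:
    def __init__(self, graph: Optional[ConjunctiveGraph] = None) -> None:
        self.dataset: Optional[Dataset] = None
        # mypy rejects a bare `self.dataset = graph` here, because a
        # ConjunctiveGraph is not necessarily a Dataset. Narrowing with
        # isinstance is one way to satisfy the checker.
        if isinstance(graph, Dataset):
            self.dataset = graph


ctx_ok = QueryContext(Dataset())
ctx_parent = QueryContext(ConjunctiveGraph())
```

In this sketch a plain `ConjunctiveGraph` leaves `dataset` as `None`; whether that is the right runtime behaviour depends on how the real class uses the attribute.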

apicouSP and others added 7 commits May 28, 2024 16:40

Formatting with black and flake8 for commit d6858e0

a6ef4b3

Using rdflib rules

Merge branch 'main' into fix-explicit-dataset

2fa0089

Merge branch 'main' into fix-explicit-dataset

fa75803

Fix import order on test_dataset_exclusive

4ea3792

Use Dataset instead of ConjunctiveGraph in test_dataset_exclusive and test_dataset_inclusive

b01da98

Since ConjunctiveGraph has been deprecated. Also define if dataset is inclusive or exclusive at Dataset init instead of using global param SPARQL_DEFAULT_GRAPH_UNION

Fix graph name def in test_dataset_inclusive and test_dataset_exclusive

5ea2fe0

nicholascar self-requested a review June 21, 2024 09:04

Merge branch 'main' into fix-explicit-dataset

8ed095c

Merge branch 'main' into fix-explicit-dataset

9f68b42

Merge branch 'main' into fix-explicit-dataset

a71a8d0

ashleysommer reviewed Jul 26, 2024

rdflib/plugins/sparql/sparql.py

nicholascar requested a review from edmondchuc July 26, 2024 02:58

Merge branch 'main' into fix-explicit-dataset

13b2cdf

ashleysommer reviewed Jul 31, 2024

Only get_context on default and named graphs once and use Dataset instead of ConjunctiveGraph

2c69efc

ashleysommer requested changes Jul 31, 2024

rdflib/plugins/sparql/sparql.py

apicouSP and others added 2 commits July 31, 2024 14:26

Update rdflib/plugins/sparql/sparql.py

7b42c88

Co-authored-by: Ashley Sommer <[email protected]>

Merge branch 'main' into fix-explicit-dataset

3c034c9

ashleysommer approved these changes Jul 31, 2024

ashleysommer merged commit 5876266 into RDFLib:main Jul 31, 2024
22 checks passed
