-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix explicit dataset (FROM
and FROM NAMED
clauses)
#2794
Conversation
When using a FROM or FROM NAMED clause: redefine entirely the query's RDF dataset. Include only the graphs in FROM clause in the query's default graph Include only the graphs in the FROM NAMED clause in the query's named graphs Try to load external graphs only if they don't already exist in the given ConjunctiveGraph
Using rdflib rules
… test_dataset_inclusive Since ConjunctiveGraph has been deprecated. Also define if dataset is inclusive or exclusive at Dataset init instead of using global param SPARQL_DEFAULT_GRAPH_UNION
I now use Dataset instead of ConjunctiveGraph as this one is now deprecated. |
Thank you @apicouSP for this Pull Request. I'm not super familiar with the RDFLib SPARQL executor (I don't know if any current maintainers are) so this will take a bit longer than normal to review this, but we'll try to get it merged before our upcoming RDFLib v7.1 release. |
@recalcitrantsupplant can you please review this? |
I'm happy to merge this as-is now, and remove references to ConjunctiveGraph across the whole codebase in a separate PR. |
rdflib/plugins/sparql/sparql.py
Outdated
self.graph = Graph() | ||
for d in datasetClause: | ||
if d.default: | ||
self.graph += graph.get_context(d.default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like its pretty costly to copy every triple from each FROM
graph into a new Graph. Could it be done by adding the existing graph to a new Dataset (or ConjunctiveGraph), and setting default_union on that store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand well your suggestion, this was previously done in the load function:
self.graph += self.dataset.get_context(source) # type: ignore[operator]
Exactly the same way.
So that's just something that I didn't change.
I will see if I can optimize that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I see now this of course isn't something you added, but it was just to this codepath.
There is no need to include an optimization for that in this PR, because it could introduce an unintended change in behaviour to SPARQL processing.
I am planning on doing a large sprint of work on optimising the SPARQL processor after RDFLib v7.1.0 is released.
I suspect efficiency problems (exactly like this one) are behind the odd extremely slow execution users are seeing in issues like:
#2147
and
#1775
and
#787
…tead of ConjunctiveGraph
@apicouSP
|
Co-authored-by: Ashley Sommer <[email protected]>
Summary of changes
WHAT
FROM
orFROM NAMED
clause: redefine entirely the query's RDF datasetWhen a query (
SELECT
,CONSTRUCT
,ASK
,DESCRIBE
) is using aFROM
orFROM NAMED
clause, we redefine entirely the query's RDF datasetInclude only the graphs in
FROM
clauses in the query's default graphInclude only the graphs in the
FROM NAMED
clauses in the query's named graphsAs a consequence, when a user defines a
FROM
clause in his query but does not defineFROM NAMED
, then the named graphs are considered empty set.And vice versa: if a user defines a
FROM NAMED
clause but does not define aFROM
clause, then the default graph is considered empty.I added tests specifically for that
That's my interpretation of this
Also since the RDF Dataset is entirely redefined when using
FROM
orFROM NAMED
clauses then for those queries the parameterSPARQL_DEFAULT_GRAPH_UNION
will be ignoredLoad external graphs only if they don't already exist
Try to load external graphs only if they don't already exist in the given
ConjunctiveGraph
As a consequence, you don't have to set
SPARQL_DEFAULT_GRAPH_UNION
toFalse
if all the graphs mentioned inFROM
andFROM NAMED
clauses already exist in the givenConjunctiveGraph
WHAT it is not
DELETE
andINSERT
). Since code is independent, I will probably make another PR for thatWHY:
So queries behave closer to W3C spec
Issues:
This PR fixes this issue (confirmed)
And also that issue as far as I understand
This also solves partially this discussion
Checklist
the same change.
so maintainers can fix minor issues and keep your PR up to date.