Initial implementation for RDF-based tests QueryEvaluationTest #249

Merged: 5 commits merged on Mar 30, 2022

Mec-iS (Contributor) commented Mar 28, 2022

From #248

This is the first draft for implementing automated RDF tests.
The first batch of tests from RDF-tests (basic) can be run from the project directory with pytest tests/rdf_tests/test_rdf_basic.py -k test_rdf_runner -s; the test files are in tests/rdf_tests/dat. For the moment, the only assertion implemented is a check on the number of returned results. Even this is already interesting, as some kg query outputs return more or fewer bindings than expected.

The script should also work on the oxigraph-tests, but it doesn't: it seems the read_manifest file provided with rdflib cannot parse the Oxigraph manifest. Maybe there is a discrepancy in the XML structure? @Tpt, please provide some feedback.

Currently the test files are copy-pasted.

Tests with anomalies:

  • Running basic/Non-matching triple pattern ERROR: Non-matching triple pattern resulted in object of type 'NoneType' has no len()
  • Running basic/Basic - Prefix/Base 1 Basic - Prefix/Base 1 resulted in False
  • Running basic/Basic - Var 1 Basic - Var 1 resulted in False
  • Running basic/Basic - Var 2 Basic - Var 2 resulted in False
  • Running basic/Basic - Term 6 Basic - Term 6 resulted in False
  • Running basic/Basic - Term 7 ERROR: Basic - Term 7 resulted in Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, found '.' (at char 140), (line:5, col:23)

The ones resulting in "False" indicate a length discrepancy between the expected and actual output; the ones marked ERROR raised exceptions.

Tpt left a comment

Thank you so much for moving this forward!

I have found why the Oxigraph tests are ignored (see my inline comment in rdflib_tools.py). I also took the liberty of commenting on some problems I found while quickly trying the code.

Inline review comments (outdated, resolved) on:

  • tests/rdf_tests/rdflib_tools.py (two comments)
  • tests/rdf_tests/dat/oxigraph-tests/sparql/order_terms.rq
  • tests/rdf_tests/test_rdf_basic.py (two comments)

Tpt commented Mar 28, 2022

By the way, Oxigraph does not pass all the official SPARQL tests. This is mostly because Oxigraph's storage normalizes literals such as numbers: for example, "01"^^xsd:integer and "1"^^xsd:integer are considered to be the same. This leads to failures in some tests that deal with duplicates. The list of failing tests is here.

Mec-iS (Contributor, Author) commented Mar 29, 2022

Is there a straightforward way to make an rdflib SPARQL query return serialised data (TTL or XML) instead of the row iterator?

Or what is the recommended way of testing the result of an rdflib query against a given TTL file? @ceteri

Tpt commented Mar 29, 2022

RDFlib provides a parser for SPARQL results encoded in RDF.
It then uses this function to check whether result sets are compatible.

In Oxigraph, I tried comparing result sets encoded in RDF using the graph isomorphism algorithm. It was very slow, because result sets encoded in RDF contain a lot of blank nodes that are only connected to other blank nodes, which makes the hash-based algorithms not very efficient...

Mec-iS marked this pull request as ready for review March 30, 2022 10:40

Mec-iS (Contributor, Author) commented Mar 30, 2022

kglab fails some tests; most of these failures are "Python recursion limit exceeded" errors.

Mec-iS changed the title from "Initial implementation for RDF-based tests" to "Initial implementation for RDF-based tests QueryEvaluationTest" Mar 30, 2022
Mec-iS merged commit 540ffdc into main Mar 30, 2022
Mec-iS mentioned this pull request Mar 30, 2022

ceteri (Collaborator) commented Apr 3, 2022

@Mec-iS this is excellent, and @Tpt thank you kindly!

ceteri (Collaborator) commented Apr 3, 2022

In terms of graph isomorphism algorithms, I wish more open-source implementations of graph sketch algorithms were available. That might help with the costs. I've only found one, https://github.com/kenkoooo/graph-sketch-fractality, although it's CLI-based and doesn't quite provide the similarity measures that we'd need.

Mec-iS deleted the rdf-tests-1 branch September 5, 2022 13:51