Reconciliation rules for n-ary relations encoded in RDF triples
A detailed description of the motivation and the algorithms of tcn3r is available in the related article.
When citing tcn3r, please use the following reference:
Pierre Monnin, Miguel Couceiro, Amedeo Napoli, and Adrien Coulet. "Knowledge-Based Matching of n-ary Tuples". In: Ontologies and Concepts in Mind and Machine - 25th International Conference on Conceptual Structures, ICCS 2020, Bolzano, Italy, September 18–20, 2020, Proceedings. Ed. by Mehwish Alam, Tanya Braun, and Bruno Yun. Vol. 12277. Lecture Notes in Computer Science. Springer, 2020, pp. 48–56. doi: 10.1007/978-3-030-57855-8_4. url: https://doi.org/10.1007/978-3-030-57855-8_4.
@inproceedings{Monnin2020,
  author    = {Pierre Monnin and
               Miguel Couceiro and
               Amedeo Napoli and
               Adrien Coulet},
  editor    = {Mehwish Alam and
               Tanya Braun and
               Bruno Yun},
  title     = {Knowledge-Based Matching of n-ary Tuples},
  booktitle = {Ontologies and Concepts in Mind and Machine - 25th International Conference
               on Conceptual Structures, {ICCS} 2020, Bolzano, Italy, September 18-20,
               2020, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {12277},
  pages     = {48--56},
  publisher = {Springer},
  year      = {2020},
  url       = {https://doi.org/10.1007/978-3-030-57855-8_4},
  doi       = {10.1007/978-3-030-57855-8_4},
}
Executes reconciliation rules on every pair of relationships in the triplestore.
tcn3r --configuration conf.json -o output.ttl --simlimit SL --complimit CL --dimensionlimit DL --max-rows MR -t threads
where:
- conf.json: is the configuration file needed to configure the scripts -- see below
- output.ttl: is the path to the output TTL file where the generated links between relationships will be stored
- SL: Minimum similarity on non-empty aggregated dimensions to consider relations as related (< 0 to disable)
- CL: Minimum number of non-empty comparable aggregated dimensions to consider relations as related (< 0 to disable)
- DL: Minimum number of non-empty aggregated dimensions to apply simlimit or complimit (< 0 to disable)
- MR: Max number of rows the SPARQL endpoint can return for a query
- threads: number of threads to use when comparing relations (e.g., 8)
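As an illustration, the command line above can also be assembled from a script. The Python sketch below uses only the options listed above; build_tcn3r_command is a hypothetical helper that is not part of tcn3r, and the values are examples. A negative value (here for complimit) is passed to disable the corresponding limit, as described in the list.

```python
import shlex


def build_tcn3r_command(configuration, output, simlimit, complimit,
                        dimensionlimit, max_rows, threads, executable="tcn3r"):
    """Return the argument list for the batch invocation of tcn3r.

    Hypothetical helper: it only mirrors the options documented above and
    does not check that the values are accepted by tcn3r.
    """
    return [
        executable,
        "--configuration", str(configuration),
        "-o", str(output),
        "--simlimit", str(simlimit),
        "--complimit", str(complimit),
        "--dimensionlimit", str(dimensionlimit),
        "--max-rows", str(max_rows),
        "-t", str(threads),
    ]


# Example values: similarity limit 0.8, complimit disabled, 8 threads.
command = build_tcn3r_command(
    "conf.json", "output.ttl",
    simlimit=0.8, complimit=-1, dimensionlimit=2,
    max_rows=10000, threads=8,
)
print(shlex.join(command))
```

The resulting list can be passed directly to subprocess.run(command, check=True) once tcn3r is on the PATH.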
You can use the run target of the provided Makefile, which calls the Docker image with:
docker run --rm $(MAPUSER) -v ${PWD}/data:/data $(INAME):$(VERSION) --configuration /data/conf.json.example -o /data/output.ttl --simlimit 0.8 --complimit 2 --dimensionlimit 2 --max-rows 10000 --explain false -t 4
The data subdirectory of the current directory is shared with the Docker container as /data.
The JSON configuration file is expected to be in this directory.
/data is also the directory where the output TTL file will be stored.
simlimit is set to 0.8, complimit to 2, and dimensionlimit to 2.
4 threads will be used.
tcn3r --configuration conf.json -o output.txt --simlimit SL --complimit CL --dimensionlimit DL --max-rows MR --explain true
where:
- conf.json: is the configuration file needed to configure the scripts -- see below
- output.txt: is the path to the output text file where explanations of links between relationships will be stored
- SL: Minimum similarity on non-empty aggregated dimensions to consider relations as related (< 0 to disable)
- CL: Minimum number of non-empty comparable aggregated dimensions to consider relations as related (< 0 to disable)
- DL: Minimum number of non-empty aggregated dimensions to apply simlimit or complimit (< 0 to disable)
- MR: Max number of rows the SPARQL endpoint can return for a query
The URIs of the relations to compare are requested interactively.
Not available.
A configuration JSON file is needed to configure the scripts. An example is provided. It should contain:
{
  "server-address": "http://pgxlod.loria.fr/sparql",
  "url-json-conf-attribute": "format",
  "url-json-conf-value": "application/sparql-results+json",
  "url-default-graph-attribute": "default-graph-uri",
  "url-default-graph-value": "http://pgxlod.loria.fr/",
  "url-query-attribute": "query",
  "timeout": 10000000,
  "relation-types": [
    "http://pgxo.loria.fr/PharmacogenomicRelationship"
  ],
  "dimensions": {
    "GeneticFactor": {
      "ind-types": [
        "http://pgxo.loria.fr/GeneticFactor"
      ],
      "rel2ind-predicates": [
        "http://pgxo.loria.fr/isAssociatedWith",
        "http://pgxo.loria.fr/isNotAssociatedWith"
      ],
      "ind2dep-predicates": [
      ],
      "preorder": "Individuals",
      "ind-leq-predicates": [
        "http://purl.obolibrary.org/obo/BFO_0000050"
      ],
      "ind-geq-predicates": [
        "http://purl.obolibrary.org/obo/BFO_0000051"
      ]
    },
    "Drug": {
      "ind-types": [
        "http://pgxo.loria.fr/Drug"
      ],
      "rel2ind-predicates": [
        "http://pgxo.loria.fr/isAssociatedWith",
        "http://pgxo.loria.fr/isNotAssociatedWith"
      ],
      "ind2dep-predicates": [
      ],
      "preorder": "Annotations",
      "ann-base-uris": [
        "http://purl.obolibrary.org/obo/CHEBI_",
        "http://bio2rdf.org/chebi:",
        "http://identifiers.org/chebi/",
        "http://purl.bioontology.org/ontology/UATC/",
        "http://bio2rdf.org/atc:",
        "http://identifiers.org/atc/"
      ],
      "ind2ann-predicates": [
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
      ],
      "ann-leq-predicates": [
        "http://www.w3.org/2000/01/rdf-schema#subClassOf"
      ],
      "ann-geq-predicates": [
      ]
    },
    "Phenotype": {
      "ind-types": [
        "http://pgxo.loria.fr/Phenotype"
      ],
      "rel2ind-predicates": [
        "http://pgxo.loria.fr/isAssociatedWith",
        "http://pgxo.loria.fr/isNotAssociatedWith"
      ],
      "ind2dep-predicates": [
        "http://purl.obolibrary.org/obo/RO_0002502"
      ],
      "preorder": "Annotations",
      "ann-base-uris": [
        "http://purl.bioontology.org/ontology/MESH/",
        "http://bio2rdf.org/mesh:",
        "http://identifiers.org/mesh/"
      ],
      "ind2ann-predicates": [
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
      ],
      "ann-leq-predicates": [
        "http://www.w3.org/2000/01/rdf-schema#subClassOf"
      ],
      "ann-geq-predicates": [
      ]
    }
  },
  "output-pred-equal": "http://www.w3.org/2002/07/owl#sameAs",
  "output-pred-equiv": "http://www.w3.org/2004/02/skos/core#closeMatch",
  "output-pred-leq": "http://www.w3.org/2004/02/skos/core#broadMatch",
  "output-pred-geq": "http://www.w3.org/2004/02/skos/core#narrowMatch",
  "output-pred-comparable": "http://www.w3.org/2004/02/skos/core#relatedMatch",
  "output-pred-dependency-related": "http://www.w3.org/2004/02/skos/core#related"
}
with:
- server-address: address of the SPARQL endpoint to query
- url-json-conf-attribute: URL attribute to use to get JSON results
- url-json-conf-value: value of the url-json-conf-attribute to get JSON results
- url-default-graph-attribute: URL attribute to use to define the default graph
- url-default-graph-value: value of url-default-graph-attribute to define the default graph
- url-query-attribute: URL attribute to use to define the query
- timeout: timeout value for HTTP requests
- relation-types: URIs of classes whose instances are relationships to reconcile
- dimensions: dictionary of dimensions. Each dimension should contain:
  - ind-types: classes that are instantiated by elements of this dimension. Subclasses will be considered as well.
  - rel2ind-predicates: predicates connecting relations with elements of this dimension. Subproperties will be considered as well.
  - ind2dep-predicates: predicates connecting elements of this dimension with potential dependencies. Subproperties will be considered as well.
  - preorder: preorder to use for comparison on this dimension. Possible values:
    - SetInclusion: set inclusion preorder
    - Individuals: preorder between individuals. The following must be specified:
      - ind-leq-predicates: list of predicates indicating that an individual is lower than or equal to another. Subproperties will be considered as well.
      - ind-geq-predicates: list of predicates indicating that an individual is greater than or equal to another. Subproperties will be considered as well.
    - Annotations: preorder based on annotations of individuals. The following must be specified:
      - ann-base-uris: prefixes of URIs of annotations to consider
      - ind2ann-predicates: predicates linking elements of this dimension to their annotations. Subproperties will be considered as well.
      - ann-leq-predicates: predicates indicating that an annotation is lower than or equal to another. Subproperties will be considered as well.
      - ann-geq-predicates: predicates indicating that an annotation is greater than or equal to another. Subproperties will be considered as well.
- output-pred-equal: URI of a predicate to use to identify equal relationships
- output-pred-equiv: URI of a predicate to use to identify equivalent relationships
- output-pred-leq: URI of a predicate to use to identify lower or equal relationships
- output-pred-geq: URI of a predicate to use to identify greater or equal relationships
- output-pred-comparable: URI of a predicate to use to identify comparable relationships
- output-pred-dependency-related: URI of a predicate to use to identify relationships that are related
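The key descriptions above can be turned into a quick sanity check before launching a long run. The Python sketch below is not part of tcn3r: check_configuration, build_query_url and load_configuration are hypothetical helpers, the key names are taken from the example file, and treating every key as mandatory is an assumption. The second function shows one plausible way the url-* attributes could combine into a request URL (plain query-string parameters); how tcn3r actually builds its requests is not specified here.

```python
import json
from urllib.parse import urlencode

# Key names copied from the example configuration file.
TOP_LEVEL_KEYS = [
    "server-address", "url-json-conf-attribute", "url-json-conf-value",
    "url-default-graph-attribute", "url-default-graph-value",
    "url-query-attribute", "timeout", "relation-types", "dimensions",
    "output-pred-equal", "output-pred-equiv", "output-pred-leq",
    "output-pred-geq", "output-pred-comparable",
    "output-pred-dependency-related",
]
DIMENSION_KEYS = ["ind-types", "rel2ind-predicates", "ind2dep-predicates", "preorder"]
# Extra keys expected for each preorder value.
PREORDER_KEYS = {
    "SetInclusion": [],
    "Individuals": ["ind-leq-predicates", "ind-geq-predicates"],
    "Annotations": ["ann-base-uris", "ind2ann-predicates",
                    "ann-leq-predicates", "ann-geq-predicates"],
}


def load_configuration(path):
    """Read a JSON configuration file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_configuration(conf):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in TOP_LEVEL_KEYS if key not in conf]
    for name, dim in conf.get("dimensions", {}).items():
        for key in DIMENSION_KEYS:
            if key not in dim:
                problems.append(f"dimension {name}: missing key: {key}")
        preorder = dim.get("preorder")
        if preorder is None:
            continue  # already reported as a missing key
        if preorder not in PREORDER_KEYS:
            problems.append(f"dimension {name}: unknown preorder: {preorder}")
            continue
        for key in PREORDER_KEYS[preorder]:
            if key not in dim:
                problems.append(f"dimension {name}: missing key: {key}")
    return problems


def build_query_url(conf, query):
    """Assumed combination of the url-* attributes into a GET request URL."""
    params = {
        conf["url-json-conf-attribute"]: conf["url-json-conf-value"],
        conf["url-default-graph-attribute"]: conf["url-default-graph-value"],
        conf["url-query-attribute"]: query,
    }
    return conf["server-address"] + "?" + urlencode(params)


# Deliberately incomplete configuration to show what gets reported.
example = {
    "server-address": "http://pgxlod.loria.fr/sparql",
    "url-json-conf-attribute": "format",
    "url-json-conf-value": "application/sparql-results+json",
    "url-default-graph-attribute": "default-graph-uri",
    "url-default-graph-value": "http://pgxlod.loria.fr/",
    "url-query-attribute": "query",
    "dimensions": {
        "GeneticFactor": {
            "ind-types": ["http://pgxo.loria.fr/GeneticFactor"],
            "rel2ind-predicates": ["http://pgxo.loria.fr/isAssociatedWith"],
            "ind2dep-predicates": [],
            "preorder": "Individuals",
            "ind-leq-predicates": ["http://purl.obolibrary.org/obo/BFO_0000050"],
        }
    },
}
problems = check_configuration(example)
url = build_query_url(example, "ASK {}")
for problem in problems:
    print(problem)
print(url)
```

For a real file, load_configuration("conf.json") can replace the inline dictionary.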
A test/pgxo+test.owl file is available to test tcn3r.
It should be imported in a triplestore.
After the import, run tcn3r using the test configuration provided in test/test-conf.json and the parameters --simlimit 0.85 --complimit 3 --dimensionlimit 3.
You may need to adapt server-address depending on the location of the triplestore you use.
The description of the test cases can be found in test/documentation-tests.pdf.
The (sorted) expected results can be found in test/expected-output.ttl.
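Because the expected results are stored sorted, a run can be checked by sorting the produced output and comparing it line by line. The Python sketch below assumes that both files contain one link per line with identical serialization, which is not guaranteed here; compare_outputs and normalized_lines are hypothetical helpers, and the inline strings stand in for the contents of the produced file and of test/expected-output.ttl.

```python
def normalized_lines(text):
    """Return the sorted non-blank lines of a text, stripped of whitespace."""
    return sorted(line.strip() for line in text.splitlines() if line.strip())


def compare_outputs(produced, expected):
    """Return (missing, unexpected) lines of produced relative to expected."""
    produced_set = set(normalized_lines(produced))
    expected_set = set(normalized_lines(expected))
    missing = sorted(expected_set - produced_set)
    unexpected = sorted(produced_set - expected_set)
    return missing, unexpected


# Placeholder contents; real use would read both files from disk instead.
produced = "<urn:r2> <urn:p> <urn:r3> .\n<urn:r1> <urn:p> <urn:r2> .\n"
expected = "<urn:r1> <urn:p> <urn:r2> .\n<urn:r3> <urn:p> <urn:r4> .\n"

missing, unexpected = compare_outputs(produced, expected)
print("missing:", missing)
print("unexpected:", unexpected)
```

Two empty lists indicate that the produced links match the expected ones under the one-link-per-line assumption.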
- C++17
- boost
- libcurl
- OpenMP