Integration with dictionaries #20

Open
frafra opened this issue Dec 22, 2023 · 4 comments · May be fixed by #23
Labels
enhancement New feature or request

Comments

@frafra
Collaborator

frafra commented Dec 22, 2023

@mdsnor Should we set up Skosmos and use its API in the catalog, or should we handle the import of the dictionaries? What is the use case there? Is there any common client to query vocabularies using standard interfaces?

@frafra frafra added the enhancement New feature or request label Dec 22, 2023
@nicokant
Collaborator

Some references I found:

@frafra
Collaborator Author

frafra commented Dec 27, 2023

I will try to clear things up based on my (limited) knowledge of the topic :)

Let's start with Skosmos, which has been suggested as a starting point for serving our dictionaries :)

Skosmos provides a web interface and a REST API (OpenAPI) on top of a SPARQL endpoint serving vocabularies in the SKOS data model. The web interface is nice for users (intuitive, with multilingual support), SPARQL is great for linked data, and the REST API is great for integrating with other systems (like a webpage for writing metadata with search/autocomplete functionality).
SKOS is better than RDF because it takes multiple languages into account, and conversion between RDF and SKOS can be made using Skosify.
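
To make the search/autocomplete use case a bit more concrete, here is a minimal sketch of how a client could call the Skosmos REST search endpoint. The base URL and the vocabulary id are placeholders I made up, and the response handling only assumes a `results` list whose entries carry `prefLabel` and `uri`, so treat it as an illustration rather than a tested integration:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL of a Skosmos instance (assumption, not a real deployment).
SKOSMOS_BASE = "https://skosmos.example.org"


def build_search_url(term, vocab=None, lang="en", base=SKOSMOS_BASE):
    """Build a Skosmos REST search URL; the trailing * asks for prefix matches."""
    params = {"query": term + "*", "lang": lang}
    if vocab:
        params["vocab"] = vocab
    return f"{base}/rest/v1/search?{urlencode(params)}"


def parse_suggestions(payload):
    """Extract (label, uri) pairs from the JSON body of a search response."""
    data = json.loads(payload)
    return [(hit.get("prefLabel"), hit.get("uri")) for hit in data.get("results", [])]


def suggest(term, **kwargs):
    """Query the (assumed) Skosmos instance over HTTP and return suggestions."""
    with urlopen(build_search_url(term, **kwargs), timeout=10) as response:
        return parse_suggestions(response.read().decode("utf-8"))
```

A metadata form in the catalogue could call something like `suggest("fores", vocab="nina")` (the vocabulary id is hypothetical) while the user types.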

Skosmos recommends using Apache Jena Fuseki to provide the SPARQL interface on top of RDF/SKOS vocabulary files, which are imported into a TDB database. Fuseki is recommended because it provides text indexing via jena-text, which is good for performance.
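
For reference, whatever ends up serving SPARQL can be queried the same way through the standard SPARQL protocol, which is what makes the backends interchangeable. A rough sketch, assuming a hypothetical local endpoint URL and dataset name, and a vocabulary that uses `skos:Concept` and `skos:prefLabel`:

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint URL (assumption): adjust host and dataset name.
SPARQL_ENDPOINT = "http://localhost:3030/vocab/sparql"

LABELS_QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "%s")
}
"""


def build_request(lang, endpoint=SPARQL_ENDPOINT):
    """Build a SPARQL protocol GET request asking for JSON results."""
    # The language tag is interpolated into the query, so validate it first.
    if not re.fullmatch(r"[A-Za-z]{2,3}(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"unexpected language tag: {lang!r}")
    url = endpoint + "?" + urlencode({"query": LABELS_QUERY % lang})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    data = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def fetch_labels(lang="en"):
    """Run the query against the (assumed) endpoint and return the rows."""
    with urlopen(build_request(lang), timeout=10) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Only `SPARQL_ENDPOINT` would need to change if the endpoint were served by something other than Fuseki.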

Ideally, we would like to store vocabularies in a regular Postgres database, handled by the NINA catalogue as a Django app, so that it would be easier to enforce consistency for references and make the system easier to handle, instead of having to maintain a whole new set of applications built with an entirely different group of frameworks, programming languages and technologies.

  • The popular rdflib Python library supports various storage backends, but not regular SQL databases
    • rdflib-endpoint provides a SPARQL interface, but it does not support SQL databases
    • there is no actively maintained rdflib data store for SQL databases, but rdflib-django3 allows storing RDF data using the Django ORM
  • oxigraph seems to be even faster than Fuseki, and it is written in Rust instead of Java, but it is still in beta, and it does not use a SQL database; it can be used to make rdflib-endpoint go faster
  • ontop provides a SPARQL interface on top of SQL databases and seems well maintained, but it requires creating the mappings between the database and the RDF: I have no idea how complex that would be, so I opened a discussion
  • Jena SDB allows using a SQL database for persistence, but it is in maintenance mode and is not recommended

Here is a map with all the possible connections for the various components and interfaces:

graph TD;
oxigraph-->rocksdb;
sparql-->rdflib-endpoint-->rdflib;
rdflib-->BerkeleyDB;
rdflib-->memory;
rdflib-->oxigraph;
sparql-->oxigraph_server-->oxigraph;
sparql-->jena;
jena-->TDB;
jena-->SDB-->postgres;
rest-api-->skosmos-->sparql;
nina-catalogue-->rest-api;
sparql-->ontop-->postgres;
nina-catalogue-->rdflib-->rdflib-django3-->postgres;

This is my favourite architecture, which still needs to be validated:

graph TD;
nina-catalogue-->rdflib-->rdflib-django3-->postgres;
skosmos-->sparql-->rdflib-endpoint-->rdflib;

@nicokant
Collaborator

nicokant commented Jan 2, 2024

Related: ontop/ontop#781

@nicokant
Collaborator

nicokant commented Jan 3, 2024

I'll run an experiment on the proposed architecture. A few notes:

@nicokant nicokant linked a pull request Jan 4, 2024 that will close this issue