
Alternative Graph Backend deployment

Alyssa Dai edited this page Sep 12, 2023 · 7 revisions

We are moving away from Stardog as a graph backend, mostly because they no longer provide a free academic license but instead provide short-term "trials".

Take a look at https://github.com/neurobagel/planning/issues/9 to see our progress in picking a replacement.

In the meantime, here are instructions for deploying graphDB as our graph backend instead of Stardog.

Configure the environment variables

Follow the Launch the API section of our public docs, but change the following variables in the .env file from the defaults described in the docs:

NB_GRAPH_IMG=ontotext/graphdb:10.3.1
NB_GRAPH_ROOT_CONT=/opt/graphdb/home
NB_GRAPH_PORT=7200
NB_GRAPH_PORT_HOST=7200
NB_GRAPH_DB=repositories/my_db  # NOTE: for GraphDB, this value should always take the format: repositories/<your_database_name>
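
Because the repositories/ prefix is easy to forget, a quick check of the value before launching the services may help. The following is a minimal sketch: graph_db_format_ok is a hypothetical helper name (not part of Neurobagel), and rejecting names that contain a further "/" is an assumption.

```shell
# Hypothetical helper (not part of Neurobagel): succeeds only if the value
# looks like repositories/<your_database_name>.
graph_db_format_ok() {
  case "$1" in
    repositories/*/*) return 1 ;;  # extra path segment after the name (assumed invalid)
    repositories/?*)  return 0 ;;  # prefix followed by a non-empty name
    *)                return 1 ;;  # prefix missing or name empty
  esac
}

# Example: read NB_GRAPH_DB from .env (if present), ignoring a trailing "# comment".
if [ -f .env ]; then
  value=$(sed -n 's/^NB_GRAPH_DB=\([^[:space:]#]*\).*/\1/p' .env)
  graph_db_format_ok "$value" ||
    echo "NB_GRAPH_DB should look like repositories/<your_database_name>" >&2
fi
```

The check only looks at the shape of the value; it does not confirm that the database exists.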

Make a copy of the default docker-compose.yml file in the same directory as your .env file, and then run docker compose up -d to launch the Neurobagel services.

Refer to the API readme for additional instructions.

First time setup commands

Once the API, graph, and query tool are running for the first time, you will have to do some first-run configuration.

Setup security and users

Also refer to https://graphdb.ontotext.com/documentation/10.0/devhub/rest-api/curl-commands.html#security-management

First, change the password for the admin user that has been automatically created by graphDB:

curl -X PATCH --header 'Content-Type: application/json' http://localhost:7200/rest/security/users/admin -d '
{"password": "NewAdminPassword"}'

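Make sure to replace "NewAdminPassword" with your own secure password. The admin commands further down this page use "admin:newpassword" as a placeholder for these credentials, so substitute your new password there too.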

Next, enable graphDB security to only allow authenticated users access:

curl -X POST --header 'Content-Type: application/json' -d true http://localhost:7200/rest/security

Then confirm that this was successful. An unauthenticated request should now be rejected:

➜ curl -X POST http://localhost:7200/rest/security
Unauthorized (HTTP status 401)
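
If you want to script this confirmation, one option is to look only at the HTTP status code. This is a minimal sketch under stated assumptions: security_state is a hypothetical helper name, and the request shown in the comment is the same unauthenticated request as above with curl's standard -s, -o, and -w options added.

```shell
# Hypothetical helper: interpret the HTTP status code of an unauthenticated request.
security_state() {
  case "$1" in
    401) echo "enabled" ;;               # request rejected: security is on
    000) echo "unreachable" ;;           # curl reports 000 when it could not connect
    *)   echo "unexpected status $1" ;;  # anything else needs a manual look
  esac
}

# Usage (prints only the status code of the request, then interprets it):
#   code=$(curl -s -o /dev/null -w '%{http_code}' -X POST http://localhost:7200/rest/security)
#   security_state "$code"
```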

Now we can create a user for the API:

curl -X POST --header 'Content-Type: application/json' -u "admin:newpassword" -d '
{
  "username": "DBUSER",
  "password": "DBPASSWORD"
}' http://localhost:7200/rest/security/users/DBUSER

Create a graph database

In GraphDB, graph databases are called repositories. To create a new one, you first have to prepare a data-config.ttl file that contains the settings for the repository you will create (see the GraphDB docs).

Make sure that the value for rep:repositoryID in the data-config.ttl file matches the database name in the value of NB_GRAPH_DB in your .env file. For example, if NB_GRAPH_DB=repositories/my_db, then rep:repositoryID "my_db" ;.

You can use this example file and save it as data-config.ttl locally:

#
# RDF4J configuration template for a GraphDB repository
#
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/config/graphdb#>.

[] a rep:Repository ;
    rep:repositoryID "my_db" ;
    rdfs:label "" ;
    rep:repositoryImpl [
        rep:repositoryType "graphdb:SailRepository" ;
        sr:sailImpl [
            sail:sailType "graphdb:Sail" ;

            graphdb:read-only "false" ;

            # Inference and Validation
            graphdb:ruleset "rdfsplus-optimized" ;
            graphdb:disable-sameAs "true" ;
            graphdb:check-for-inconsistencies "false" ;

            # Indexing
            graphdb:entity-id-size "32" ;
            graphdb:enable-context-index "false" ;
            graphdb:enablePredicateList "true" ;
            graphdb:enable-fts-index "false" ;
            graphdb:fts-indexes ("default" "iri") ;
            graphdb:fts-string-literals-index "default" ;
            graphdb:fts-iris-index "none" ;

            # Queries and Updates
            graphdb:query-timeout "0" ;
            graphdb:throw-QueryEvaluationException-on-timeout "false" ;
            graphdb:query-limit-results "0" ;

            # Settable in the file but otherwise hidden in the UI and in the RDF4J console
            graphdb:base-URL "http://example.org/owlim#" ;
            graphdb:defaultNS "" ;
            graphdb:imports "" ;
            graphdb:repository-type "file-repository" ;
            graphdb:storage-folder "storage" ;
            graphdb:entity-index-size "10000000" ;
            graphdb:in-memory-literal-properties "true" ;
            graphdb:enable-literal-index "true" ;
        ]
    ].
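
Before creating the repository, the match between NB_GRAPH_DB and rep:repositoryID can be checked from the shell. The following is a minimal sketch: repo_id_matches is a hypothetical helper name, and it assumes the repository name contains no characters that are special in a grep pattern.

```shell
# Hypothetical helper: succeeds if the rep:repositoryID in a config file
# matches the repository name in an NB_GRAPH_DB value.
repo_id_matches() {
  graph_db=$1      # e.g. repositories/my_db
  config_file=$2   # e.g. data-config.ttl
  repo_name=${graph_db#repositories/}   # strip the "repositories/" prefix
  grep -q "rep:repositoryID[[:space:]]*\"$repo_name\"" "$config_file"
}

# Usage:
#   repo_id_matches "repositories/my_db" data-config.ttl && echo "IDs match"
```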

Then you can create a new graph db with the following command (replace "my_db" as needed):

curl -X PUT -u "admin:newpassword" http://localhost:7200/repositories/my_db --data-binary "@data-config.ttl" -H "Content-Type: application/x-turtle"

and give our user access permission to the new repository:

curl -X PUT --header 'Content-Type: application/json' -d '
{"grantedAuthorities": ["WRITE_REPO_my_db","READ_REPO_my_db"]}'  http://localhost:7200/rest/security/users/DBUSER -u "admin:newpassword"
  • "WRITE_REPO_my_db": Grants write permission.
  • "READ_REPO_my_db": Grants read permission.

Note: make sure you replace my_db with the name of the graph db you have just created.
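
To avoid editing the database name by hand in two places, the permissions payload can be generated from the name. This is a minimal sketch; repo_permissions_json is a hypothetical helper name, and the payload has the same shape as the one in the command above.

```shell
# Hypothetical helper: print the JSON payload that grants write and read
# access to a single repository.
repo_permissions_json() {
  printf '{"grantedAuthorities": ["WRITE_REPO_%s","READ_REPO_%s"]}\n' "$1" "$1"
}

# Usage (same request as above, with the payload generated for my_db):
#   curl -X PUT --header 'Content-Type: application/json' -u "admin:newpassword" \
#     -d "$(repo_permissions_json my_db)" http://localhost:7200/rest/security/users/DBUSER
```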

Upload test data to the graph

To test that the above setup steps worked correctly, we can add some example graph-ready data (JSONLD files) to the new graph db from the neurobagel/neurobagel_examples repository.

First, clone neurobagel/neurobagel_examples:

git clone https://github.com/neurobagel/neurobagel_examples.git

The neurobagel/api repo comes with a helper script add_data_to_graph.sh to automatically upload all JSONLD files in a directory to a user-specified graph database, with the option to clear the existing data in the database first. A version of this script for a GraphDB endpoint is available from here.

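Download the add_data_to_graph_graphdb.sh script by cloning its gist (git places the script in a new directory named after the gist ID, so change into that directory or copy the script out of it before running the commands below):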

git clone https://gist.github.com/e10d0ba1d8e89d1564b7029b386e6637.git

To view all the command line arguments for the script:

./add_data_to_graph_graphdb.sh --help

ℹ️ Note: If you prefer to directly use curl requests to modify the graph database instead of the helper script, you can use the following requests.

Add a single dataset to the graph database (example):

curl -u "<USERNAME>:<PASSWORD>" -i -X POST http://localhost:7200/repositories/<DATABASE_NAME>/statements \
    -H "Content-Type: application/ld+json" \
    --data-binary @<DATASET_NAME>.jsonld

Clear all data in the graph database (example):

curl -u "<USERNAME>:<PASSWORD>" -X POST http://localhost:7200/repositories/<DATABASE_NAME>/statements \
    -H "Content-Type: application/sparql-update" \
    --data-binary "DELETE { ?s ?p ?o } WHERE { ?s ?p ?o }"
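
To add every dataset in a directory with plain curl, the single-dataset request above can be wrapped in a loop. This is a minimal sketch and not the helper script's actual implementation: upload_jsonld_dir and the CURL override variable are hypothetical names, and the argument order simply mirrors the helper script call shown on this page.

```shell
# Hypothetical helper: POST every .jsonld file in a directory to a repository.
# Set CURL to another command to preview the requests without sending them.
upload_jsonld_dir() {
  dir=$1; host=$2; graph_db=$3; user=$4; password=$5
  for file in "$dir"/*.jsonld; do
    [ -e "$file" ] || continue   # nothing to do if no .jsonld files are present
    ${CURL:-curl} -u "$user:$password" -i -X POST "http://$host/$graph_db/statements" \
      -H "Content-Type: application/ld+json" \
      --data-binary "@$file"
  done
}

# Usage:
#   upload_jsonld_dir PATH/TO/jsonld_dir localhost:7200 repositories/my_db DBUSER DBPASSWORD
```

Unlike the helper script, this sketch never clears existing data; run the clear request above first if you need an empty database.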

Now, we will upload to the graph db we created above the data in the directory neurobagel_examples/data-upload/pheno-bids-output. To do this, run the helper script as follows:

./add_data_to_graph_graphdb.sh PATH/TO/neurobagel_examples/data-upload/pheno-bids-output localhost:7200 repositories/my_db DBUSER DBPASSWORD \
  --clear-data

NOTE: Here we added the --clear-data flag to remove any existing data in the database (if the database is empty, the flag has no effect). You can choose to omit the flag or explicitly specify --no-clear-data (default behaviour) to skip this step.
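
To confirm that the upload actually added data, one option is to count the triples in the graph db with a SPARQL query. This is a minimal sketch, assuming the repository URL accepts a form-encoded SPARQL query via POST (as in the RDF4J REST protocol that GraphDB repositories expose); count_triples and the CURL override variable are hypothetical names.

```shell
# Hypothetical helper: ask the repository how many triples it contains.
# Set CURL to another command to preview the request without sending it.
count_triples() {
  host=$1; graph_db=$2; user=$3; password=$4
  ${CURL:-curl} -s -u "$user:$password" \
    -H "Accept: application/sparql-results+json" \
    --data-urlencode 'query=SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }' \
    "http://$host/$graph_db"
}

# Usage:
#   count_triples localhost:7200 repositories/my_db DBUSER DBPASSWORD
```

A count greater than zero in the JSON response suggests that the data were loaded; the exact number depends on the uploaded files.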