Ngest is an automated pipeline for creating biomedical knowledge graphs (KGs) from heterogeneous data sources.
- Install conda (https://docs.conda.io/en/latest/miniconda.html)
- Clone the repo and create the conda env:
git clone https://github.com/hmartiniano/ngest.git
cd ngest
conda env create -n ngest -f env.yml
conda activate ngest
To build a KG with all the databases, you need 64 GB of RAM and around 10 GB of disk space.
In the root dir of the repo, run:
make
This runs the Snakemake workflow.
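Because the full build needs a lot of memory, it can help to check the machine before running `make`. The following is a minimal sketch, not part of the repo: the function names are hypothetical, the thresholds are the 64 GB RAM and 10 GB disk figures stated above (read as decimal gigabytes), and the RAM lookup relies on `os.sysconf`, which is assumed to be available (Linux and most Unix systems). Where it is not, the RAM check is skipped.

```python
import os
import shutil

GB = 10 ** 9  # decimal gigabyte, an assumption about how the README counts


def total_ram_gb():
    """Return total physical memory in GB, or None if it cannot be read."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing or does not know these names on this platform.
        return None


def free_disk_gb(path="."):
    """Return free disk space in GB for the filesystem containing path."""
    return shutil.disk_usage(path).free / GB


def resource_warnings(path=".", min_ram_gb=64, min_disk_gb=10):
    """Return a list of human-readable warnings; empty if all checks pass."""
    warnings = []
    ram = total_ram_gb()
    if ram is not None and ram < min_ram_gb:
        warnings.append(f"RAM: {ram:.1f} GB found, {min_ram_gb} GB recommended")
    disk = free_disk_gb(path)
    if disk < min_disk_gb:
        warnings.append(f"Disk: {disk:.1f} GB free, {min_disk_gb} GB recommended")
    return warnings
```

Calling `resource_warnings(".")` from the root dir of the repo returns an empty list when both checks pass. Building a subset of the databases may need less than these figures.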
- Install Docker with the Docker Compose plugin:
https://docs.docker.com/compose/install/
- Copy example env file to neo4j/env:
cd neo4j
cp env.example env
- Replace username and password in env file.
- Start neo4j:
docker compose up -d
- Run conversion script:
python ../scripts/tsv_to_neo4j ../data/finals/merged_nodes.tsv ../data/finals/merged_edges.tsv
cp nodes.csv.gz edges.csv.gz import
- Enter the container:
docker compose exec neo4j bash
- Import data. Inside the container, run:
./bin/neo4j-admin database import full --nodes /import/nodes.csv.gz --relationships /import/edges.csv.gz --overwrite-destination
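If the import fails, a common cause is a header row that `neo4j-admin` cannot interpret. The sketch below checks the gzipped CSV files before they are copied to `import`. It is an illustration rather than part of the repo: the function names are hypothetical, and it assumes the conversion script writes the header as the first row of each file, using the Neo4j import header convention (`:ID` for nodes; `:START_ID`, `:END_ID` and `:TYPE` for relationships).

```python
import csv
import gzip

NODE_FIELDS = {"ID"}
EDGE_FIELDS = {"START_ID", "END_ID", "TYPE"}


def field_kind(column):
    """Return the type part of a header column, e.g. 'id:ID(Gene)' -> 'ID'."""
    name = column.split("(", 1)[0]  # drop an optional ID-space suffix
    if ":" not in name:
        return ""  # plain property column with no declared type
    return name.rsplit(":", 1)[1].strip()


def read_header(path, delimiter=","):
    """Read the first row of a gzipped delimited file as a list of columns."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def missing_fields(header, required):
    """Return the required field kinds that do not appear in the header."""
    present = {field_kind(column) for column in header}
    return sorted(required - present)


def check_import_files(nodes_path, edges_path):
    """Return a dict mapping each file path to its missing required fields."""
    return {
        nodes_path: missing_fields(read_header(nodes_path), NODE_FIELDS),
        edges_path: missing_fields(read_header(edges_path), EDGE_FIELDS),
    }
```

For example, `check_import_files("nodes.csv.gz", "edges.csv.gz")` run in the `neo4j` directory returns an empty list for each file whose header has the required fields. If the script writes headers differently, the required field sets need adjusting.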