
hmartiniano/ngest


ngest

Ngest is an automated pipeline for building biomedical knowledge graphs (KGs) from heterogeneous data sources.

Installation

  1. Install conda (https://docs.conda.io/en/latest/miniconda.html)

  2. Clone the repo and create the conda environment:

git clone https://github.com/hmartiniano/ngest.git
cd ngest
conda env create -n ngest -f env.yml
conda activate ngest

Usage

Building a KG from all the databases requires 64 GB of RAM and around 10 GB of disk space.

In the root directory of the repo, run:

make

This runs the Snakemake workflow.
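Before running make, it can help to confirm the machine meets the requirements above. The sketch below is a hypothetical pre-flight check (check_resources is not part of the repository) that compares total RAM and free disk space against the 64 GB and 10 GB figures. It uses decimal gigabytes so that a nominal 64 GB machine is not flagged, and because os.sysconf is POSIX-only, on other platforms it simply reports that RAM could not be determined.

```python
import os
import shutil

# Figures quoted from this README; decimal GB so a nominal "64 GB" box passes.
MIN_RAM_GB = 64
MIN_DISK_GB = 10
GB = 10 ** 9


def total_ram_bytes():
    """Total physical memory via POSIX sysconf (Linux/macOS); None if unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def check_resources(path=".", min_ram_gb=MIN_RAM_GB, min_disk_gb=MIN_DISK_GB):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    ram = total_ram_bytes()
    if ram is None:
        problems.append("could not determine total RAM")
    elif ram < min_ram_gb * GB:
        problems.append(f"only {ram / GB:.1f} GB RAM, need {min_ram_gb} GB")
    # Free space on the filesystem holding `path` (the repo root by default).
    free = shutil.disk_usage(path).free
    if free < min_disk_gb * GB:
        problems.append(f"only {free / GB:.1f} GB free disk, need {min_disk_gb} GB")
    return problems


if __name__ == "__main__":
    issues = check_resources()
    for issue in issues:
        print("warning:", issue)
    if not issues:
        print("resources look sufficient for a full build")
```

Run it from the repository root so the disk check looks at the filesystem where the workflow writes its data.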

Set up Neo4j

  1. Install Docker with the Docker Compose plugin:

https://docs.docker.com/compose/install/

  2. Copy the example env file to neo4j/env:
cd neo4j
cp env.example env
  3. Set your username and password in the env file.

  4. Start Neo4j:

docker compose up -d
  5. Run the conversion script and copy its output into the import directory:
python ../scripts/tsv_to_neo4j ../data/finals/merged_nodes.tsv ../data/finals/merged_edges.tsv
cp nodes.csv.gz edges.csv.gz import
  6. Enter the container:
docker compose exec neo4j bash
  7. Inside the container, import the data:
./bin/neo4j-admin database import full --nodes /import/nodes.csv.gz --relationships /import/edges.csv.gz --overwrite-destination
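The repository's scripts/tsv_to_neo4j is not reproduced here. As a rough illustration of what the conversion step involves, the following is a minimal sketch that assumes KGX-style merged TSVs, with id, category and name columns for nodes and subject, predicate and object columns for edges. Those column names, the '|' category separator, and every helper name below are assumptions, not the script's actual interface. The sketch writes gzipped CSVs using the header conventions neo4j-admin import understands (id:ID, :LABEL, :START_ID, :END_ID, :TYPE) and drops edges whose endpoints are missing from the node file, since those cannot be imported as relationships.

```python
import csv
import gzip
import sys

# Output headers in the neo4j-admin import format.
NODE_HEADER = ["id:ID", "name", ":LABEL"]
EDGE_HEADER = [":START_ID", ":END_ID", ":TYPE"]
# Input columns (id/category/name, subject/predicate/object) are ASSUMED KGX-style names.


def tsv_rows(path):
    """Yield each row of a tab-separated file as a dict keyed by its header."""
    with open(path, newline="") as fh:
        yield from csv.DictReader(fh, delimiter="\t")


def convert_nodes(rows):
    """Return (node rows for neo4j-admin, set of node ids)."""
    out, ids = [], set()
    for row in rows:
        ids.add(row["id"])
        # Assumes '|' separates multiple categories; neo4j-admin separates labels with ';'.
        labels = ";".join(c for c in row["category"].split("|") if c)
        out.append([row["id"], row.get("name") or "", labels])
    return out, ids


def convert_edges(rows, node_ids):
    """Return (relationship rows, number of edges dropped for missing endpoints)."""
    out, dangling = [], 0
    for row in rows:
        if row["subject"] not in node_ids or row["object"] not in node_ids:
            dangling += 1  # an edge to an unknown node id cannot become a relationship
            continue
        out.append([row["subject"], row["object"], row["predicate"]])
    return out, dangling


def write_gz_csv(path, header, rows):
    """Write a header plus rows as a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)


def main(nodes_tsv, edges_tsv):
    node_rows, ids = convert_nodes(tsv_rows(nodes_tsv))
    edge_rows, dangling = convert_edges(tsv_rows(edges_tsv), ids)
    write_gz_csv("nodes.csv.gz", NODE_HEADER, node_rows)
    write_gz_csv("edges.csv.gz", EDGE_HEADER, edge_rows)
    print(f"{len(node_rows)} nodes, {len(edge_rows)} edges, {dangling} dangling edges skipped")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as convert_sketch.py, it would be run from neo4j/ as python convert_sketch.py ../data/finals/merged_nodes.tsv ../data/finals/merged_edges.tsv. For clarity it keeps all rows in lists. At full-graph scale, a version that writes each row as it is read would use far less memory (only the node id set must stay resident).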