The free Neo4j Community Edition doesn't support role-based access control or multiple access levels on a single server, so a server can offer either read-only or read-write access, but not both. Moreover, GDS graph algorithms and ML workloads need far more RAM, and are therefore much more expensive to host, than a database that only serves read queries.
To work around these limitations, we run two separate instances:
- `neo4j-graph-worker`: a high-memory instance with write access, plus the config settings and additional plugins needed for ETL and ML workloads. It only needs to be turned on when running offline workloads.
- `neo4j-graph-server`: a t3.medium instance with read-only access, configured for serving data. It can be left running for the web application.
The two instances are kept in sync with backup and restore scripts, which transfer the backup either through S3 or through a shared EBS Multi-Attach volume.
Warnings:
- Read through the backup and restore scripts before running them. Some lines may need to be commented or uncommented depending on the intended behaviour.
- Syncing causes a ~5 min outage on `neo4j-graph-server`, since the restore step requires stopping the database.
Steps to sync:
- Turn on the `neo4j-graph-worker` instance
- Make updates to the graph with the write user
- Connect to the instance using ssh (`neo4j.pem` can be retrieved from AWS secrets)
- Run the backup script to create a backup file
- (Optional) Upload the backup to S3
- Connect to `neo4j-graph-server` and run the restore script
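A minimal sketch of the transfer step, assuming the backup script writes `*.dump` files into a single directory (the path `/mnt/graphdata/backups` below is hypothetical) and that `s3://YOUR-BACKUP-BUCKET/neo4j/` is a placeholder for the backup bucket, whose name isn't given here. `latest_backup` and `upload_backup` are illustrative helpers, not functions from the repository's scripts, and `AWS_CLI` is a hypothetical override that lets you print the command instead of running it.

```shell
# Hypothetical helpers for the worker -> server sync (not the repo's scripts).

# Print the most recently modified *.dump file in DIR; return 1 if there is none.
latest_backup() {
  local dir="${1:?usage: latest_backup DIR}"
  local newest="" f
  for f in "$dir"/*.dump; do
    [ -e "$f" ] || continue            # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# Copy FILE to S3_URI with the AWS CLI. Set AWS_CLI=echo for a dry run.
upload_backup() {
  local file="${1:?usage: upload_backup FILE S3_URI}"
  local s3_uri="${2:?usage: upload_backup FILE S3_URI}"
  "${AWS_CLI:-aws}" s3 cp "$file" "$s3_uri"
}

# Example (on neo4j-graph-worker, after running the backup script):
#   upload_backup "$(latest_backup /mnt/graphdata/backups)" s3://YOUR-BACKUP-BUCKET/neo4j/
# Then on neo4j-graph-server, fetch it before running the restore script:
#   aws s3 cp s3://YOUR-BACKUP-BUCKET/neo4j/neo4j.dump /mnt/graphdata/backups/
```

Comparing modification times with `-nt` avoids parsing `ls` output; if the backup script names dumps with a sortable timestamp, picking the last name in sorted order would work just as well.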
If the `neo4j-graph-worker` instance doesn't exist yet, you can create it from scratch:
- Create the Neo4j CloudFormation stack:
  - Go to CloudFormation in the AWS console
  - Click "Create stack" and select "With new resources"
  - Input the S3 URL of the latest CloudFormation YAML template (template, existing s3 url)
  - Choose name: `neo4j-graph-worker`
  - Choose the password stored in AWS secrets
  - Choose Yes to include ML plugins
  - Choose instance type: t3.medium, t3.large or larger
  - Choose disk size: 50-100 GB
  - Enter SSH CIDR: 0.0.0.0/0
  - Choose key pair name: `neo4j` or a personal SSH key
  - Add tag: `project: openvirome`
  - Add IAM role: `AWSCloudformationFullAccessRole`
  - Leave the other settings at their defaults
  - Submit and wait for completion (~10 mins)
  - Check the "Outputs" tab to find the URL
- (Optional) If they don't exist yet, create a shared EBS volume or S3 bucket for storing backups:
  - EBS volume:
    - Existing volume: `vol-0a228a503a868b12d`
      - Size: 100 GiB, Type: io1 (supports Multi-Attach), IOPS: 1600+, Multi-Attach enabled
  - S3 Bucket:
- Connect to the instance using the selected SSH key (`neo4j.pem`) or EC2 Instance Connect
- Install git and make, then clone the repo on the machine:
  `sudo yum update -y && sudo yum install -y git make`
  `mkdir -p workspace && cd workspace && git clone https://github.com/serratus-bio/virus-knowledge-graph && cd virus-knowledge-graph`
- Run `make install` and `make mount-vol` to mount the volume at `/mnt/graphdata`
- Restore data (if a backup is already available): `make neo4j-restore`
- Or run the full ETL job to populate the database from scratch:
  - Set up `.env`, then run `make etl-run`
- Create a new backup after changes are made: `make neo4j-backup`
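The console steps above can also be scripted with the AWS CLI. This is a hypothetical sketch: the `ParameterKey` names are guesses (check the template's `Parameters` section for the real ones), the password, SSH CIDR and ML-plugin parameters are left out, and the template URL, role ARN and availability zone are placeholders. If the template creates IAM resources, `create-stack` also needs `--capabilities CAPABILITY_IAM`. For the shared volume, Multi-Attach volumes must be in the same Availability Zone as the instances they attach to, and standard filesystems such as XFS or ext4 are not designed for simultaneous access from multiple instances.

```shell
# Hypothetical AWS CLI equivalents of the console steps; placeholders throughout.
# Set AWS_CLI=echo to print the commands instead of running them.

create_worker_stack() {
  local template_url="${1:?usage: create_worker_stack TEMPLATE_URL ROLE_ARN}"
  local role_arn="${2:?usage: create_worker_stack TEMPLATE_URL ROLE_ARN}"
  # ParameterKey names below are assumptions, not taken from the real template.
  "${AWS_CLI:-aws}" cloudformation create-stack \
    --stack-name neo4j-graph-worker \
    --template-url "$template_url" \
    --role-arn "$role_arn" \
    --tags Key=project,Value=openvirome \
    --parameters \
      ParameterKey=InstanceType,ParameterValue=t3.large \
      ParameterKey=DiskSize,ParameterValue=100 \
      ParameterKey=KeyName,ParameterValue=neo4j
}

# Block until the stack is created, then print its outputs (including the URL).
show_worker_outputs() {
  "${AWS_CLI:-aws}" cloudformation wait stack-create-complete \
    --stack-name neo4j-graph-worker
  "${AWS_CLI:-aws}" cloudformation describe-stacks \
    --stack-name neo4j-graph-worker --query 'Stacks[0].Outputs'
}

# Create a 100 GiB io1 volume with Multi-Attach enabled in the given AZ.
create_shared_volume() {
  local az="${1:?usage: create_shared_volume AVAILABILITY_ZONE}"
  "${AWS_CLI:-aws}" ec2 create-volume \
    --availability-zone "$az" \
    --size 100 \
    --volume-type io1 \
    --iops 1600 \
    --multi-attach-enabled \
    --tag-specifications 'ResourceType=volume,Tags=[{Key=project,Value=openvirome}]'
}

# Example:
#   create_worker_stack "$TEMPLATE_URL" arn:aws:iam::ACCOUNT_ID:role/AWSCloudformationFullAccessRole
#   show_worker_outputs
```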
Useful debugging commands for mounting volumes:
- Get the device name from the output of `lsblk`
- Get the filesystem type and UUID of a device: `sudo file -s /dev/$DEVICE_NAME`
- Get disk usage: `df -h /mnt/$MOUNT_DIR`
- Unmount a device: `sudo umount /dev/$DEVICE_NAME`
- System logs: `dmesg | tail` or `journalctl -f`
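If the volume has to be prepared by hand, the commands above combine into the usual format-if-empty-then-mount sequence. This is a hypothetical sketch, not necessarily what `make mount-vol` does, and it assumes XFS as the filesystem and `/mnt/graphdata` as the mount point. `sudo file -s` prints just `data` for a device with no filesystem, and only in that case is `mkfs` run, because formatting erases whatever is on the device.

```shell
# Hypothetical helpers (not the repo's make target).

# Return 0 when `file -s` output shows no filesystem, e.g. "/dev/nvme1n1: data".
has_no_filesystem() {
  case "$1" in
    *": data") return 0 ;;
    *) return 1 ;;
  esac
}

# Format DEVICE as XFS only if it is blank, then mount it at MOUNT_DIR.
mount_graph_volume() {
  local dev="${1:?usage: mount_graph_volume /dev/DEVICE [MOUNT_DIR]}"
  local mnt="${2:-/mnt/graphdata}"
  if has_no_filesystem "$(sudo file -s "$dev")"; then
    sudo mkfs -t xfs "$dev"            # only reached on a blank device
  fi
  sudo mkdir -p "$mnt"
  if ! mountpoint -q "$mnt"; then      # skip if something is already mounted there
    sudo mount "$dev" "$mnt"
  fi
  df -h "$mnt"
}

# Example, with the device name taken from `lsblk`:
#   mount_graph_volume /dev/nvme1n1
```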
Ideally, make changes to the Neo4j config through CloudFormation. Alternatively, ssh into the host, edit the config and restart the server.
Useful commands for Neo4j server management:
- Edit the config file: `/etc/neo4j/neo4j.conf`
- Download plugins into: `/var/lib/neo4j/plugins`
- Check status: `neo4j status`, `sudo service neo4j status` or `curl http://localhost:7474/`
- Review log files: `/var/log/neo4j/`
- Restart Neo4j: `neo4j stop && neo4j start` or `sudo service neo4j restart`
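When editing `/etc/neo4j/neo4j.conf` over ssh, a small helper keeps edits repeatable. This is a hypothetical sketch that assumes GNU `sed` (as on Amazon Linux) and values without `|`, `&` or `\`. It replaces an existing, possibly commented-out, `KEY=...` line, or appends one if the key is absent. Setting names depend on the Neo4j version (for example, the heap size is `server.memory.heap.max_size` in 5.x and `dbms.memory.heap.max_size` in 4.x).

```shell
# Hypothetical helper: set KEY=VALUE in a neo4j.conf-style file (GNU sed assumed).
# Replaces any existing line for KEY, including a commented-out "#KEY=...",
# otherwise appends a new line. VALUE must not contain '|', '&' or '\'.
set_conf() {
  local file="${1:?usage: set_conf FILE KEY VALUE}"
  local key="${2:?usage: set_conf FILE KEY VALUE}"
  local value="${3?usage: set_conf FILE KEY VALUE}"
  local key_re
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')   # escape dots for the regex
  if grep -Eq "^#?[[:space:]]*${key_re}=" "$file"; then
    sed -i -E "s|^#?[[:space:]]*${key_re}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example (as a user that can write the file, e.g. root):
#   set_conf /etc/neo4j/neo4j.conf server.memory.heap.max_size 4g
#   service neo4j restart
```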