Update README.md
jamesamcl authored Nov 5, 2024
1 parent e301632 commit 6023eef
Showing 1 changed file with 2 additions and 4 deletions.
6 changes: 2 additions & 4 deletions README.md
@@ -51,16 +51,14 @@ The resulting graphs can be downloaded from https://ftp.ebi.ac.uk/pub/databases/

## Implementation

-The pipeline is implemented as [Rust](https://www.rust-lang.org/) programs with simple CLIs, orchestrated with [Nextflow](https://www.nextflow.io/).
-
-The primary output the pipeline is a [property graph](https://docs.oracle.com/en/database/oracle/property-graph/22.2/spgdg/what-are-property-graphs.html) for [Neo4j](https://github.com/neo4j/neo4j). The input format (after ingests to extract from [KGX](https://github.com/biolink/kgx), RDF, and bespoke DB formats) is simple [JSONL](https://jsonlines.org/) files, to which "bruteforce" integration is applied:
+The pipeline is implemented as [Rust](https://www.rust-lang.org/) programs with simple CLIs, orchestrated with [Nextflow](https://www.nextflow.io/). Input KGs are represented in a variety of formats including [KGX](https://github.com/biolink/kgx), [RDF](https://www.w3.org/RDF/), and [JSONL](https://jsonlines.org/) files. After loading, a simple "bruteforce" integration strategy is applied:

* All strings that begin with any IRI or CURIE prefix from the [Bioregistry](https://bioregistry.io/) are canonicalised to the standard CURIE form
* All property values that are the identifier of another node in the graph become edges
* Cliques of equivalent nodes are merged into single nodes
* Cliques of equivalent properties are merged into single properties (and for ontology-defined properties, the [qualified safe labels](https://github.com/VirtualFlyBrain/neo4j2owl/blob/master/README.md) are used)
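
As a rough illustration of the first three integration steps listed above, here is a minimal Python sketch on toy data. It is not the pipeline's actual Rust implementation. The prefix table, the `canonicalise` and `integrate` functions, the node layout, and the explicit `equivalences` argument are all assumptions made for this example; the real prefix data would come from the Bioregistry. Property merging (the fourth step) and multi-valued properties are left out for brevity.

```python
# Hypothetical prefix table standing in for Bioregistry data:
# maps an IRI or CURIE prefix to the canonical CURIE prefix.
PREFIXES = {
    "http://example.org/exa/": "EXA",
    "exa:": "EXA",
    "EXA:": "EXA",
    "http://example.org/exb/": "EXB",
    "exb:": "EXB",
    "EXB:": "EXB",
}


def canonicalise(value):
    """Rewrite a string that starts with a known prefix as PREFIX:local_id."""
    if isinstance(value, str):
        for raw, prefix in PREFIXES.items():
            if value.startswith(raw):
                return f"{prefix}:{value[len(raw):]}"
    return value


def integrate(nodes, equivalences):
    """nodes: list of {"id": str, "properties": dict}; equivalences: id pairs."""
    # Canonicalise node ids and string property values.
    canon = [
        {
            "id": canonicalise(n["id"]),
            "properties": {k: canonicalise(v) for k, v in n["properties"].items()},
        }
        for n in nodes
    ]

    # Merge cliques of equivalent nodes with a small union-find.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equivalences:
        ra, rb = find(canonicalise(a)), find(canonicalise(b))
        if ra != rb:
            # Arbitrary choice: the smaller id represents the clique.
            parent[max(ra, rb)] = min(ra, rb)

    merged = {}
    for node in canon:
        merged.setdefault(find(node["id"]), {}).update(node["properties"])

    # Property values that identify another node become edges.
    edges = []
    for node_id, props in merged.items():
        for key, value in list(props.items()):
            if isinstance(value, str) and value in parent:
                target = find(value)
                if target in merged:
                    edges.append((node_id, key, target))
                    del props[key]
    return merged, edges


nodes = [
    {"id": "http://example.org/exa/1",
     "properties": {"label": "thing one", "related_to": "exb:2"}},
    {"id": "EXB:2", "properties": {"label": "thing two"}},
    {"id": "EXB:3", "properties": {"comment": "duplicate record"}},
]
merged, edges = integrate(nodes, equivalences=[("exa:1", "EXB:3")])
print(merged)
print(edges)
```

In this sketch the clique merge runs before edge extraction so that edges point at the merged node ids; whether the real pipeline orders the steps this way is not stated in the README.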

-In addition to Neo4j, the nodes and edges are loaded into [Solr](https://solr.apache.org/) for full-text search and [RocksDB](https://rocksdb.org/) for id->object resolution.
+The primary output of the pipeline is a [property graph](https://docs.oracle.com/en/database/oracle/property-graph/22.2/spgdg/what-are-property-graphs.html) for [Neo4j](https://github.com/neo4j/neo4j). The nodes and edges are also loaded into [Solr](https://solr.apache.org/) for full-text search and [RocksDB](https://rocksdb.org/) for id->object resolution.


