Justin Reese edited this page May 5, 2020 · 68 revisions

Knowledge Graph Hub concept

A Knowledge Graph Hub (KG Hub) is software that downloads and transforms data into a central location so that knowledge graphs (KGs) can be built from different combinations of data sources in an automated, YAML-driven way. The workflow is:

  • download data
  • transform data for each data source into two TSV files (edges.tsv and nodes.tsv) as specified here
  • load the data using KGX to produce a merged knowledge graph

To facilitate interoperability between datasets, Biolink Model categories are added to nodes and Biolink Model associations are added to edges during transformation.
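The transform step can be pictured with a minimal, hypothetical sketch: one source's records are written into nodes.tsv and edges.tsv, with a Biolink category on each node and a Biolink edge label plus a provided_by value on each edge. The column names (id, name, category; subject, edge_label, object, provided_by), the function name write_kgx_tsv, and the source name are assumptions loosely modeled on the KGX TSV format, not the actual KG-COVID-19 transform code; check the KGX documentation for the authoritative column set.

```python
import csv
import tempfile
from pathlib import Path

# Assumed column layout, loosely modeled on the KGX TSV format (check the KGX docs).
NODE_COLUMNS = ["id", "name", "category"]
EDGE_COLUMNS = ["subject", "edge_label", "object", "provided_by"]


def write_kgx_tsv(nodes, edges, out_dir):
    """Write lists of node/edge dicts to nodes.tsv and edges.tsv in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, columns, rows in (
        ("nodes.tsv", NODE_COLUMNS, nodes),
        ("edges.tsv", EDGE_COLUMNS, edges),
    ):
        with open(out / filename, "w", newline="") as fh:
            # Tab-separated with a header row; keys missing from a row are written as "".
            writer = csv.DictWriter(fh, fieldnames=columns, delimiter="\t",
                                    lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
    return out / "nodes.tsv", out / "edges.tsv"


if __name__ == "__main__":
    source = "example_source"  # hypothetical data source name
    nodes = [
        # Each node gets a Biolink category during transformation.
        {"id": "UniProtKB:P0DTC2", "name": "spike glycoprotein", "category": "biolink:Protein"},
        {"id": "UniProtKB:Q9BYF1", "name": "ACE2", "category": "biolink:Protein"},
    ]
    edges = [
        # Each edge gets a Biolink edge label plus provenance (provided_by).
        {"subject": "UniProtKB:P0DTC2", "edge_label": "biolink:interacts_with",
         "object": "UniProtKB:Q9BYF1", "provided_by": source},
    ]
    # Writes to a throwaway directory; a real transform would use a fixed output path.
    print(write_kgx_tsv(nodes, edges, tempfile.mkdtemp()))
```

Keeping every source in the same two-file layout is what lets the later KGX step merge them without source-specific code.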

KG-COVID-19 project

The KG-COVID-19 project is the first such KG Hub. It downloads and transforms COVID-19/SARS-CoV-2 and related data and emits a knowledge graph that can be loaded into KGX and used for machine learning or other purposes, to produce actionable knowledge.

Download knowledge graph:

A merged knowledge graph composed of data from all available transforms is available here:

RDF

TSV

See here for a description of the KGX TSV format.

Summary of data ingested (as of Apr 2020):
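Counts like these can be recomputed from a local copy of a nodes/edges TSV pair. The sketch below is a hypothetical helper (summarize is not part of KGX or KG-COVID-19); it assumes a header row, tab delimiters, one category value per node, a provided_by column on edges, and no quoted fields. The file paths passed on the command line are placeholders.

```python
import csv
import sys
from collections import Counter


def summarize(nodes_path, edges_path):
    """Count nodes per category and edges per provided_by value.

    Assumes tab-delimited files with a header row, a single category per node,
    and no quoting inside fields (hence csv.QUOTE_NONE).
    """
    with open(nodes_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        node_counts = Counter(row["category"] for row in reader)
    with open(edges_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Edges with an empty or missing provided_by are grouped under "unknown".
        edge_counts = Counter(row.get("provided_by") or "unknown" for row in reader)
    return node_counts, edge_counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python summarize.py nodes.tsv edges.tsv
    node_counts, edge_counts = summarize(sys.argv[1], sys.argv[2])
    for category, count in node_counts.most_common():
        print(f"node\t{category}\t{count}")
    for source, count in edge_counts.most_common():
        print(f"edge\t{source}\t{count}")
```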

A few organizing principles:

  • UniProtKB IDs are used for genes and proteins when possible
  • Less is more: for each data source, we ingest only the subset of data that is most relevant to the KG-Hub in question (here KG-COVID-19)
  • We avoid ingesting data from a source that isn't authoritative for the data in question (e.g. do not ingest protein interaction data from a drug database)
  • Each ingest should make an effort to record provenance by adding a provided_by column to each edge TSV file, populated with the source of each datum
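Three of these principles (UniProtKB IDs where possible, ingesting only a relevant subset, and a provided_by column) can be applied in one pass over a source's edges. This is a minimal sketch under stated assumptions: prepare_edges, the id_map mapping from source-specific IDs to UniProtKB CURIEs, the keep filter, and the example IDs are all hypothetical; a real ingest would obtain its mapping from an authoritative ID-mapping resource.

```python
from typing import Callable, Dict, Iterable, List, Optional

Edge = Dict[str, str]


def prepare_edges(edges: Iterable[Edge], id_map: Dict[str, str], source: str,
                  keep: Optional[Callable[[Edge], bool]] = None) -> List[Edge]:
    """Filter, normalize, and tag edges from one data source.

    id_map:  hypothetical mapping from source-specific IDs to UniProtKB CURIEs.
    keep:    optional predicate; edges for which it returns False are dropped.
    """
    out = []
    for edge in edges:
        if keep is not None and not keep(edge):
            continue  # "less is more": skip rows not relevant to this KG Hub
        new = dict(edge)  # copy so the caller's rows are not mutated
        for key in ("subject", "object"):
            # Use the UniProtKB ID when a mapping exists, else keep the original ID.
            new[key] = id_map.get(edge[key], edge[key])
        new["provided_by"] = source  # provenance for every edge
        out.append(new)
    return out


if __name__ == "__main__":
    raw = [
        {"subject": "example_db:ace2", "edge_label": "biolink:interacts_with",
         "object": "example_db:other"},
        {"subject": "example_db:x", "edge_label": "biolink:related_to",
         "object": "example_db:y"},
    ]
    print(prepare_edges(raw, {"example_db:ace2": "UniProtKB:Q9BYF1"}, "example_source",
                        keep=lambda e: e["edge_label"] == "biolink:interacts_with"))
```

Unmapped IDs are passed through rather than dropped, so a missing mapping shows up as a non-UniProtKB ID in the output instead of silently losing the edge.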

People:

The code:

  • Here is the GitHub repo for this project.

  • Here is the GitHub repo for N2V, an implementation of node2vec and other related graph learning methods.

Contributing:

  • Here is a more detailed description, and instructions on how to help.