Implement reaction ingests (Rhea, BioPAX, etc) #372

cmungall · 2020-10-09T19:53:18Z

Note: this should move to a generic kghub repo, keeping here for now

Need a TSV of reaction->participant edges from various sources, in order of priority

Rhea cc @balhoff
Any BioPAX3 export (e.g. Reactome)
Any BioPAX2 export (e.g. YeastPathways)
SEED cc @realmarcin
maybe KEGG

(we also have a heuristic way of generating these from GO text descriptions but this is outside the scope of this ticket)

The fields would be:

subject (https://w3id.org/biolink/vocab/MolecularActivity)
predicate: (todo: add has-participant to bl)
object (https://w3id.org/biolink/vocab/ChemicalSubstance)
usual provenance properties
stoichiometry: int
direction: One of l->r, r->l, bidirectional, neutral
side: One of l,r
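
As a rough illustration of the fields above, here is a minimal Python sketch that validates and serializes reaction->participant edges to TSV. Everything not named in the list is an assumption made for the example: the `make_edge` and `edges_to_tsv` helpers, the `provided_by` column (standing in for the "usual provenance properties"), the `has_participant` predicate string (not yet in bl at the time of writing), and the `EX:` placeholder IDs, which are not real Rhea or CHEBI identifiers.

```python
import csv
import io

# Column order for the edge TSV; "provided_by" is an assumed provenance column.
COLUMNS = ["subject", "predicate", "object", "stoichiometry",
           "direction", "side", "provided_by"]
DIRECTIONS = {"l->r", "r->l", "bidirectional", "neutral"}
SIDES = {"l", "r"}


def make_edge(subject, obj, stoichiometry, direction, side, provided_by,
              predicate="has_participant"):
    """Build one edge row, checking the enumerated fields (hypothetical helper)."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if side not in SIDES:
        raise ValueError(f"unknown side: {side!r}")
    if not isinstance(stoichiometry, int) or stoichiometry < 1:
        raise ValueError(f"stoichiometry must be a positive int: {stoichiometry!r}")
    return {
        "subject": subject,        # reaction ID, source prefix left as-is
        "predicate": predicate,
        "object": obj,             # participant ID, source prefix left as-is
        "stoichiometry": stoichiometry,
        "direction": direction,
        "side": side,
        "provided_by": provided_by,
    }


def edges_to_tsv(edges):
    """Serialize edge rows to a TSV string with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(edges)
    return buf.getvalue()


# Placeholder IDs only: one reaction with one participant on the left, two on the right.
EXAMPLE_EDGES = [
    make_edge("EX:reaction1", "EX:malate", 1, "bidirectional", "l", "example"),
    make_edge("EX:reaction1", "EX:maleate", 1, "bidirectional", "r", "example"),
    make_edge("EX:reaction1", "EX:water", 1, "bidirectional", "r", "example"),
]

if __name__ == "__main__":
    print(edges_to_tsv(EXAMPLE_EDGES))
```

Each row is one reaction->participant edge, so a reaction with three participants yields three rows that repeat the reaction-level `direction`.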

This schema to be added to bl (biolink/biolink-model#478)

The nodes would have all the usual properties. E.g. rhea would provide a description, xrefs

maybe additional node properties like

is balanced: bool
is stereo: bool

I suggest the ingest does not try and normalize the IDs, but leaves the source ID prefixes.

Some sources may have catalysis too - add these as another edge type.

Not of direct relevance to KG-hub, but relevant to @goodb @balhoff: we will also have something like a SPARQL transform that turns this into our standard OWL representation, which can be complex, involving unions, e.g.:

maleate hydratase activity == 
(catalytic activity 
and has input some ((R)-malate(2-) and has stoichiometry value “1”)
and has output some (maleate(2-) and has stoichiometry value “1”)
and has output some (water and has stoichiometry value “1”))
or
(catalytic activity 
and has output some ((R)-malate(2-) and has stoichiometry value “1”)
and has input some (maleate(2-) and has stoichiometry value “1”)
and has input some (water and has stoichiometry value “1”))

This is what we would use for OWL reasoning and in GO

Note that having alternate levels of representation for different purposes is exactly what I am getting at in Biological Knowledge Graph Modeling Design Patterns

We can also see this akin to dosdp templating - we have a simple TSV representation and an OWL expansion
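
To make the TSV-to-OWL expansion concrete, here is a minimal Python sketch that turns the participant rows for one reaction into a class expression shaped like the maleate hydratase example above. It is a plain string template, not the SPARQL transform mentioned in this comment, and the function names are hypothetical. It also assumes that `neutral` expands to the same union as `bidirectional`, which the ticket does not specify.

```python
def participant_clause(role, name, stoichiometry):
    """One 'has input/output some (...)' clause."""
    return (f'and has {role} some ({name} and has stoichiometry value '
            f'"{stoichiometry}")')


def one_direction(participants, input_side):
    """Expression for a single reading direction.

    participants: list of (name, stoichiometry, side) tuples, side in {"l", "r"}.
    input_side: the side treated as the input for this direction.
    """
    lines = ["(catalytic activity"]
    for name, stoichiometry, side in participants:
        role = "input" if side == input_side else "output"
        lines.append(participant_clause(role, name, stoichiometry))
    return "\n".join(lines) + ")"


def expand_reaction(label, participants, direction):
    """Expand TSV-style rows into an OWL-like equivalence axiom (as text)."""
    if direction == "l->r":
        alternatives = [one_direction(participants, "l")]
    elif direction == "r->l":
        alternatives = [one_direction(participants, "r")]
    else:
        # Assumption: "bidirectional" and "neutral" both become a union
        # of the two reading directions.
        alternatives = [one_direction(participants, "l"),
                        one_direction(participants, "r")]
    return f"{label} ==\n" + "\nor\n".join(alternatives)


MALEATE_HYDRATASE = [
    ("(R)-malate(2-)", 1, "l"),
    ("maleate(2-)", 1, "r"),
    ("water", 1, "r"),
]

if __name__ == "__main__":
    print(expand_reaction("maleate hydratase activity",
                          MALEATE_HYDRATASE, "bidirectional"))
```

A `bidirectional` reaction yields two alternatives joined by `or`, while `l->r` or `r->l` yields a single alternative with no union.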


callahantiff · 2020-10-13T16:51:49Z

What about adding CHEBI?

In PheKnowLator, we have created specific triples that allow us to explicitly represent CHEBI chemicals, catalysts, and cofactors with respect to Reactome pathways. Ignacio Tripodi and I collaborated on validating this and ran some wet lab experiments that seemed to suggest this worked well when applied to a small human RNA-Seq time series toxicogenomics assay.

Also, I AM a HUGE fan of exploring different KG modeling design patterns. Perhaps after the PheKnowLator manuscript we can talk more seriously about some projects in that domain.

cmungall · 2020-10-14T22:57:42Z

yes, we should definitely add chebi. rhea and reactome already use chebi

curious - how did you go about modeling this?

wow that's amazing about validating on wet lab experiments

justaddcoffee · 2020-10-19T16:10:05Z

> What about adding CHEBI?

We actually ingest CHEBI now in KG-COVID-19 (see here), although probably not as elaborately as what you describe for PheKnowLator

> Also, I AM a HUGE fan of exploring different KG modeling design patterns. Perhaps after the PheKnowLator manuscript we can talk more seriously about some projects in that domain.

Yes, let's discuss, post-manuscript!

deepakunni3 · 2020-10-19T16:41:50Z

Yes, we do add ChEBI to our KG.
But I am not sure if we have any sources that reference ChEBI apart from ChEMBL.

@callahantiff Would love to include what you have for PheKnowLator or find ways of subsetting specific parts.

Happy to chat more on this when you are ready 👍

callahantiff · 2020-10-20T14:41:36Z

> yes, we should definitely add chebi. rhea and reactome already use chebi
>
> curious - how did you go about modeling this?
>
> wow that's amazing about validating on wet lab experiments

Happy to discuss that. I think you might be disappointed by how simple it ended up being in the end. How can I best answer the modeling question? I can describe the edge types/data sources we used?

Small wet lab experiments, but some nonetheless. I'd love to do more. Thoroughly validating the content and relationships in a large heterogeneous KG (aside from using reasoners -- to at least cover some of the logical aspects) is tough!

RichardBruskiewich · 2021-04-05T17:12:03Z

Sorry, slightly non-sequitur here, but just want to mention that a "Knowledge Beacon" was built to access Rhea. It is still quietly running on the Translator subnet at https://kba.ncats.io/beacon/rhea/. It probably didn't adequately cover Rhea but it could be a source of inspiration or a few Python code snippets (or not?)

cmungall added the "new data source" label (new data source we'd like to ingest) on Oct 9, 2020
