Implement reaction ingests (Rhea, BioPAX, etc) #372

Open
cmungall opened this issue Oct 9, 2020 · 6 comments
Labels
new data source new data source we'd like to ingest

Comments

@cmungall
Contributor

cmungall commented Oct 9, 2020

Note: this should move to a generic kghub repo, keeping here for now

Need a TSV of reaction->participant edges from various sources, in order of priority

  • Rhea cc @balhoff
  • Any BioPAX3 export (e.g. Reactome)
  • Any BioPAX2 export (e.g. YeastPathways)
  • SEED cc @realmarcin
  • maybe KEGG

(we also have a heuristic way of generating these from GO text descriptions but this is outside the scope of this ticket)

The fields would be:

This schema is to be added to bl (biolink/biolink-model#478)

The nodes would have all the usual properties. E.g. Rhea would provide a description and xrefs.

maybe additional node properties like

  • is balanced: bool
  • is stereo: bool

I suggest the ingest does not try to normalize the IDs, but leaves the source ID prefixes in place.

Some sources may have catalysis too - add these as another edge type.
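A minimal sketch of what such an edge TSV could look like in Python. The field list is not given in this thread, so the column names, predicate names, and IDs below are all hypothetical placeholders, not the proposed schema. The sketch only illustrates two points made above: source ID prefixes are left untouched, and catalysis is emitted as a separate edge type.

```python
import csv
import io

# Hypothetical column names: the thread does not list the actual fields.
EDGE_COLUMNS = ["subject", "predicate", "object", "stoichiometry"]


def reaction_edges(reaction_id, left, right, catalysts=()):
    """Yield one edge row per participant, leaving source ID prefixes untouched."""
    for predicate, participants in (("has_left_participant", left),
                                    ("has_right_participant", right)):
        for participant_id, count in participants:
            yield {"subject": reaction_id, "predicate": predicate,
                   "object": participant_id, "stoichiometry": count}
    # Catalysis goes in as a separate edge type, with no stoichiometry.
    for catalyst_id in catalysts:
        yield {"subject": reaction_id, "predicate": "catalyzed_by",
               "object": catalyst_id, "stoichiometry": ""}


def write_edges_tsv(rows, handle):
    """Write edge rows as tab-separated values with a header line."""
    writer = csv.DictWriter(handle, fieldnames=EDGE_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Made-up placeholder IDs, not real Rhea/ChEBI/UniProt accessions.
rows = list(reaction_edges(
    "RHEA:00000",
    left=[("CHEBI:00001", 1)],
    right=[("CHEBI:00002", 1), ("CHEBI:00003", 1)],
    catalysts=["UniProtKB:X00000"],
))
buffer = io.StringIO()
write_edges_tsv(rows, buffer)
print(buffer.getvalue())
```

Keeping one row per reaction-participant pair means the same writer can serve any of the sources listed above, as long as each source's parser produces the `left`, `right`, and `catalysts` lists.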

Not of direct relevance to KG-hub, but relevant to @goodb and @balhoff: we will also have something like a SPARQL transform that turns this into our standard OWL representation, which can be complex, involving unions, e.g.:

maleate hydratase activity == 
(catalytic activity 
and has input some ((R)-malate(2-) and has stoichiometry value “1”)
and has output some (maleate(2-) and has stoichiometry value “1”)
and has output some (water and has stoichiometry value “1”))
or
(catalytic activity 
and has output some ((R)-malate(2-) and has stoichiometry value “1”)
and has input some (maleate(2-) and has stoichiometry value “1”)
and has input some (water and has stoichiometry value “1”))

This is what we would use for OWL reasoning and in GO

Note that having alternate levels of representation for different purposes is exactly what I am getting at in Biological Knowledge Graph Modeling Design Patterns

We can also see this as akin to dosdp templating - we have a simple TSV representation and an OWL expansion
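To make the "simple representation plus expansion" idea concrete, here is a small Python sketch that expands a flat reaction record into the bidirectional union shown above. It is string templating only, mimicking the notation used in this comment. It is not the SPARQL transform mentioned earlier and does not produce a validated OWL serialization. The function names are made up for illustration.

```python
def side_clauses(relation, participants):
    """One 'and <relation> some (...)' line per (name, stoichiometry) pair."""
    return [
        f'and {relation} some ({name} and has stoichiometry value "{count}")'
        for name, count in participants
    ]


def one_direction(inputs, outputs):
    """Expression for the reaction read in a single direction."""
    lines = ["(catalytic activity"]
    lines += side_clauses("has input", inputs)
    lines += side_clauses("has output", outputs)
    return "\n".join(lines) + ")"


def expand_reaction(activity, left, right):
    """Union of both directions, since the simple record is undirected."""
    forward = one_direction(left, right)
    reverse = one_direction(right, left)
    return f"{activity} ==\n{forward}\nor\n{reverse}"


expansion = expand_reaction(
    "maleate hydratase activity",
    left=[("(R)-malate(2-)", 1)],
    right=[("maleate(2-)", 1), ("water", 1)],
)
print(expansion)
```

The flat record carries only participants, sides, and stoichiometry. The union over both directions is added by the expansion step, which is why the simple table can stay small while the OWL form grows.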

@cmungall cmungall added the new data source new data source we'd like to ingest label Oct 9, 2020
@callahantiff
Collaborator

What about adding CHEBI?

In PheKnowLator, we have created specific triples that allow us to explicitly represent CHEBI chemicals, catalysts, and cofactors with respect to Reactome pathways. Ignacio Tripodi and I collaborated on validating this and ran some wet lab experiments that seemed to suggest this worked well when applied to a small human RNA-Seq time series toxicogenomics assay.

Also, I AM a HUGE fan of exploring different KG modeling design patterns. Perhaps after the PheKnowLator manuscript we can talk more seriously about some projects in that domain.

@cmungall
Contributor Author

yes, we should definitely add chebi. rhea and reactome already use chebi

curious - how did you go about modeling this?

wow that's amazing about validating on wet lab experiments

@justaddcoffee
Collaborator

What about adding CHEBI?

We actually ingest CHEBI now in KG-COVID-19 (see here), although probably not as elaborately as what you describe for PheKnowLator

Also, I AM a HUGE fan of exploring different KG modeling design patterns. Perhaps after the PheKnowLator manuscript we can talk more seriously about some projects in that domain.

Yes, let's discuss, post-manuscript!

@deepakunni3
Member

Yes, we do add ChEBI to our KG.
But I am not sure if we have any sources that reference ChEBI apart from ChEMBL.

@callahantiff Would love to include what you have for PheKnowLator or find ways of subsetting specific parts.

Happy to chat more on this when you are ready 👍

@callahantiff
Collaborator

yes, we should definitely add chebi. rhea and reactome already use chebi

curious - how did you go about modeling this?

wow that's amazing about validating on wet lab experiments

Happy to discuss that. I think you might be disappointed by how simple it ended up being. How can I best answer the modeling question? I can describe the edge types/data sources we used?

Small wet lab experiments, but some nonetheless. I'd love to do more. Thoroughly validating the content and relationships in a large heterogeneous KG (aside from using reasoners, which at least cover some of the logical aspects) is tough!

@RichardBruskiewich

Sorry, a slight non sequitur here, but I just want to mention that a "Knowledge Beacon" was built to access Rhea. It is still quietly running on the Translator subnet at https://kba.ncats.io/beacon/rhea/. It probably didn't adequately cover Rhea, but it could be a source of inspiration or of a few Python code snippets (or not?)
