Prediction Qualifiers #1494

Open
riyavsinha opened this issue Apr 23, 2024 · 2 comments

@riyavsinha

Question:
We are looking to include predictive associations in a knowledge graph using the BioLink model. Are there currently qualifiers to specify is_predicted with a boolean value and/or predicted_by_model_type with some model (e.g. Enformer, AlphaMissense), or is there a recommended way to do so?

If not, is this within the scope of BioLink and something we can work to add, or would it be recommended to extend it independently?

@riyavsinha riyavsinha changed the title Prediction Quantifiers Prediction Qualifiers Apr 23, 2024
@sierra-moxon
Member

Hi @riyavsinha - nice to hear from you! Thank you for the question. Yes, we just released Biolink 4.2.0 with some guidance on adding two edge properties, knowledge level and agent type, to help capture the nature of the edge (whether it is a prediction, an assertion, or a statistical calculation).

Details and guidance for assigning ‘At-a-Glance’ provenance properties that allow users to make a first-pass assessment of the strength, relevance, and utility of a given Edge or Result.
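As a rough sketch of how this might look in practice, an edge produced by a predictive model could carry both properties as shown below. The dictionary layout, the placeholder `EXAMPLE:` CURIEs, and the helper `is_predicted` are hypothetical and not part of Biolink; the point is only that a separate `is_predicted` boolean may be unnecessary when `knowledge_level` already records that the edge is a prediction.

```python
# Hypothetical predicted edge; the EXAMPLE: CURIEs are placeholders, not real identifiers.
predicted_edge = {
    "subject": "EXAMPLE:variant_1",
    "predicate": "biolink:related_to",
    "object": "EXAMPLE:gene_1",
    # The two Biolink 4.2.0 edge properties described above.
    "knowledge_level": "prediction",
    "agent_type": "computational_model",
    # Where to record the specific model (e.g. Enformer) is still an open question in this thread.
}


def is_predicted(edge: dict) -> bool:
    """Hypothetical helper: derive an is_predicted flag from knowledge_level."""
    return edge.get("knowledge_level") == "prediction"


print(is_predicted(predicted_edge))  # True
```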

Enumerated values for agent type are described in Biolink via the range of the property and include:

  • manual_agent
  • automated_agent
  • data_analysis_pipeline
  • computational_model
  • text_mining_agent
  • image_processing_agent
  • manual_validation_of_automated_agent
  • not_provided
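For code that emits or checks edges, the enumerated agent type values could be mirrored as a Python enum so that typos are caught early. This is a minimal sketch under that assumption, not the API of any official Biolink package; the names `AgentType` and `parse_agent_type` are hypothetical.

```python
from enum import Enum


class AgentType(str, Enum):
    """Hypothetical mirror of the Biolink AgentTypeEnum values."""
    MANUAL_AGENT = "manual_agent"
    AUTOMATED_AGENT = "automated_agent"
    DATA_ANALYSIS_PIPELINE = "data_analysis_pipeline"
    COMPUTATIONAL_MODEL = "computational_model"
    TEXT_MINING_AGENT = "text_mining_agent"
    IMAGE_PROCESSING_AGENT = "image_processing_agent"
    MANUAL_VALIDATION_OF_AUTOMATED_AGENT = "manual_validation_of_automated_agent"
    NOT_PROVIDED = "not_provided"


def parse_agent_type(value: str) -> AgentType:
    """Look up a member by its string value; raises ValueError for unknown values."""
    return AgentType(value)
```

For example, `parse_agent_type("computational_model")` returns `AgentType.COMPUTATIONAL_MODEL`, while a model name such as `"enformer"` is rejected because it is not an agent type.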

and for ‘knowledge_level’ (which describes the level or type of statement that is reported in an edge, based on the reasoning or analysis methods used to generate the knowledge it reports, or the type/strength of evidence supporting this knowledge), enumerated values include:

  • knowledge_assertion
  • logical_entailment
  • prediction
  • statistical_association
  • observation
  • not_provided
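The knowledge level values can be treated the same way. The sketch below (class and function names are hypothetical, edges are assumed to be plain dictionaries) shows one way to pull out only the predicted edges from a graph.

```python
from enum import Enum


class KnowledgeLevel(str, Enum):
    """Hypothetical mirror of the Biolink KnowledgeLevelEnum values."""
    KNOWLEDGE_ASSERTION = "knowledge_assertion"
    LOGICAL_ENTAILMENT = "logical_entailment"
    PREDICTION = "prediction"
    STATISTICAL_ASSOCIATION = "statistical_association"
    OBSERVATION = "observation"
    NOT_PROVIDED = "not_provided"


def edges_at_level(edges: list[dict], level: KnowledgeLevel) -> list[dict]:
    """Keep only edges whose knowledge_level string matches the given member."""
    return [e for e in edges if e.get("knowledge_level") == level.value]
```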

The main challenge in applying this standard is selecting appropriate agent type and knowledge level terms for a given edge. Splitting agent type and knowledge level into two properties is intended to make it easier to identify and apply the most appropriate term for each of these provenance characteristics.

https://biolink.github.io/biolink-model/agent_type/
https://biolink.github.io/biolink-model/AgentTypeEnum/
https://biolink.github.io/biolink-model/knowledge_level/
https://biolink.github.io/biolink-model/KnowledgeLevelEnum/

Some additional guidance:

  • If a human participated in the reasoning and interpretation activities that led to creation of the knowledge statement, select ‘manual agent’.
  • If a human participated only by vetting/validating a knowledge statement that was generated by an automated agent, select ‘manual validation of automated agent’.
  • It is important to indicate when such manual review has occurred, because it can give a user more confidence in an automated statement.
  • If a human was involved only in writing code/algorithms that were executed to process, analyze, or reason with data, but the knowledge statement itself was generated by software without direct human intervention, select ‘automated agent’ (or one of its children).
  • If an automated agent generating a knowledge statement executes a set of data processing and analysis tasks and then reports the direct result of this analysis, but does NOT perform reasoning or inference to draw a broader conclusion from these results, select ‘data analysis pipeline’.
  • Data Analysis Pipelines summarize features of a dataset or report statistical associations/enrichments within the data.
  • These agents typically generate Statements that report an ‘association’ or ‘correlation’ between variables in the dataset (e.g. ‘PM2.5 exposure is positively correlated with ER visits for Asthma, in cohort/dataset X’), or a statistical enrichment of concepts in a dataset (e.g. “Gene Set X is enriched in Pathway Y”).
  • If the automated agent performs any form of reasoning or inference over the data/information it consumes to draw a broader conclusion about the domain of discourse, select ‘computational model’.
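The guidance above reads as an ordered decision procedure, which could be sketched as a small function. This is a simplified illustration, not official Biolink tooling: the `Provenance` fields and `select_agent_type` are hypothetical, and the sketch ignores the text mining and image processing children of ‘automated agent’.

```python
from dataclasses import dataclass


@dataclass
class Provenance:
    """Hypothetical summary of how a knowledge statement was produced."""
    human_reasoned: bool            # a human did the reasoning/interpretation
    human_validated: bool           # a human only vetted an automated statement
    draws_broader_conclusion: bool  # software reasoned beyond its analysis results
    reports_analysis_results: bool  # software reports direct results of data analysis


def select_agent_type(p: Provenance) -> str:
    """Apply the guidance in order and return an agent_type value."""
    if p.human_reasoned:
        return "manual_agent"
    if p.human_validated:
        return "manual_validation_of_automated_agent"
    # Beyond this point the statement was generated by software without direct human intervention.
    if p.draws_broader_conclusion:
        return "computational_model"
    if p.reports_analysis_results:
        return "data_analysis_pipeline"
    return "automated_agent"
```

Checking the human-involvement cases first mirrors the guidance: manual review is recorded even when the underlying statement came from a model.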

With regard to also specifying a particular kind of model in the edge metadata: if you would like to provide a list of methods, we can better help sort out which additional biolink property best holds those.

@riyavsinha
Author

Thank you for the detailed response, that is really helpful to know, and great that BioLink supports that!

“if you would like to provide a list of methods, we can better help sort out which additional biolink property best holds those?”

For this, we haven't established a set list of methods yet, but in general, could be things like Enformer, AlphaMissense, Activity-by-Contact (ABC) models, ChromBPNet models, etc.

It seems like the Agent entity has a string provided_by slot that this information could go in, but I'm not clear how that would be linked to an Association.


2 participants