API use cases #3

Open
dosumis opened this issue Mar 5, 2024 · 1 comment


dosumis commented Mar 5, 2024

  1. As a bioinformatician looking for data relevant to a planned analysis, I want to retrieve pre-generated, CxG-standard h5ad files by any combination of publication DOI, sample tissue, sample developmental stage, assay, cell type, etc., using ontology closure to return more results. The results should take the form of a CSV table with metadata (citation, tissue(s), stage(s), etc.) plus a link to the h5ad file for download.
  • The query should support combinations at the sample/annotation level, not merely at the dataset level. For example, a query for cell_type: fibroblast; tissue: kidney should return datasets with cells annotated as both a type of fibroblast and a part of kidney, not datasets that merely contain these cell types and tissues separately.
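
To make the sample-level requirement concrete, here is a minimal sketch on toy data. The record layout and field names are assumptions for illustration only, and it uses exact term matching (no ontology closure) to keep the contrast visible.

```python
# Toy records: one row per annotated cell set. Field names and values are
# hypothetical and only illustrate the matching semantics.
CELL_SETS = [
    {"dataset": "ds1", "cell_type": "fibroblast", "tissue": "kidney"},
    {"dataset": "ds2", "cell_type": "fibroblast", "tissue": "lung"},
    {"dataset": "ds2", "cell_type": "podocyte", "tissue": "kidney"},
]


def dataset_level(cell_sets, cell_type, tissue):
    """Datasets that mention both terms somewhere, not necessarily together."""
    with_type = {r["dataset"] for r in cell_sets if r["cell_type"] == cell_type}
    with_tissue = {r["dataset"] for r in cell_sets if r["tissue"] == tissue}
    return sorted(with_type & with_tissue)


def sample_level(cell_sets, cell_type, tissue):
    """Datasets where a single cell set carries both terms."""
    return sorted(
        {
            r["dataset"]
            for r in cell_sets
            if r["cell_type"] == cell_type and r["tissue"] == tissue
        }
    )


if __name__ == "__main__":
    print(dataset_level(CELL_SETS, "fibroblast", "kidney"))
    print(sample_level(CELL_SETS, "fibroblast", "kidney"))
```

Here ds2 has fibroblasts (from lung) and kidney tissue (podocytes), so a dataset-level match returns it, while the sample-level match this use case asks for does not.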

Draft tech spec:

KG: we already have a queryable Neo4j graph of cell sets linked to ontology terms and dataset nodes, and a SOLR endpoint with all nodes indexed.

Query strategy: the SOLR instance is sufficiently denormalised that the above use case can be fulfilled with 1-3 queries. Denormalisations needed: ontology closures; dataset metadata + file link? Precise schema TBD.
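
As a rough sketch of what a single denormalised query could look like: since the schema is TBD, the field names below (cell_type_closure, tissue_closure, and the fl list) are purely hypothetical, and the sketch assumes one SOLR document per cell set with closure terms stored as multi-valued fields. Only the standard q, fq, fl and wt request parameters are relied on.

```python
from urllib.parse import urlencode


def _quote(term):
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def solr_params(cell_type=None, tissue=None):
    """Build a query string for a hypothetical per-cell-set SOLR core.

    Each filter becomes its own fq parameter; SOLR intersects multiple fq
    filters, so a document must satisfy all of them. Field names are
    assumptions, not an existing schema.
    """
    filters = []
    if cell_type is not None:
        filters.append("cell_type_closure:" + _quote(cell_type))
    if tissue is not None:
        filters.append("tissue_closure:" + _quote(tissue))
    params = [("q", "*:*")]
    params += [("fq", f) for f in filters]
    # Hypothetical metadata fields denormalised onto each document.
    params.append(("fl", "dataset_title,publication,download_link"))
    params.append(("wt", "csv"))
    return urlencode(params)


if __name__ == "__main__":
    print(solr_params(cell_type="fibroblast", tissue="kidney"))
```

Because both filters apply to the same cell-set document, this would give the sample-level combination in one request, provided the closures really are denormalised onto each document.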

  2. Knowledge Graph use cases: more discussion is needed of what extended KG content will look like. Will we include an extended graph with GO? Will we fold in GO annotations? Analysis of cell set transcriptomes => predicted GO BP and CC?

dosumis commented Jul 9, 2024

Query for datasets containing types of T cell

MATCH (c)-[:SUBCLASSOF*0..]->(d) WHERE d.label = 'T cell'
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation, ds.download_link[0], ds.title[0], ds.publication[0]
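
For readers less familiar with variable-length paths, the following sketch mimics what a zero-or-more-hop SUBCLASSOF traversal selects. The edges are a toy hierarchy made up for illustration, not an extract of the Cell Ontology.

```python
# Toy SUBCLASSOF edges (child -> parents); illustrative only.
SUBCLASS_OF = {
    "CD4-positive T cell": ["T cell"],
    "regulatory T cell": ["CD4-positive T cell"],
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
}


def ancestors_or_self(term, edges):
    """Terms reachable from `term` in zero or more hops.

    Zero hops means the term itself is included, which is why a cell set
    annotated directly with 'T cell' also matches.
    """
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def types_of(target, edges):
    """All terms whose closure contains `target` (the target and its subtypes)."""
    terms = set(edges) | {p for parents in edges.values() for p in parents}
    return {t for t in terms if target in ancestors_or_self(t, edges)}


if __name__ == "__main__":
    print(sorted(types_of("T cell", SUBCLASS_OF)))
```

In this toy hierarchy, asking for types of 'T cell' selects the term itself plus its direct and indirect subclasses, but not 'B cell' or 'lymphocyte'.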

Find all datasets that use tissue from the lung

MATCH (n:Cell_cluster)-[:tissue]->(t)-[:SUBCLASSOF|part_of*0..]->(:Class {label: 'lung'})
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, t.label AS tissue, ds.download_link[0], ds.title[0], ds.publication[0]

We could do the same for stage, disease and organism.
:Class nodes also have curie & synonyms so we can support search on these too.
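
One way to generalise the tissue query to other facets is a small template function, sketched below. Only the `tissue` relationship type appears in this thread; the relationship types for stage, disease and organism, and the exact property name used for CURIEs, are not confirmed here and would need to be supplied by the caller. `facet_query` itself is a hypothetical helper.

```python
import re

# Relationship types and property names cannot be passed as Cypher
# parameters, so they are checked against a strict identifier pattern
# before being interpolated into the query text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def facet_query(rel_type, match_property="label", closure_rels=("SUBCLASSOF",)):
    """Return Cypher text for one facet; the search term is bound as $term."""
    for name in (rel_type, match_property, *closure_rels):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    closure = "|".join(closure_rels)
    return (
        f"MATCH (n:Cell_cluster)-[:{rel_type}]->(t)"
        f"-[:{closure}*0..]->(:Class {{{match_property}: $term}})\n"
        "MATCH (ds)-[:has_source]-(n)-[:composed_primarily_of]->(:Class:Cell)\n"
        f"RETURN DISTINCT n.label AS author_annotation, t.label AS {rel_type}, "
        "ds.download_link[0], ds.title[0], ds.publication[0]"
    )


if __name__ == "__main__":
    # Reproduces the shape of the lung query, with the term as a parameter.
    print(facet_query("tissue", closure_rels=("SUBCLASSOF", "part_of")))
```

Binding the search term as a parameter keeps user input out of the query text, and switching `match_property` would allow matching on a CURIE instead of a label, assuming the property is named accordingly.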

To run combinatorial queries, we can combine them.

e.g.

MATCH (n:Cell_cluster)-[:tissue]->(t)-[:SUBCLASSOF|part_of*0..]->(:Class {label: 'lung'})
MATCH (c)-[:SUBCLASSOF*0..]->(d) WHERE d.label = 'epithelial cell'
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation, ds.download_link[0], ds.title[0], ds.publication[0]
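
Since the use case asks for a CSV table, rows returned by a query like the one above could be serialised along these lines. This is a minimal sketch: the column names follow the RETURN clause (with plain names assumed for the three dataset fields), and the example row is a made-up placeholder, not real KG content.

```python
import csv
import io

# Column names assumed from the RETURN clause of the queries in this thread.
COLUMNS = ["author_annotation", "CL_annotation", "download_link", "title", "publication"]


def rows_to_csv(rows):
    """Serialise a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder row for illustration only.
    example = [
        {
            "author_annotation": "AT2",
            "CL_annotation": "epithelial cell",
            "download_link": "https://example.org/file.h5ad",
            "title": "Example dataset",
            "publication": "doi:placeholder",
        }
    ]
    print(rows_to_csv(example))
```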

This was referenced Aug 9, 2024
Status: In Progress