API use cases #3

dosumis · 2024-03-05T10:58:49Z

As a bioinformatician trying to find data relevant to a planned analysis, I want to retrieve pre-generated, CxG-standard h5ad files based on any combination of publication DOI, sample tissue, sample developmental stage, assay, cell type, etc., using ontology closure to return more results. The results should take the form of a CSV table with metadata (citation, tissue(s), stage(s), etc.) plus a link to the h5ad file for download.
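A minimal sketch of what the CSV output could look like, assuming one row per dataset. The column names (`citation`, `tissues`, `stages`, `h5ad_link`) and the `|` separator for multi-valued fields are hypothetical choices for illustration, not an agreed schema.

```python
import csv
import io

# Hypothetical column set; the real schema is still to be decided.
COLUMNS = ["citation", "tissues", "stages", "h5ad_link"]


def results_to_csv(records):
    """Serialise result records (dicts) to CSV text, one row per dataset."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow({
            "citation": rec["citation"],
            # Multi-valued metadata is flattened with '|' (an assumption).
            "tissues": "|".join(rec["tissues"]),
            "stages": "|".join(rec["stages"]),
            "h5ad_link": rec["h5ad_link"],
        })
    return buf.getvalue()
```

Fields containing commas (typical for citations) are quoted automatically by the `csv` module, so the table stays parseable.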

The query should support combinations at the sample/annotation level, not merely at the dataset level. E.g. a query for cell_type: fibroblast; tissue: kidney should return datasets with cells annotated to both a type of fibroblast and a part of the kidney, not datasets that merely contain these cell types and tissues separately.
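The difference between the two levels can be illustrated with a toy example. The records below are invented for illustration, and exact string matching stands in for ontology closure to keep the sketch short.

```python
# Toy records: one row per annotated cell set (hypothetical data, not from the KG).
CELL_SETS = [
    {"dataset": "ds1", "cell_type": "fibroblast", "tissue": "kidney"},
    {"dataset": "ds2", "cell_type": "fibroblast", "tissue": "lung"},
    {"dataset": "ds2", "cell_type": "podocyte", "tissue": "kidney"},
]


def dataset_level(cell_sets, cell_type, tissue):
    """Datasets where both values occur somewhere, not necessarily together."""
    with_type = {r["dataset"] for r in cell_sets if r["cell_type"] == cell_type}
    with_tissue = {r["dataset"] for r in cell_sets if r["tissue"] == tissue}
    return with_type & with_tissue


def sample_level(cell_sets, cell_type, tissue):
    """Datasets where a single cell set carries both annotations."""
    return {
        r["dataset"]
        for r in cell_sets
        if r["cell_type"] == cell_type and r["tissue"] == tissue
    }
```

Here `dataset_level(CELL_SETS, "fibroblast", "kidney")` includes `ds2`, even though its fibroblasts come from lung; `sample_level` returns only `ds1`, which is the behaviour the use case asks for.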

Draft tech spec:

KG: already has a queryable Neo4j graph of cell sets linked to ontology terms and dataset nodes, plus a SOLR endpoint with all nodes indexed.

Query strategy: denormalise the SOLR instance sufficiently that the above use case can be fulfilled with 1-3 queries. Denormalisations needed: ontology closures; dataset metadata + file link? Precise schema TBD.
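As a rough sketch of the ontology-closure denormalisation, each indexed cell-set document could carry its term plus all ancestors, so a single search on an ancestor term finds it. The toy hierarchy and the field name `cell_type_closure` are assumptions for illustration only.

```python
# Hypothetical child -> parents edges (e.g. SUBCLASSOF), for illustration only.
PARENTS = {
    "CD4-positive T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": [],
}


def ancestors_closure(term, parents):
    """Return the term plus all of its ancestors (reflexive transitive closure)."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def denormalise(doc, parents):
    """Copy a cell-set document, adding a closure field for its cell type."""
    out = dict(doc)
    # 'cell_type_closure' is a made-up field name; the real schema is TBD.
    out["cell_type_closure"] = sorted(ancestors_closure(doc["cell_type"], parents))
    return out
```

With this in place, a document annotated with "CD4-positive T cell" would also match a search for "T cell" or "leukocyte" without any graph traversal at query time.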

Knowledge Graph use cases: More discussion is needed on what extended KG content will look like. Will we include an extended graph with GO? Will we fold in GO annotations? Analysis of cell set transcriptomes => predicted GO BP and CC?

dosumis · 2024-07-09T09:22:10Z

Query for datasets containing types of T cell

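// Expand 'T cell' to itself plus all of its subclasses (ontology closure)
MATCH (c)-[:SUBCLASSOF*0..]->(d) WHERE d.label = 'T cell'
// Find cell clusters annotated with any of those classes, and their source datasets
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation, ds.download_link[0], ds.title[0], ds.publication[0]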

Find all datasets that use tissue from the lung

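// Cell clusters whose tissue is the lung, a subclass of lung, or a part of the lung
MATCH (n:Cell_cluster)-[:tissue]->(t)-[:SUBCLASSOF|part_of*0..]->(:Class {label: 'lung'})
// Restrict to clusters with a cell type annotation and fetch their source datasets
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, t.label AS tissue, ds.download_link[0], ds.title[0], ds.publication[0]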

We could do the same for stage, disease and organism.
:Class nodes also have curie & synonyms so we can support search on these too.
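A sketch of how an API layer might accept a label, CURIE or synonym for the same query, assuming the node properties are called `label`, `curie` and `synonyms` (with `synonyms` stored as a list). These property names and the function names are assumptions; the search term is passed as a query parameter rather than interpolated into the Cypher text.

```python
def term_match_clause(var, param):
    """Build a Cypher predicate matching a :Class node by label, curie or synonym."""
    return (
        f"({var}.label = ${param} OR {var}.curie = ${param} "
        f"OR ${param} IN {var}.synonyms)"
    )


def cell_type_query(term):
    """Return (cypher, params) for datasets containing types of the given cell term."""
    query = (
        "MATCH (c:Class:Cell)-[:SUBCLASSOF*0..]->(d:Class)\n"
        f"WHERE {term_match_clause('d', 'term')}\n"
        "MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c)\n"
        "RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation, "
        "ds.download_link[0], ds.title[0], ds.publication[0]"
    )
    # The user-supplied term only ever travels as a parameter value.
    return query, {"term": term}
```

The returned pair can be handed to whichever Neo4j client the API ends up using; keeping the term out of the query text avoids quoting problems and injection.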

To do combinatorial queries, we can combine them, e.g.:

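// Cell clusters from the lung (including subclasses and parts)
MATCH (n:Cell_cluster)-[:tissue]->(t)-[:SUBCLASSOF|part_of*0..]->(:Class {label: 'lung'})
// Expand 'epithelial cell' to itself plus all of its subclasses
MATCH (c)-[:SUBCLASSOF*0..]->(d) WHERE d.label = 'epithelial cell'
// The same cluster n must satisfy both constraints, so the match is at the sample level
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation, ds.download_link[0], ds.title[0], ds.publication[0]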
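One possible way to assemble such combined queries programmatically is to add a MATCH clause per supplied filter. This is a sketch only: the function name, parameter names and column aliases are hypothetical, and only the tissue and cell type filters from the queries above are covered.

```python
def combined_query(cell_type=None, tissue=None):
    """Compose a Cypher query from whichever filters are supplied.

    Returns (cypher, params). All clauses share the cluster variable n,
    so filters are combined per cell cluster, not per dataset.
    """
    clauses, params = [], {}
    if tissue is not None:
        clauses.append(
            "MATCH (n:Cell_cluster)-[:tissue]->(t)"
            "-[:SUBCLASSOF|part_of*0..]->(:Class {label: $tissue})"
        )
        params["tissue"] = tissue
    if cell_type is not None:
        clauses.append(
            "MATCH (c:Class:Cell)-[:SUBCLASSOF*0..]->(:Class {label: $cell_type})"
        )
        params["cell_type"] = cell_type
    clauses.append(
        "MATCH (ds)-[:has_source]-(n:Cell_cluster)"
        "-[:composed_primarily_of]->(c:Class:Cell)"
    )
    clauses.append(
        "RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation, "
        "ds.download_link[0] AS download_link, ds.title[0] AS title, "
        "ds.publication[0] AS publication"
    )
    return "\n".join(clauses), params
```

For example, `combined_query(cell_type="epithelial cell", tissue="lung")` yields a three-MATCH query equivalent in shape to the one above, with both labels supplied as parameters. Stage, disease and organism filters could be added in the same pattern.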

This was referenced Aug 9, 2024:

Roadmap #13 (Open)

SCXA-KG API development EBISPOT/scxa_kg#4 (Merged)
