As a bioinformatician trying to find data relevant to a planned analysis, I want to retrieve pre-generated, CxG-standard h5ad files based on any combination of publication DOI, sample tissue, sample developmental stage, assay, cell type, etc., using ontology closure to return more results. The results should take the form of a CSV table with metadata (citation, tissue(s), stage(s), etc.) plus a link to the h5ad file for download.
The query should support combinations at the sample/annotation level, not merely at the dataset level. For example, a query for cell_type: fibroblast; tissue: kidney should return datasets with cells annotated to both a type of fibroblast and a part of the kidney, not datasets that merely contain these cell types and tissues separately.
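The difference between sample-level and dataset-level combination can be illustrated with a minimal sketch on toy data. Everything here (the `PARENTS` map, the `CELL_SETS` rows, the dataset names) is made up for illustration and is not taken from the KG; the closure is a plain reflexive walk over subclass/part_of-style edges.

```python
# Toy data: all terms and dataset names are illustrative assumptions.
# Child -> parents edges (SUBCLASSOF or part_of), as an ontology closure would walk them.
PARENTS = {
    "kidney fibroblast": ["fibroblast"],
    "lung fibroblast": ["fibroblast"],
    "renal cortex": ["kidney"],
}

def ancestors_or_self(term):
    """Return the term plus every term reachable via PARENTS (reflexive closure)."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen

# One row per annotated cell set: (dataset, cell type, tissue).
CELL_SETS = [
    ("ds1", "kidney fibroblast", "renal cortex"),
    ("ds2", "lung fibroblast", "lung"),
    ("ds2", "podocyte", "kidney"),
]

def sample_level(cell_type, tissue):
    """Datasets where a single cell set matches both the cell type and the tissue."""
    return sorted({
        ds for ds, ct, ts in CELL_SETS
        if cell_type in ancestors_or_self(ct) and tissue in ancestors_or_self(ts)
    })

def dataset_level(cell_type, tissue):
    """Datasets that match both filters somewhere, but not necessarily together."""
    has_ct = {ds for ds, ct, _ in CELL_SETS if cell_type in ancestors_or_self(ct)}
    has_ts = {ds for ds, _, ts in CELL_SETS if tissue in ancestors_or_self(ts)}
    return sorted(has_ct & has_ts)

print(sample_level("fibroblast", "kidney"))   # only ds1
print(dataset_level("fibroblast", "kidney"))  # ds1 and ds2
```

In this toy data, ds2 contains lung fibroblasts and kidney podocytes, so it passes the dataset-level check but is correctly excluded at the sample level.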
Draft tech spec:
KG - already has a queryable neo4j graph of cell sets linked to ontology terms and dataset nodes & a SOLR endpoint with all nodes indexed.
Query strategy: the SOLR instance should be sufficiently denormalised that the above use case can be fulfilled with 1-3 queries. Denormalisations needed: ontology closures; dataset metadata + file link? Precise schema TBD.
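As a rough sketch of what a single query against a denormalised index could look like, the snippet below builds Solr request parameters with one filter query per facet. The field names (`cell_type_closure`, `tissue_closure`, `download_link`, and so on) are hypothetical, since the schema is still TBD; `q`, `fq`, `fl`, `rows` and `wt=csv` are standard Solr request parameters.

```python
from urllib.parse import urlencode

def build_solr_params(filters, fields, rows=100):
    """Encode a Solr select request with one fq clause per (field, value) filter.

    `filters` maps a hypothetical closure field to the ontology label to match.
    """
    def quote(value):
        # Escape backslashes and double quotes so the value is a safe phrase query.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'

    fq = [f"{field}:{quote(value)}" for field, value in sorted(filters.items())]
    return urlencode(
        {"q": "*:*", "fq": fq, "fl": ",".join(fields), "rows": rows, "wt": "csv"},
        doseq=True,  # emit one fq=... pair per filter
    )

query_string = build_solr_params(
    {"cell_type_closure": "fibroblast", "tissue_closure": "kidney"},
    ["title", "publication", "download_link"],
)
print(query_string)
```

Because each indexed document would represent one cell set with its closures already expanded, ANDing the filter queries gives the sample-level combination without a graph traversal at query time.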
Knowledge Graph use cases: more discussion is needed of what extended KG content will look like. Will we include an extended graph with GO? Will we fold in GO annotations? Analysis of cell set transcriptomes => predicted GO BP and CC?
MATCH (c)-[:SUBCLASSOF*0..]->(d)
WHERE d.label = 'T cell'
MATCH p=(ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation, ds.download_link[0], ds.title[0], ds.publication[0]
Find all datasets that use tissue from the lung
MATCH (n:Cell_cluster)-[r:tissue]->(t)-[:SUBCLASSOF|part_of*0..]->(:Class {label: 'lung'})
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, t.label AS tissue, ds.download_link[0], ds.title[0], ds.publication[0]
We could do the same for stage, disease and organism.
:Class nodes also have curie & synonyms so we can support search on these too.
To do combinatorial queries, we can combine them, e.g.:
MATCH (n:Cell_cluster)-[r:tissue]->(t)-[:SUBCLASSOF|part_of*0..]->(:Class {label: 'lung'})
MATCH (c)-[:SUBCLASSOF*0..]->(d)
WHERE d.label = 'epithelial cell'
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation, ds.download_link[0], ds.title[0], ds.publication[0]
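The combination pattern above could be generated from user-supplied filters rather than written by hand. The following is a minimal sketch, not the KG's actual API: `FACETS`, `build_query` and `rows_to_csv` are hypothetical helpers, and only the `tissue` and `composed_primarily_of` relationships are taken from the queries above. Labels are passed as Cypher parameters instead of being spliced into the query text.

```python
import csv
import io

# facet name -> (relationship from Cell_cluster, closure edge types to walk).
# Stage, disease and organism would need their own entries; their relationship
# names are not given in this thread, so they are left out here.
FACETS = {
    "cell_type": ("composed_primarily_of", "SUBCLASSOF"),
    "tissue": ("tissue", "SUBCLASSOF|part_of"),
}

COLUMNS = ["author_annotation", "download_link", "title", "publication"]

def build_query(filters):
    """Return (cypher, params) requiring every filter to hold for the same cell set."""
    if not filters:
        raise ValueError("at least one filter is required")
    lines, params = [], {}
    for i, (facet, label) in enumerate(sorted(filters.items())):
        rel, closure = FACETS[facet]
        # All MATCH clauses share the variable n, so the filters combine per cell set.
        lines.append(
            f"MATCH (n:Cell_cluster)-[:{rel}]->(t{i})"
            f"-[:{closure}*0..]->(:Class {{label: ${facet}}})"
        )
        params[facet] = label
    lines.append("MATCH (ds)-[:has_source]-(n)")
    lines.append(
        "RETURN DISTINCT n.label AS author_annotation, "
        "ds.download_link[0] AS download_link, "
        "ds.title[0] AS title, ds.publication[0] AS publication"
    )
    return "\n".join(lines), params

def rows_to_csv(rows, columns=COLUMNS):
    """Serialise result rows (dicts keyed by column name) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

cypher, params = build_query({"tissue": "lung", "cell_type": "epithelial cell"})
print(cypher)
print(params)
```

The query text and parameter map would then be handed to whatever Neo4j client is in use, and the returned records passed to `rows_to_csv` to produce the CSV table described in the use case.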