Merge pull request #6370 from chanzuckerberg/staging
chore: prod deployment, dec 18th
Showing 27 changed files with 961 additions and 743 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,37 +1,4 @@ | ||
[ | ||
{ | ||
"tier": "maintained", | ||
"title": "Geneformer embeddings fine-tuned for CELLxGENE Census cell subclass classification", | ||
"description": "Geneformer is a foundation transformer model pretrained on a large-scale corpus of ~30 million single cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.\nThese cell embeddings are derived from a Geneformer model CZI fine-tuned for cell subclass classification. As the fine-tuning procedure remains experimental and wasn’t performed by the Geneformer authors, these embeddings should not be used to assess performance of the Geneformer ", | ||
"primary_contact": { | ||
"name": "CELLxGENE Discover Team", | ||
"email": "[email protected]", | ||
"affiliation": "CZI" | ||
}, | ||
"DOI": "10.1038/s41586-023-06139-9", | ||
"publication_info": "", | ||
"publication_link": "", | ||
"project_page": "", | ||
"additional_information": "Beginning with the geneformer-12L-30M pretrained model published by Theodoris et al. (huggingface.co/ctheodoris/Geneformer), a BertForSequenceClassification model was trained to predict cell subclass (as annotated in CELLxGENE Discover see https://cellxgene.cziscience.com/collections). Embeddings were then generated using Geneformer’s EmbExtractor module with emb_layer=0.\nFor full details and a reproducible workflow please see: https://github.com/chanzuckerberg/cellxgene-census/blob/main/tools/models/geneformer/README.md", | ||
"model_link": "s3://cellxgene-contrib-public/models/geneformer/2023-12-15/homo_sapiens/fined-tuned-model/", | ||
"data_type": "obs_embedding", | ||
"obsm_layer": "geneformer", | ||
"census_version": "2023-12-15", | ||
"experiment_name": "homo_sapiens", | ||
"measurement_name": "RNA", | ||
"n_cells": 62998417, | ||
"n_columns": 512, | ||
"n_features": 512, | ||
"notebook_links": [ | ||
[ | ||
"Using trained model", | ||
"https://chanzuckerberg.github.io/cellxgene-census/notebooks/analysis_demo/comp_bio_geneformer_prediction.html" | ||
] | ||
], | ||
"submission_date": "2023-11-06", | ||
"last_updated": null, | ||
"revised_by": null | ||
}, | ||
{ | ||
"tier": "maintained", | ||
"title": "scVI integrated-embeddings with explicit modeling of batch effects", | ||
|
@@ -130,6 +97,40 @@ | |
"last_updated": null, | ||
"revised_by": null | ||
}, | ||
{ | ||
"tier": "maintained", | ||
"title": "Geneformer embeddings fine-tuned for CELLxGENE Census cell subclass classification", | ||
"description": "Geneformer is a foundation transformer model pretrained on a large-scale corpus of ~30 million single cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.\nThese cell embeddings are derived from a Geneformer model CZI fine-tuned for cell subclass classification. As the fine-tuning procedure remains experimental and wasn’t performed by the Geneformer authors, these embeddings should not be used to assess performance of the pre-trained Geneformer model.", | ||
"primary_contact": { | ||
"name": "CELLxGENE Discover Team", | ||
"email": "[email protected]", | ||
"affiliation": "CZI" | ||
}, | ||
"DOI": "10.1038/s41586-023-06139-9", | ||
"publication_info": "", | ||
"publication_link": "", | ||
"project_page": "", | ||
"additional_information": "Beginning with the geneformer-12L-30M pretrained model published by Theodoris et al. (huggingface.co/ctheodoris/Geneformer), a BertForSequenceClassification model was trained to predict cell subclass (as annotated in CELLxGENE Discover see https://cellxgene.cziscience.com/collections). Embeddings were then generated using Geneformer’s EmbExtractor module with emb_layer=0.\nFor full details and a reproducible workflow please see: https://github.com/chanzuckerberg/cellxgene-census/blob/main/tools/models/geneformer/README.md", | ||
"model_link": "s3://cellxgene-contrib-public/models/geneformer/2023-12-15/homo_sapiens/fined-tuned-model/", | ||
"data_type": "obs_embedding", | ||
"obsm_layer": "geneformer", | ||
"census_version": "2023-12-15", | ||
"experiment_name": "homo_sapiens", | ||
"measurement_name": "RNA", | ||
"n_cells": 62998417, | ||
"n_columns": 512, | ||
"n_features": 512, | ||
"notebook_links": [ | ||
[ | ||
"Using trained model", | ||
"https://chanzuckerberg.github.io/cellxgene-census/notebooks/analysis_demo/comp_bio_geneformer_prediction.html" | ||
] | ||
], | ||
"submission_date": "2023-11-06", | ||
"last_updated": null, | ||
"revised_by": null | ||
}, | ||
|
||
{ | ||
"tier": "community", | ||
"title": "PINNACLE: Contextual AI Model for Single-Cell Protein Biology", | ||
|
@@ -213,12 +214,12 @@ | |
"additional_contacts": [ | ||
{ | ||
"name": "Jialong Jiang", | ||
"email": "[email protected]" , | ||
"email": "[email protected]", | ||
"affiliation": "Thomson Lab, Caltech" | ||
}, | ||
{ | ||
"name": "Yingying Gong", | ||
"email": "[email protected]" , | ||
"email": "[email protected]", | ||
"affiliation": "Thomson Lab, Caltech" | ||
} | ||
], | ||
|
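The catalog entries in the diff above share a common shape, so a small consistency check can catch slips such as a missing contact field before deployment. The following is a minimal sketch only: the key names are copied from the entries shown in this diff, but treating them as required, and the helper name `check_entry`, are assumptions rather than part of the repository.

```python
# Hypothetical checker for one embedding-catalog entry (not part of the repository).
# Treating these keys as required is an assumption based on the entries in the diff.
REQUIRED_KEYS = {
    "tier", "title", "description", "primary_contact", "data_type",
    "census_version", "experiment_name", "measurement_name",
    "n_cells", "n_features",
}
# Both tier values appear in the diff; other values may exist elsewhere.
KNOWN_TIERS = {"maintained", "community"}


def check_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems found in one entry."""
    problems = []

    missing = sorted(REQUIRED_KEYS - set(entry))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))

    if entry.get("tier") not in KNOWN_TIERS:
        problems.append(f"unexpected tier: {entry.get('tier')!r}")

    # The primary contact in the diff carries name, email and affiliation.
    contact = entry.get("primary_contact") or {}
    for field in ("name", "email", "affiliation"):
        if not contact.get(field):
            problems.append(f"primary_contact.{field} is empty")

    # Counts should be positive integers.
    for key in ("n_cells", "n_features"):
        value = entry.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} should be a positive integer, got {value!r}")

    return problems
```

To check a whole file, the list could be loaded with `json.load` and each element passed to `check_entry`.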
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,6 +33,23 @@ We need the following collection metadata (i.e. details associated with your pub | |
- Contact: name and email | ||
- Publication/preprint DOI: can be added later | ||
- URLs: any additional URLs for related data or resources, such as GEO or protocols.io - can be added later | ||
- Consortia: optional, and can be added later. Can be one or more of: | ||
- Allen Institute for Brain Science | ||
- BRAIN Initiative | ||
- CZ Biohub | ||
- CZI Neurodegeneration Challenge Network | ||
- CZI Single-Cell Biology | ||
- European Union’s Horizon 2020 | ||
- GenitoUrinary Development Molecular Anatomy Project (GUDMAP) | ||
- Gut Cell Atlas | ||
- Human BioMolecular Atlas Program (HuBMAP) | ||
- Human Cell Atlas (HCA) | ||
- Human Pancreas Analysis Program (HPAP) | ||
- Human Tumor Atlas Network (HTAN) | ||
- Kidney Precision Medicine Project (KPMP) | ||
- LungMAP | ||
- SEA-AD | ||
- Wellcome HCA Strategic Science Support | ||
|
||
Each dataset needs the following information added to a single h5ad (AnnData 0.8) format file: | ||
|
||
|
@@ -49,12 +66,13 @@ Each dataset needs the following information added to a single h5ad (AnnData 0.8 | |
- donor_id: free-text identifier that distinguishes the unique individual that data were derived from. It is encouraged to be something not likely to be used in other studies (e.g. donor_1 is likely to not be unique in the data corpus) | ||
- development_stage_ontology_term_id: [HsapDv](https://www.ebi.ac.uk/ols/ontologies/hsapdv) if human, [MmusDv](https://www.ebi.ac.uk/ols/ontologies/mmusdv) if mouse, `unknown` if information unavailable | ||
- sex_ontology_term_id: `PATO:0000384` for male, `PATO:0000383` for female, or `unknown` if unavailable | ||
- self_reported_ethnicity_ontology_term_id: [HANCESTRO](https://www.ebi.ac.uk/ols/ontologies/hancestro) use `multiethnic` if more than one ethnicity is reported. If human and information unavailable, use `unknown`. Use `na` if non-human. | ||
- self_reported_ethnicity_ontology_term_id: [HANCESTRO](https://www.ebi.ac.uk/ols/ontologies/hancestro) multiple comma-separated terms may be used if more than one ethnicity is reported. If human and information unavailable, use `unknown`. Use `na` if non-human. | ||
- disease_ontology_term_id: [MONDO](https://www.ebi.ac.uk/ols/ontologies/mondo) or `PATO:0000461` for 'normal' | ||
- tissue_type: `tissue`, `organoid`, or `cell culture` | ||
- tissue_ontology_term_id: [UBERON](https://www.ebi.ac.uk/ols/ontologies/uberon) | ||
- cell_type_ontology_term_id: [CL](https://www.ebi.ac.uk/ols/ontologies/cl) | ||
- assay_ontology_term_id: [EFO](https://www.ebi.ac.uk/ols/ontologies/efo) | ||
- suspension_type: `cell`, `nucleus`, or `na`, as corresponding to assay. Use [this table](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.1.0/schema.md#suspension_type) defined in the data schema for guidance. If the assay does not appear in this table, the most appropriate value MUST be selected and the [curation team informed](mailto:[email protected]) during submission so that the assay can be added to the table. | ||
- suspension_type: `cell`, `nucleus`, or `na`, as corresponding to assay. Use [this table](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/4.0.0/schema.md#suspension_type) defined in the data schema for guidance. If the assay does not appear in this table, the most appropriate value MUST be selected and the [curation team informed](mailto:[email protected]) during submission so that the assay can be added to the table. | ||
- **Embeddings in obsm**: | ||
- One or more two-dimensional embeddings, prefixed with 'X\_' | ||
- **Features in var & raw.var (if present)**: | ||
|
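Several of the per-cell fields listed above take a small fixed vocabulary, which makes them easy to pre-check before submission. The sketch below is illustrative and is not the official validator: the helper names are hypothetical, the `HANCESTRO:<digits>` pattern and the bare-comma separator are assumptions, and ontology-backed fields (MONDO, UBERON, CL, EFO) are not checked at all.

```python
import re

# Fixed vocabularies copied from the field descriptions in the diff.
ALLOWED = {
    "sex_ontology_term_id": {"PATO:0000384", "PATO:0000383", "unknown"},
    "tissue_type": {"tissue", "organoid", "cell culture"},
    "suspension_type": {"cell", "nucleus", "na"},
}

# Assumption: HANCESTRO identifiers look like "HANCESTRO:" followed by digits.
HANCESTRO_TERM = re.compile(r"HANCESTRO:\d+")


def ethnicity_ok(value: str) -> bool:
    """Accept 'unknown', 'na', or one or more comma-separated HANCESTRO terms."""
    if value in {"unknown", "na"}:
        return True
    # Assumption: terms are separated by a bare comma with no spaces.
    return all(HANCESTRO_TERM.fullmatch(term) for term in value.split(","))


def check_obs_row(row: dict) -> list[str]:
    """Return problems for one cell's metadata, given as column -> value."""
    problems = []
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        if value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")

    ethnicity = row.get("self_reported_ethnicity_ontology_term_id")
    if not isinstance(ethnicity, str) or not ethnicity_ok(ethnicity):
        problems.append(
            f"self_reported_ethnicity_ontology_term_id: unexpected value {ethnicity!r}"
        )
    return problems
```

With an AnnData object loaded, each row of `adata.obs` could be converted to a dict and passed to `check_obs_row`; that wiring is left out here to keep the sketch dependency-free.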