Update batch integration results #363

Open: wants to merge 18 commits into main

3 changes: 3 additions & 0 deletions results/_include/_summary_figure.qmd
@@ -91,6 +91,7 @@ per_metric = d3.groups(results_long, d => d.method_id)

resources = d3.groups(results_resources, d => d.method_id)
.map(([method_id, values]) => {
// Calculate the error percentages
const error_pct_oom = d3.mean(values, d => d.exit_code === 137)
const error_pct_timeout = d3.mean(values, d => d.exit_code === 143)
const error_pct_na = d3.mean(values, d => d.exit_code === 99)
@@ -100,6 +101,8 @@ resources = d3.groups(results_resources, d => d.method_id)
const mean_disk_read_mb = mean_na_rm(values.map(d => d.disk_read_mb))
const mean_disk_write_mb = mean_na_rm(values.map(d => d.disk_write_mb))
const mean_duration_sec = mean_na_rm(values.map(d => d.duration_sec))

// Return the resources
return ({
method_id,
error_pct_error,
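The hunk above derives error rates from process exit codes. Because `d3.mean` coerces each boolean such as `d.exit_code === 137` to 0 or 1, each `error_pct_*` value as computed here is a fraction between 0 and 1. The helper `mean_na_rm` is not defined in this hunk, so the sketch below uses a hypothetical plain-JavaScript version that skips missing values. It assumes each record has numeric `exit_code` and `duration_sec` fields, as in `results_resources`; the names `fractionWithExitCode` and `summarizeResources` are illustrative only. Exit codes 137 and 143 are 128 + 9 (SIGKILL) and 128 + 15 (SIGTERM), which is consistent with the OOM and timeout labels.

```javascript
// Hypothetical stand-in for the mean_na_rm helper referenced in the diff:
// averages the values that are present and numeric; null, undefined,
// NaN and non-numeric strings such as "NA" are skipped.
function mean_na_rm(xs) {
  const vals = xs
    .filter(x => x !== null && x !== undefined)
    .map(Number)
    .filter(Number.isFinite)
  if (vals.length === 0) return undefined
  return vals.reduce((sum, x) => sum + x, 0) / vals.length
}

// Fraction of runs that ended with the given exit code. This matches
// d3.mean(values, d => d.exit_code === code): each boolean counts as 0 or 1,
// and an empty input yields undefined.
function fractionWithExitCode(values, code) {
  if (values.length === 0) return undefined
  return values.filter(d => d.exit_code === code).length / values.length
}

// Illustrative per-method summary mirroring a subset of the fields in the diff.
function summarizeResources(values) {
  return {
    error_pct_oom: fractionWithExitCode(values, 137),     // 128 + SIGKILL (9)
    error_pct_timeout: fractionWithExitCode(values, 143), // 128 + SIGTERM (15)
    error_pct_na: fractionWithExitCode(values, 99),
    mean_duration_sec: mean_na_rm(values.map(d => d.duration_sec)),
  }
}

// Example: four runs of one method, one with a missing duration.
const runs = [
  { exit_code: 0, duration_sec: 10 },
  { exit_code: 137, duration_sec: null },
  { exit_code: 143, duration_sec: 30 },
  { exit_code: 0, duration_sec: 20 },
]
const summary = summarizeResources(runs)
// summary.error_pct_oom === 0.25, summary.error_pct_timeout === 0.25,
// summary.error_pct_na === 0, summary.mean_duration_sec === 20
```

Skipping missing durations (rather than treating them as 0) keeps failed runs from dragging the mean resource usage down.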
62 changes: 62 additions & 0 deletions results/batch_integration/data/dataset_info.json
@@ -0,0 +1,62 @@
[
{
"dataset_id": "cellxgene_census/gtex_v9",
"dataset_name": "GTEx v9",
"dataset_summary": "Single-nucleus cross-tissue molecular reference maps to decipher disease gene function",
"dataset_description": "Understanding the function of genes and their regulation in tissue homeostasis and disease requires knowing the cellular context in which genes are expressed in tissues across the body. Single cell genomics allows the generation of detailed cellular atlases in human tissues, but most efforts are focused on single tissue types. Here, we establish a framework for profiling multiple tissues across the human body at single-cell resolution using single nucleus RNA-Seq (snRNA-seq), and apply it to 8 diverse, archived, frozen tissue types (three donors per tissue). We apply four snRNA-seq methods to each of 25 samples from 16 donors, generating a cross-tissue atlas of 209,126 nuclei profiles, and benchmark them vs. scRNA-seq of comparable fresh tissues. We use a conditional variational autoencoder (cVAE) to integrate an atlas across tissues, donors, and laboratory methods. We highlight shared and tissue-specific features of tissue-resident immune cells, identifying tissue-restricted and non-restricted resident myeloid populations. These include a cross-tissue conserved dichotomy between LYVE1- and HLA class II-expressing macrophages, and the broad presence of LAM-like macrophages across healthy tissues that is also observed in disease. For rare, monogenic muscle diseases, we identify cell types that likely underlie the neuromuscular, metabolic, and immune components of these diseases, and biological processes involved in their pathology. For common complex diseases and traits analyzed by GWAS, we identify the cell types and gene modules that potentially underlie disease mechanisms. The experimental and analytical frameworks we describe will enable the generation of large-scale studies of how cellular and molecular processes vary across individuals and populations.",
"data_reference": "eraslan2022singlenucleus",
"data_url": "https://cellxgene.cziscience.com/collections/a3ffde6c-7ad2-498a-903c-d58e732f7470",
"date_created": "20-01-2025",
"file_size": 1016272877
},
{
"dataset_id": "cellxgene_census/hypomap",
"dataset_name": "HypoMap",
"dataset_summary": "A unified single cell gene expression atlas of the murine hypothalamus",
"dataset_description": "The hypothalamus plays a key role in coordinating fundamental body functions. Despite recent progress in single-cell technologies, a unified catalogue and molecular characterization of the heterogeneous cell types and, specifically, neuronal subtypes in this brain region are still lacking. Here we present an integrated reference atlas “HypoMap” of the murine hypothalamus consisting of 384,925 cells, with the ability to incorporate new additional experiments. We validate HypoMap by comparing data collected from SmartSeq2 and bulk RNA sequencing of selected neuronal cell types with different degrees of cellular heterogeneity.",
"data_reference": "steuernagel2022hypomap",
"data_url": "https://cellxgene.cziscience.com/collections/d86517f0-fa7e-4266-b82e-a521350d6d36",
"date_created": "20-01-2025",
"file_size": "NA"
},
{
"dataset_id": "cellxgene_census/dkd",
"dataset_name": "Diabetic Kidney Disease",
"dataset_summary": "Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression",
"dataset_description": "Multimodal single cell sequencing is a powerful tool for interrogating cell-specific changes in transcription and chromatin accessibility. We performed single nucleus RNA (snRNA-seq) and assay for transposase accessible chromatin sequencing (snATAC-seq) on human kidney cortex from donors with and without diabetic kidney disease (DKD) to identify altered signaling pathways and transcription factors associated with DKD. Both snRNA-seq and snATAC-seq had an increased proportion of VCAM1+ injured proximal tubule cells (PT_VCAM1) in DKD samples. PT_VCAM1 has a pro-inflammatory expression signature and transcription factor motif enrichment implicated NFkB signaling. We used stratified linkage disequilibrium score regression to partition heritability of kidney-function-related traits using publicly-available GWAS summary statistics. Cell-specific PT_VCAM1 peaks were enriched for heritability of chronic kidney disease (CKD), suggesting that genetic background may regulate chromatin accessibility and DKD progression. snATAC-seq found cell-specific differentially accessible regions (DAR) throughout the nephron that change accessibility in DKD and these regions were enriched for glucocorticoid receptor (GR) motifs. Changes in chromatin accessibility were associated with decreased expression of insulin receptor, increased gluconeogenesis, and decreased expression of the GR cytosolic chaperone, FKBP5, in the diabetic proximal tubule. Cleavage under targets and release using nuclease (CUT&RUN) profiling of GR binding in bulk kidney cortex and an in vitro model of the proximal tubule (RPTEC) showed that DAR co-localize with GR binding sites. CRISPRi silencing of GR response elements (GRE) in the FKBP5 gene body reduced FKBP5 expression in RPTEC, suggesting that reduced FKBP5 chromatin accessibility in DKD may alter cellular response to GR. We developed an open-source tool for single cell allele specific analysis (SALSA) to model the effect of genetic background on gene expression. Heterozygous germline single nucleotide variants (SNV) in proximal tubule ATAC peaks were associated with allele-specific chromatin accessibility and differential expression of target genes within cis-coaccessibility networks. Partitioned heritability of proximal tubule ATAC peaks with a predicted allele-specific effect was enriched for eGFR, suggesting that genetic background may modify DKD progression in a cell-specific manner.",
"data_reference": "wilson2022multimodal",
"data_url": "https://cellxgene.cziscience.com/collections/b3e2c6e3-9b05-4da9-8f42-da38a664b45b",
"date_created": "20-01-2025",
"file_size": 417716388
},
{
"dataset_id": "cellxgene_census/immune_cell_atlas",
"dataset_name": "Immune Cell Atlas",
"dataset_summary": "Cross-tissue immune cell analysis reveals tissue-specific features in humans",
"dataset_description": "Despite their crucial role in health and disease, our knowledge of immune cells within human tissues remains limited. We surveyed the immune compartment of 16 tissues from 12 adult donors by single-cell RNA sequencing and VDJ sequencing generating a dataset of ~360,000 cells. To systematically resolve immune cell heterogeneity across tissues, we developed CellTypist, a machine learning tool for rapid and precise cell type annotation. Using this approach, combined with detailed curation, we determined the tissue distribution of finely phenotyped immune cell types, revealing hitherto unappreciated tissue-specific features and clonal architecture of T and B cells. Our multitissue approach lays the foundation for identifying highly resolved immune cell types by leveraging a common reference dataset, tissue-integrated expression analysis, and antigen receptor sequencing.",
"data_reference": "dominguez2022crosstissue",
"data_url": "https://cellxgene.cziscience.com/collections/62ef75e4-cbea-454e-a0ce-998ec40223d3",
"date_created": "20-01-2025",
"file_size": "NA"
},
{
"dataset_id": "cellxgene_census/mouse_pancreas_atlas",
"dataset_name": "Mouse Pancreatic Islet Atlas",
"dataset_summary": "Mouse pancreatic islet scRNA-seq atlas across sexes, ages, and stress conditions including diabetes",
"dataset_description": "To better understand pancreatic β-cell heterogeneity we generated a mouse pancreatic islet atlas capturing a wide range of biological conditions. The atlas contains scRNA-seq datasets of over 300,000 mouse pancreatic islet cells, of which more than 100,000 are β-cells, from nine datasets with 56 samples, including two previously unpublished datasets. The samples vary in sex, age (ranging from embryonic to aged), chemical stress, and disease status (including T1D NOD model development and two T2D models, mSTZ and db/db) together with different diabetes treatments. Additional information about data fields is available in anndata uns field 'field_descriptions' and on https://github.com/theislab/mm_pancreas_atlas_rep/blob/main/resources/cellxgene.md.",
"data_reference": "hrovatin2023delineating",
"data_url": "https://cellxgene.cziscience.com/collections/296237e2-393d-4e31-b590-b03f74ac5070",
"date_created": "20-01-2025",
"file_size": "NA"
},
{
"dataset_id": "cellxgene_census/tabula_sapiens",
"dataset_name": "Tabula Sapiens",
"dataset_summary": "A multiple-organ, single-cell transcriptomic atlas of humans",
"dataset_description": "Tabula Sapiens is a benchmark, first-draft human cell atlas of nearly 500,000 cells from 24 organs of 15 normal human subjects. This work is the product of the Tabula Sapiens Consortium. Taking the organs from the same individual controls for genetic background, age, environment, and epigenetic effects and allows detailed analysis and comparison of cell types that are shared between tissues. Our work creates a detailed portrait of cell types as well as their distribution and variation in gene expression across tissues and within the endothelial, epithelial, stromal and immune compartments.",
"data_reference": "consortium2022tabula",
"data_url": "https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5",
"date_created": "20-01-2025",
"file_size": "NA"
}
]
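In the `dataset_info.json` entries above, `file_size` is either a number or the string `"NA"`, and `date_created` uses a day-month-year layout (`20-01-2025`). A consumer may want to normalize these before display. The sketch below is a hypothetical helper, not part of this PR: `toIsoDate` and `normalizeDatasetInfo` are illustrative names, non-numeric sizes become `null`, the date is rewritten as ISO `YYYY-MM-DD`, and `file_size_gb` rests on the assumption that `file_size` is in bytes.

```javascript
// Hypothetical helpers (not part of this PR) for cleaning
// dataset_info.json entries before display.

// Convert "DD-MM-YYYY" (e.g. "20-01-2025") to ISO "YYYY-MM-DD".
// Returns null for non-strings or strings in any other layout.
function toIsoDate(s) {
  if (typeof s !== "string") return null
  const m = /^(\d{2})-(\d{2})-(\d{4})$/.exec(s)
  if (!m) return null
  const [, dd, mm, yyyy] = m
  return `${yyyy}-${mm}-${dd}`
}

// Replace the "NA" placeholder (or any non-numeric value) in file_size
// with null. ASSUMPTION: file_size is in bytes, so file_size_gb divides by 1e9.
function normalizeDatasetInfo(entry) {
  const size =
    typeof entry.file_size === "number" && Number.isFinite(entry.file_size)
      ? entry.file_size
      : null
  return {
    ...entry,
    file_size: size,
    file_size_gb: size === null ? null : size / 1e9,
    date_created: toIsoDate(entry.date_created),
  }
}

// Example with two entries trimmed from dataset_info.json.
const entries = [
  { dataset_id: "cellxgene_census/gtex_v9", date_created: "20-01-2025", file_size: 1016272877 },
  { dataset_id: "cellxgene_census/hypomap", date_created: "20-01-2025", file_size: "NA" },
]
const cleaned = entries.map(normalizeDatasetInfo)
// cleaned[0].date_created === "2025-01-20"; cleaned[1].file_size === null
```

Returning an ISO string rather than a `Date` object avoids time-zone shifts when the value is later formatted.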