Update Batch Integration results #324

Closed
wants to merge 6 commits into from
Changes from 2 commits
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,11 @@
# openproblems.bio unreleased

## MAJOR CHANGES

* Update Batch Integration task to OpenProblems v2 results (PR #324)

# openproblems.bio v2.3.6

## NEW CONTENT

* Add an event page for the Weekly Wednesday work meeting (PR #299).
82 changes: 82 additions & 0 deletions results/batch_integration/data/dataset_info.json
@@ -0,0 +1,82 @@
[
{
"task_id": "batch_integration",
"dataset_id": "openproblems_v1/immune_cells",
"dataset_name": "Human immune",
"dataset_summary": "Human immune cells dataset from the scIB benchmarks",
"data_reference": "luecken2022benchmarking",
"data_url": "https://theislab.github.io/scib-reproducibility/dataset_immune_cell_hum.html"
},
{
"task_id": "batch_integration",
"dataset_id": "openproblems_v1/pancreas",
"dataset_name": "Human pancreas",
"dataset_summary": "Human pancreas cells dataset from the scIB benchmarks",
"data_reference": "luecken2022benchmarking",
"data_url": "https://theislab.github.io/scib-reproducibility/dataset_pancreas.html"
},
{
"task_id": "batch_integration",
"dataset_id": "openproblems_v1/cengen",
"dataset_name": "CeNGEN",
"dataset_summary": "Complete Gene Expression Map of an Entire Nervous System",
"data_reference": "hammarlund2018cengen",
"data_url": "https://www.cengen.org"
},
{
"task_id": "batch_integration",
"dataset_id": "cellxgene_census/gtex_v9",
"dataset_name": "GTEX v9",
"dataset_summary": "Single-nucleus cross-tissue molecular reference maps to decipher disease gene function",
"data_reference": "eraslan2022singlenucleus",
"data_url": "https://cellxgene.cziscience.com/collections/a3ffde6c-7ad2-498a-903c-d58e732f7470"
},
{
"task_id": "batch_integration",
"dataset_id": "cellxgene_census/mouse_pancreas_atlas",
"dataset_name": "Mouse Pancreatic Islet Atlas",
"dataset_summary": "Mouse pancreatic islet scRNA-seq atlas across sexes, ages, and stress conditions including diabetes",
"data_reference": "hrovatin2023delineating",
"data_url": "https://cellxgene.cziscience.com/collections/296237e2-393d-4e31-b590-b03f74ac5070"
},
{
"task_id": "batch_integration",
"dataset_id": "cellxgene_census/hypomap",
"dataset_name": "HypoMap",
"dataset_summary": "A unified single cell gene expression atlas of the murine hypothalamus",
"data_reference": "steuernagel2022hypomap",
"data_url": "https://cellxgene.cziscience.com/collections/d86517f0-fa7e-4266-b82e-a521350d6d36"
},
{
"task_id": "batch_integration",
"dataset_id": "cellxgene_census/immune_cell_atlas",
"dataset_name": "Immune Cell Atlas",
"dataset_summary": "Cross-tissue immune cell analysis reveals tissue-specific features in humans",
"data_reference": "dominguez2022crosstissue",
"data_url": "https://cellxgene.cziscience.com/collections/62ef75e4-cbea-454e-a0ce-998ec40223d3"
},
{
"task_id": "batch_integration",
"dataset_id": "cellxgene_census/tabula_sapiens",
"dataset_name": "Tabula Sapiens",
"dataset_summary": "A multiple-organ, single-cell transcriptomic atlas of humans",
"data_reference": "consortium2022tabula",
"data_url": "https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5"
},
{
"task_id": "batch_integration",
"dataset_id": "openproblems_v1/zebrafish",
"dataset_name": "Zebrafish embryonic cells",
"dataset_summary": "Single-cell mRNA sequencing of zebrafish embryonic cells.",
"data_reference": "wagner2018single",
"data_url": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112294"
},
{
"task_id": "batch_integration",
"dataset_id": "cellxgene_census/dkd",
"dataset_name": "Diabetic Kidney Disease",
"dataset_summary": "Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression",
"data_reference": "wilson2022multimodal",
"data_url": "https://cellxgene.cziscience.com/collections/b3e2c6e3-9b05-4da9-8f42-da38a664b45b"
}
]
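For reviewers, a minimal consistency check on the dataset metadata added above could look like the sketch below. The required-field list is inferred from the entries in this diff rather than from a published schema, and the function name `validate_dataset_info` and the local path are hypothetical.

```python
import json
from pathlib import Path

# Fields present in every entry of dataset_info.json in this PR
# (inferred from the diff, not an official schema).
REQUIRED_FIELDS = (
    "task_id",
    "dataset_id",
    "dataset_name",
    "dataset_summary",
    "data_reference",
    "data_url",
)


def validate_dataset_info(entries, task_id="batch_integration"):
    """Return a list of human-readable problems; an empty list means the entries look consistent."""
    problems = []
    seen_ids = set()
    for i, entry in enumerate(entries):
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            problems.append(f"entry {i}: missing {', '.join(missing)}")
        if entry.get("task_id") != task_id:
            problems.append(f"entry {i}: unexpected task_id {entry.get('task_id')!r}")
        dataset_id = entry.get("dataset_id")
        if dataset_id in seen_ids:
            problems.append(f"entry {i}: duplicate dataset_id {dataset_id!r}")
        seen_ids.add(dataset_id)
        # Every data_url in this diff is https; flag anything else for a closer look.
        if not str(entry.get("data_url", "")).startswith("https://"):
            problems.append(f"entry {i}: data_url is not an https URL")
    return problems


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the repository is checked out.
    path = Path("results/batch_integration/data/dataset_info.json")
    if path.exists():
        for problem in validate_dataset_info(json.loads(path.read_text())):
            print(problem)
```

Returning a list of problems instead of raising on the first one lets a reviewer see every issue in a single pass.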
218 changes: 218 additions & 0 deletions results/batch_integration/data/method_info.json
@@ -0,0 +1,218 @@
[
{
"task_id": "batch_integration",
"method_id": "bbknn",
"method_name": "BBKNN",
"method_summary": "BBKNN creates a k-nearest-neighbour graph by identifying neighbours within each batch, then combining them and processing the result with UMAP for visualization.",
"is_baseline": false,
"paper_reference": "polanski2020bbknn",
"code_url": "https://github.com/Teichlab/bbknn",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/bbknn/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "combat",
"method_name": "Combat",
"method_summary": "Adjusting batch effects in microarray expression data using empirical Bayes methods",
"is_baseline": false,
"paper_reference": "hansen2012removing",
"code_url": "https://scanpy.readthedocs.io/en/stable/api/scanpy.pp.combat.html",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/combat/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "fastmnn_embedding",
"method_name": "fastMnn (embedding)",
"method_summary": "A simpler version of the original mnnCorrect algorithm.",
"is_baseline": false,
"paper_reference": "haghverdi2018batch",
"code_url": "https://code.bioconductor.org/browse/batchelor/",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/fastmnn_embedding/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "fastmnn_feature",
"method_name": "fastMnn (feature)",
"method_summary": "A simpler version of the original mnnCorrect algorithm.",
"is_baseline": false,
"paper_reference": "haghverdi2018batch",
"code_url": "https://code.bioconductor.org/browse/batchelor/",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/fastmnn_feature/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "liger",
"method_name": "LIGER",
"method_summary": "Linked Inference of Genomic Experimental Relationships",
"is_baseline": false,
"paper_reference": "welch2019single",
"code_url": "https://github.com/welch-lab/liger",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/liger/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "mnn_correct",
"method_name": "mnnCorrect",
"method_summary": "Correct for batch effects in single-cell expression data using the mutual nearest neighbors method.",
"is_baseline": false,
"paper_reference": "haghverdi2018batch",
"code_url": "https://code.bioconductor.org/browse/batchelor/",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/mnn_correct/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "mnnpy",
"method_name": "mnnpy",
"method_summary": "Batch effect correction by matching mutual nearest neighbors, Python implementation.",
"is_baseline": false,
"paper_reference": "hie2019efficient",
"code_url": "https://github.com/chriscainx/mnnpy",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/mnnpy/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "pyliger",
"method_name": "pyliger",
"method_summary": "Python implementation of LIGER (Linked Inference of Genomic Experimental Relationships)",
"is_baseline": false,
"paper_reference": "welch2019single",
"code_url": "https://github.com/welch-lab/pyliger",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/pyliger/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "scalex_embed",
"method_name": "SCALEX (embedding)",
"method_summary": "Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space",
"is_baseline": false,
"paper_reference": "xiong2021online",
"code_url": "https://github.com/jsxlei/SCALEX",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/scalex_embed/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "scalex_feature",
"method_name": "SCALEX (feature)",
"method_summary": "Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space",
"is_baseline": false,
"paper_reference": "xiong2021online",
"code_url": "https://github.com/jsxlei/SCALEX",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/scalex_feature/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "scanorama_embed",
"method_name": "Scanorama (embedding)",
"method_summary": "Efficient integration of heterogeneous single-cell transcriptomes using Scanorama",
"is_baseline": false,
"paper_reference": "hie2019efficient",
"code_url": "https://github.com/brianhie/scanorama",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/scanorama_embed/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "scanorama_feature",
"method_name": "Scanorama (feature)",
"method_summary": "Efficient integration of heterogeneous single-cell transcriptomes using Scanorama",
"is_baseline": false,
"paper_reference": "hie2019efficient",
"code_url": "https://github.com/brianhie/scanorama",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/scanorama_feature/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "scanvi",
"method_name": "ScanVI",
"method_summary": "ScanVI is a deep learning method that considers cell type labels.",
"is_baseline": false,
"paper_reference": "lopez2018deep",
"code_url": "https://github.com/YosefLab/scvi-tools",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/scanvi/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "scvi",
"method_name": "scVI",
"method_summary": "scVI combines a variational autoencoder with a hierarchical Bayesian model.",
"is_baseline": false,
"paper_reference": "lopez2018deep",
"code_url": "https://github.com/YosefLab/scvi-tools",
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/methods/scvi/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "no_integration_batch",
"method_name": "No integration by Batch",
"method_summary": "Cells are embedded by computing PCA independently on each batch",
"is_baseline": true,
"paper_reference": null,
"code_url": null,
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/control_methods/no_integration_batch/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "random_embed_cell",
"method_name": "Random Embedding by Celltype",
"method_summary": "Cells are embedded as a one-hot encoding of cell type labels",
"is_baseline": true,
"paper_reference": null,
"code_url": null,
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/control_methods/random_embed_cell/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "random_embed_cell_jitter",
"method_name": "Random Embedding by Celltype with jitter",
"method_summary": "Cells are embedded as a one-hot encoding of cell type labels, with a small amount of random noise added to the embedding",
"is_baseline": true,
"paper_reference": null,
"code_url": null,
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/control_methods/random_embed_cell_jitter/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
},
{
"task_id": "batch_integration",
"method_id": "random_integration",
"method_name": "Random integration",
"method_summary": "Feature values, embedding coordinates, and graph connectivity are all randomly permuted.",
"is_baseline": true,
"paper_reference": null,
"code_url": null,
"implementation_url": "https://github.com/openproblems-bio/openproblems-v2/tree/2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a/src/tasks/batch_integration/control_methods/random_integration/config.vsh.yaml",
"code_version": null,
"commit_sha": "2ebb7c01db18f3e3498c4d144020a7e6f4ce0f1a"
}
]
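The three control-method summaries above (`random_embed_cell`, `random_embed_cell_jitter`, `random_integration`) can be illustrated with the small stdlib-only sketch below. It is not the openproblems-v2 implementation; see the linked `config.vsh.yaml` files for that. The function names, the default jitter scale of 0.01, and the row-permutation stand-in for "random integration" are assumptions chosen for illustration.

```python
import random


def one_hot_embedding(cell_types):
    """Embed each cell as a one-hot vector of its cell type label (cf. random_embed_cell)."""
    categories = sorted(set(cell_types))
    index = {label: i for i, label in enumerate(categories)}
    return [
        [1.0 if index[label] == j else 0.0 for j in range(len(categories))]
        for label in cell_types
    ]


def jittered_embedding(cell_types, scale=0.01, seed=0):
    """One-hot embedding plus small Gaussian noise (cf. random_embed_cell_jitter).

    The noise scale is an assumed value, not the one used in openproblems-v2.
    """
    rng = random.Random(seed)
    return [
        [value + rng.gauss(0.0, scale) for value in row]
        for row in one_hot_embedding(cell_types)
    ]


def permute_rows(rows, seed=0):
    """Randomly reassign rows (e.g. embedding coordinates) to cells (cf. random_integration).

    Returns a shuffled copy; the input list is left unchanged.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    labels = ["T cell", "B cell", "T cell", "NK cell"]
    print(one_hot_embedding(labels))
    print(permute_rows(one_hot_embedding(labels), seed=1))
```

The one-hot control places cells of the same type at identical coordinates, so it gives an upper reference for label-conservation metrics. The jitter variant avoids exactly duplicated points, which some metrics may not handle well. Permuting rows keeps the marginal distribution of values but removes any link between a cell and its coordinates.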