Add Predict Modality task #320

Merged: 21 commits, Jan 7, 2025
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -30,6 +30,12 @@

## NEW CONTENT

* Add Predict Modality benchmark page (PR #320).

# openproblems.bio v2.3.6

## NEW CONTENT

* Add an event page for the Weekly Wednesday work meeting (PR #299).

* Add `Advanced_topics` pages to documentation (PR #300).
68 changes: 68 additions & 0 deletions results/predict_modality/data/dataset_info.json
@@ -0,0 +1,68 @@
[
{
"dataset_id": "openproblems_neurips2021/bmmc_cite/normal",
"dataset_name": "NeurIPS2021 CITE-Seq (GEX2ADT)",
"dataset_summary": "Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors.",
"dataset_description": "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site.",
"data_reference": "luecken2021neurips",
"data_url": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122",
"date_created": "25-11-2024",
"file_size": 704994,
"common_dataset_id": "openproblems_neurips2021/bmmc_cite"
},
{
"dataset_id": "openproblems_neurips2021/bmmc_multiome/normal",
"dataset_name": "NeurIPS2021 Multiome (GEX2ATAC)",
"dataset_summary": "Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors.",
"dataset_description": "Single-cell Multiome data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site.",
"data_reference": "luecken2021neurips",
"data_url": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122",
"date_created": "25-11-2024",
"file_size": 31080807,
"common_dataset_id": "openproblems_neurips2021/bmmc_multiome"
},
{
"dataset_id": "openproblems_neurips2021/bmmc_multiome/swap",
"dataset_name": "NeurIPS2021 Multiome (ATAC2GEX)",
"dataset_summary": "Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors.",
"dataset_description": "Single-cell Multiome data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site.",
"data_reference": "luecken2021neurips",
"data_url": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122",
"date_created": "25-11-2024",
"file_size": 7883109,
"common_dataset_id": "openproblems_neurips2021/bmmc_multiome"
},
{
"dataset_id": "openproblems_neurips2022/pbmc_cite/normal",
"dataset_name": "OpenProblems NeurIPS2022 CITE-Seq (GEX2ADT)",
"dataset_summary": "Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors.",
"dataset_description": "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site.",
"data_reference": "lance2024predicting",
"data_url": "https://www.kaggle.com/competitions/open-problems-multimodal/data",
"date_created": "25-11-2024",
"file_size": 591886,
"common_dataset_id": "openproblems_neurips2022/pbmc_cite"
},
{
"dataset_id": "openproblems_neurips2022/pbmc_cite/swap",
"dataset_name": "OpenProblems NeurIPS2022 CITE-Seq (ADT2GEX)",
"dataset_summary": "Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors.",
"dataset_description": "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site.",
"data_reference": "lance2024predicting",
"data_url": "https://www.kaggle.com/competitions/open-problems-multimodal/data",
"date_created": "25-11-2024",
"file_size": 32551804,
"common_dataset_id": "openproblems_neurips2022/pbmc_cite"
},
{
"dataset_id": "openproblems_neurips2021/bmmc_cite/swap",
"dataset_name": "NeurIPS2021 CITE-Seq (ADT2GEX)",
"dataset_summary": "Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors.",
"dataset_description": "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site.",
"data_reference": "luecken2021neurips",
"data_url": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122",
"date_created": "25-11-2024",
"file_size": 13467880,
"common_dataset_id": "openproblems_neurips2021/bmmc_cite"
}
]
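
Each record in `dataset_info.json` ties a `dataset_id` ending in `normal` or `swap` to a shared `common_dataset_id`. As a rough sketch of how a consumer might group the variants, the snippet below uses only field names that appear in the diff. The helper name `group_by_common_id` and the trimmed inline sample are illustrative assumptions, not part of this PR.

```python
import json
from collections import defaultdict

# Trimmed sample with only the fields used below; values copied from the diff.
SAMPLE = json.loads("""
[
  {"dataset_id": "openproblems_neurips2021/bmmc_cite/normal",
   "common_dataset_id": "openproblems_neurips2021/bmmc_cite",
   "file_size": 704994},
  {"dataset_id": "openproblems_neurips2021/bmmc_cite/swap",
   "common_dataset_id": "openproblems_neurips2021/bmmc_cite",
   "file_size": 13467880},
  {"dataset_id": "openproblems_neurips2021/bmmc_multiome/normal",
   "common_dataset_id": "openproblems_neurips2021/bmmc_multiome",
   "file_size": 31080807}
]
""")


def group_by_common_id(records):
    """Map common_dataset_id -> {variant: dataset_id}.

    The variant is taken to be the last path segment of dataset_id
    (an assumption based on the IDs shown in the diff).
    """
    groups = defaultdict(dict)
    for rec in records:
        variant = rec["dataset_id"].rsplit("/", 1)[-1]
        groups[rec["common_dataset_id"]][variant] = rec["dataset_id"]
    return dict(groups)


groups = group_by_common_id(SAMPLE)
for common_id, variants in sorted(groups.items()):
    print(common_id, sorted(variants))
```

Grouping this way makes it easy to check that every common dataset has the prediction directions one expects before rendering a results page.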
130 changes: 130 additions & 0 deletions results/predict_modality/data/method_info.json
@@ -0,0 +1,130 @@
[
{
"task_id": "control_methods",
"method_id": "mean_per_gene",
"method_name": "Mean per gene",
"method_summary": "Returns the mean expression value per gene.",
"method_description": "Returns the mean expression value per gene.",
"is_baseline": true,
"references_doi": null,
"references_bibtex": null,
"code_url": "https://github.com/openproblems-bio/task_predict_modality",
"documentation_url": null,
"image": "https://ghcr.io/openproblems-bio/task_predict_modality/control_methods/mean_per_gene:build_main",
"implementation_url": "https://github.com/openproblems-bio/task_predict_modality/blob/0bd597e201b39fbcbc1fcd7047f7654a9713a197/src/control_methods/mean_per_gene",
"code_version": "build_main",
"commit_sha": "0bd597e201b39fbcbc1fcd7047f7654a9713a197"
},
{
"task_id": "control_methods",
"method_id": "random_predict",
"method_name": "Random predictions",
"method_summary": "Returns random training profiles.",
"method_description": "Returns random training profiles.",
"is_baseline": true,
"references_doi": null,
"references_bibtex": null,
"code_url": "https://github.com/openproblems-bio/task_predict_modality",
"documentation_url": null,
"image": "https://ghcr.io/openproblems-bio/task_predict_modality/control_methods/random_predict:build_main",
"implementation_url": "https://github.com/openproblems-bio/task_predict_modality/blob/0bd597e201b39fbcbc1fcd7047f7654a9713a197/src/control_methods/random_predict",
"code_version": "build_main",
"commit_sha": "0bd597e201b39fbcbc1fcd7047f7654a9713a197"
},
{
"task_id": "control_methods",
"method_id": "zeros",
"method_name": "Zeros",
"method_summary": "Returns a prediction consisting of all zeros.",
"method_description": "Returns a prediction consisting of all zeros.",
"is_baseline": true,
"references_doi": null,
"references_bibtex": null,
"code_url": "https://github.com/openproblems-bio/task_predict_modality",
"documentation_url": null,
"image": "https://ghcr.io/openproblems-bio/task_predict_modality/control_methods/zeros:build_main",
"implementation_url": "https://github.com/openproblems-bio/task_predict_modality/blob/0bd597e201b39fbcbc1fcd7047f7654a9713a197/src/control_methods/zeros",
"code_version": "build_main",
"commit_sha": "0bd597e201b39fbcbc1fcd7047f7654a9713a197"
},
{
"task_id": "control_methods",
"method_id": "solution",
"method_name": "Solution",
"method_summary": "Returns the ground-truth solution.",
"method_description": "Returns the ground-truth solution.",
"is_baseline": true,
"references_doi": null,
"references_bibtex": null,
"code_url": "https://github.com/openproblems-bio/task_predict_modality",
"documentation_url": null,
"image": "https://ghcr.io/openproblems-bio/task_predict_modality/control_methods/solution:build_main",
"implementation_url": "https://github.com/openproblems-bio/task_predict_modality/blob/0bd597e201b39fbcbc1fcd7047f7654a9713a197/src/control_methods/solution",
"code_version": "build_main",
"commit_sha": "0bd597e201b39fbcbc1fcd7047f7654a9713a197"
},
{
"task_id": "methods",
"method_id": "knnr_py",
"method_name": "KNNR (Py)",
"method_summary": "K-nearest neighbor regression in Python.",
"method_description": "K-nearest neighbor regression in Python.",
"is_baseline": false,
"references_doi": "10.2307/1403797",
"references_bibtex": null,
"code_url": "https://github.com/openproblems-bio/task_predict_modality",
"documentation_url": null,
"image": "https://ghcr.io/openproblems-bio/task_predict_modality/methods/knnr_py:build_main",
"implementation_url": "https://github.com/openproblems-bio/task_predict_modality/blob/0bd597e201b39fbcbc1fcd7047f7654a9713a197/src/methods/knnr_py",
"code_version": "build_main",
"commit_sha": "0bd597e201b39fbcbc1fcd7047f7654a9713a197"
},
{
"task_id": "methods",
"method_id": "knnr_r",
"method_name": "KNNR (R)",
"method_summary": "K-nearest neighbor regression in R.",
"method_description": "K-nearest neighbor regression in R.",
"is_baseline": false,
"references_doi": "10.2307/1403797",
"references_bibtex": null,
"code_url": "https://github.com/openproblems-bio/task_predict_modality",
"documentation_url": null,
"image": "https://ghcr.io/openproblems-bio/task_predict_modality/methods/knnr_r:build_main",
"implementation_url": "https://github.com/openproblems-bio/task_predict_modality/blob/0bd597e201b39fbcbc1fcd7047f7654a9713a197/src/methods/knnr_r",
"code_version": "build_main",
"commit_sha": "0bd597e201b39fbcbc1fcd7047f7654a9713a197"
},
{
"task_id": "methods",
"method_id": "lm",
"method_name": "Linear Model",
"method_summary": "Linear model regression.",
"method_description": "A linear model regression method.",
"is_baseline": false,
"references_doi": "10.2307/2346786",
"references_bibtex": null,
"code_url": "https://github.com/openproblems-bio/task_predict_modality",
"documentation_url": null,
"image": "https://ghcr.io/openproblems-bio/task_predict_modality/methods/lm:build_main",
"implementation_url": "https://github.com/openproblems-bio/task_predict_modality/blob/0bd597e201b39fbcbc1fcd7047f7654a9713a197/src/methods/lm",
"code_version": "build_main",
"commit_sha": "0bd597e201b39fbcbc1fcd7047f7654a9713a197"
},
{
"task_id": "methods",
"method_id": "guanlab_dengkw_pm",
"method_name": "Guanlab-dengkw",
"method_summary": "A kernel ridge regression method with RBF kernel.",
"method_description": "This is a solution developed by Team Guanlab - dengkw in the NeurIPS 2021 competition to predict one modality\nfrom another using kernel ridge regression (KRR) with RBF kernel. Truncated SVD is applied on the combined\ntraining and test data from modality 1 followed by row-wise z-score normalization on the reduced matrix. The\ntruncated SVD of modality 2 is predicted by training a KRR model on the normalized training matrix of modality 1.\nPredictions on the normalized test matrix are then re-mapped to the modality 2 feature space via the right\nsingular vectors.\n",
"is_baseline": false,
"references_doi": "10.1101/2022.04.11.487796",
"references_bibtex": null,
"code_url": "https://github.com/openproblems-bio/task_predict_modality",
"documentation_url": null,
"image": "https://ghcr.io/openproblems-bio/task_predict_modality/methods/guanlab_dengkw_pm:build_main",
"implementation_url": "https://github.com/openproblems-bio/task_predict_modality/blob/0bd597e201b39fbcbc1fcd7047f7654a9713a197/src/methods/guanlab_dengkw_pm",
"code_version": "build_main",
"commit_sha": "0bd597e201b39fbcbc1fcd7047f7654a9713a197"
}
]
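
To make the one-line summaries of the `zeros` and `mean_per_gene` controls and of KNNR more concrete, here is a minimal pure-Python sketch. It is not the code from `task_predict_modality`: the function names, the plain-list matrix layout (rows are cells, columns are features), the Euclidean distance, the unweighted neighbor average, and the default `k` are all assumptions chosen for illustration.

```python
import math


def zeros(n_test_cells, n_features):
    """Control: predict 0.0 for every feature of every test cell."""
    return [[0.0] * n_features for _ in range(n_test_cells)]


def mean_per_gene(train_mod2, n_test_cells):
    """Control: predict the training mean of each modality-2 feature for every test cell."""
    n_train = len(train_mod2)
    means = [sum(col) / n_train for col in zip(*train_mod2)]
    return [list(means) for _ in range(n_test_cells)]


def knn_regress(train_mod1, train_mod2, test_mod1, k=2):
    """Predict each test cell's modality-2 profile as the unweighted mean of the
    modality-2 profiles of its k nearest training cells in modality-1 space."""
    preds = []
    for x in test_mod1:
        # Rank training cells by Euclidean distance to the test cell.
        order = sorted(
            range(len(train_mod1)),
            key=lambda i: math.dist(x, train_mod1[i]),
        )
        nearest = [train_mod2[i] for i in order[:k]]
        preds.append([sum(col) / len(nearest) for col in zip(*nearest)])
    return preds


# Toy data (hypothetical): 3 training cells, 2 features per modality.
train_mod1 = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
train_mod2 = [[1.0, 2.0], [3.0, 4.0], [20.0, 40.0]]
test_mod1 = [[0.5, 0.0]]

print(zeros(1, 2))
print(mean_per_gene(train_mod2, 1))
print(knn_regress(train_mod1, train_mod2, test_mod1, k=2))
```

The controls ignore the test cell's modality-1 input entirely, which is what makes them useful as a floor for comparison, while KNNR only uses modality 1 to pick which training profiles to average.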