This repository aims to ensure that the results in the paper *Dawnn: single-cell differential abundance with neural networks* can be reproduced. The notebook that produces the figures in the paper can be found here.
The table below shows where to locate the data and code used to generate the
files read in the above notebook. Taken together, this information should allow
all results to be reproduced. If you are unable to reproduce any result, please
contact me at [email protected]
or post an issue on this repository.

Directory name | Description | DOI |
---|---|---|
heart_dataset | Benchmarking dataset based on heart samples. • process_heart_cells.R - Code to generate benchmarking dataset • heart_tissue_cells.RDS - Generated benchmarking dataset • heart_barcodes.tsv.gz - Barcode list for raw data • heart_genes.tsv.gz - Gene list for raw data • heart_expression_matrix.mtx.gz - Expression matrix for raw data • benchmark_dataset_heart_data_type_labels.csv - Generated benchmarking dataset (labels) | 10.5522/04/22601260 |
skin_dataset | Benchmarking dataset based on skin cells. • benchmark_dataset_skin.csv - Resulting benchmarking dataset • simulate_skin_labels_Rscript.R - R code to generate benchmarking dataset • simulate_skin_labels_bash.sh - Bash script to generate benchmarking dataset • skin_data_end_pipeline_1458110522.rds - Input dataset | 10.5522/04/22607236 |
organoid_dataset | Benchmarking dataset based on organoid samples. • process_organoid_cells_data.R - Code to generate benchmarking dataset • organoid_cells.RDS - Generated benchmarking dataset • simulate_organoid_labels_Rscript.R - R code to simulate labels • simulate_organoid_labels_bash.sh - Bash script to simulate labels • benchmark_dataset_organoid_labels.csv - Generated benchmarking dataset (labels) | 10.5522/04/22612576 |
mouse_dataset | Benchmarking dataset based on mouse samples. • process_mouse_cells.R - Code to generate benchmarking dataset • mouse_gastrulation_data_regen.RDS - Generated benchmarking dataset • simulate_mouse_labels_Rscript.R - R code to simulate labels • simulate_mouse_labels_bash.sh - Bash script to simulate labels • benchmark_dataset_mouse.csv - Generated benchmarking dataset (labels) • simulate_mouse_pc1_Rscript.R - R script to simulate P(C1)s • simulate_mouse_pc1_bash.sh - Bash script to simulate P(C1)s • benchmark_dataset_mouse_pc1s_regen.csv - Generated benchmarking dataset (P(C1)s) | 10.5522/04/22614004 |
discrete_clusters_dataset | Benchmarking dataset based on simulated discrete clusters. • cells_sim_discerete_clusters_gex_seed_*.rds - Generated benchmarking datasets • generate_test_data_discrete_clusters_sim_milo_paper.R - R code to generate discrete cluster labels • benchmark_dataset_sim_discrete_clusters.csv - Generated labels | 10.5522/04/22616590 |
linear_trajectory_dataset | Benchmarking dataset based on simulated linear trajectories. • cells_sim_linear_traj_gex_seed_*.rds - Generated benchmarking datasets • benchmark_dataset_sim_linear_traj.csv - Generated labels • generate_test_data_linear_traj_sim_milo_paper.R - R code to generate linear trajectory labels | 10.5522/04/22616611 |
branching_trajectory_dataset | Benchmarking dataset based on simulated branching trajectories. • cells_sim_branching_traj_gex_seed_*.rds - Generated benchmarking datasets • benchmark_dataset_sim_branching_traj.csv - Generated labels • generate_test_data_branching_traj_sim_milo_paper.R - R code to generate branching trajectory labels | 10.5522/04/22619851 |
dawnn_trained_model | Trained neural network model needed to run Dawnn. • final_model_dawnn_rerun.h5 - Final trained Dawnn model | 10.5522/04/22241017 |
dawnn_model_training | Code to train the final selected Dawnn model. • train_final_model_regen_seed_123_job_sub.sh - Job submission script to train final selected model • train_nn_regen_seed_123.py - Python script to train final selected model | 10.5522/04/22633606 |
training_set_simulation | Code and resulting data when simulating Dawnn's training set. • autogen4_code.R - Code to generate training set • labels_df.csv - Resulting generated training set | 10.5522/04/22634200 |
model_evaluations | Code and results from evaluating different models. • nn_model_choice.py - Code for hyperparameter optimization • model_evaluations_structure_all_nn_results.txt - Results from neural network hyperparameter optimization • eval_rf_svm.py - Code to evaluate random forests and support vector machines • svm_model_evaluations.txt - Results from evaluation of support vector machines • rf_model_evaluations.txt - Results from evaluation of random forests | 10.5522/04/22634416 |
benchmarking_code_and_results | Code and results from benchmarking on each dataset. • collect_results_all_sim_dat.R - Code to run and collect benchmarking for simulated datasets • tpr_fdr_results_discrete_clusters_rerun.csv - Results from benchmarking on discrete clusters dataset • tpr_fdr_results_linear_traj_rerun.csv - Results from benchmarking on linear trajectory dataset • tpr_fdr_results_branch_traj_rerun.csv - Results from benchmarking on branching trajectory dataset • collecting_results_mouse.sh - Code to run and collect benchmarking for mouse dataset • tpr_fdr_results_mouse_regen.csv - Results from benchmarking on mouse dataset • collecting_results_skin.sh - Code to run and collect benchmarking for skin dataset • tpr_fdr_results_skin_regen.csv - Results from benchmarking on skin dataset • collecting_results_organoid.sh - Code to run and collect benchmarking for organoid dataset • tpr_fdr_results_organoid_regen.csv - Results from benchmarking on organoid dataset • collecting_results_heart.sh - Code to run and collect benchmarking for heart dataset • tpr_fdr_results_heart_regen.csv - Results from benchmarking on heart dataset • benchmarking_liver_cirrhosis_analysis.R - Code to run methods on cirrhotic liver dataset • liver_cirrhosis_results_rerun.csv - Results from running on cirrhotic liver dataset | 10.5522/04/22634470 |
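
Each DOI in the table can be opened through the standard DOI resolver at `https://doi.org/`. The following is a minimal Python sketch, not part of the repository, that turns the table into a list of landing-page URLs. The names `DATASET_DOIS`, `doi_url` and `landing_pages` are hypothetical helpers introduced for illustration; the directory names and DOIs are copied from the table. The sketch only builds URLs and does not download anything, because the layout of each deposit's landing page is not described here.

```python
# Directory names and DOIs copied from the table above.
# DATASET_DOIS, doi_url and landing_pages are illustrative names,
# not part of the Dawnn repository.
DATASET_DOIS = {
    "heart_dataset": "10.5522/04/22601260",
    "skin_dataset": "10.5522/04/22607236",
    "organoid_dataset": "10.5522/04/22612576",
    "mouse_dataset": "10.5522/04/22614004",
    "discrete_clusters_dataset": "10.5522/04/22616590",
    "linear_trajectory_dataset": "10.5522/04/22616611",
    "branching_trajectory_dataset": "10.5522/04/22619851",
    "dawnn_trained_model": "10.5522/04/22241017",
    "dawnn_model_training": "10.5522/04/22633606",
    "training_set_simulation": "10.5522/04/22634200",
    "model_evaluations": "10.5522/04/22634416",
    "benchmarking_code_and_results": "10.5522/04/22634470",
}


def doi_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI string."""
    return f"https://doi.org/{doi}"


def landing_pages(dois: dict = DATASET_DOIS) -> dict:
    """Map each directory name to the resolver URL of its deposit."""
    return {name: doi_url(doi) for name, doi in dois.items()}


if __name__ == "__main__":
    # Print one "directory: URL" line per row of the table.
    for name, url in landing_pages().items():
        print(f"{name}: {url}")
```

Opening one of the printed URLs in a browser should redirect to the deposit that holds the files listed for that directory.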