george-hall-ucl/dawnn_paper_resources

This repository aims to ensure reproducibility of the results in the paper Dawnn: single-cell differential abundance with neural networks. The notebook to produce the figures in the paper can be found here.

The table below shows where to locate the data and code used to generate the files read in the above notebook. Taken together, this information should allow all results to be reproduced. If you are unable to reproduce any result, please contact me at [email protected] or post an issue on this repository.
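
Each DOI in the table can be opened through the standard `https://doi.org/` resolver. As a convenience, the following minimal Python sketch builds the resolver URL for each directory. The directory names and DOIs are copied from the table; the names `DOIS` and `doi_url` are illustrative and not part of this repository.

```python
# Directory name -> DOI, copied from the table in this README.
DOIS = {
    "heart_dataset": "10.5522/04/22601260",
    "skin_dataset": "10.5522/04/22607236",
    "organoid_dataset": "10.5522/04/22612576",
    "mouse_dataset": "10.5522/04/22614004",
    "discrete_clusters_dataset": "10.5522/04/22616590",
    "linear_trajectory_dataset": "10.5522/04/22616611",
    "branching_trajectory_dataset": "10.5522/04/22619851",
    "dawnn_trained_model": "10.5522/04/22241017",
    "dawnn_model_training": "10.5522/04/22633606",
    "training_set_simulation": "10.5522/04/22634200",
    "model_evaluations": "10.5522/04/22634416",
    "benchmarking_code_and_results": "10.5522/04/22634470",
}


def doi_url(directory: str) -> str:
    """Return the doi.org resolver URL for one directory in the table."""
    return "https://doi.org/" + DOIS[directory]


if __name__ == "__main__":
    # Print one resolver URL per directory.
    for name in DOIS:
        print(f"{name}: {doi_url(name)}")
```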

Directory name Description DOI
heart_dataset Benchmarking dataset based on heart samples.

• process_heart_cells.R - Code to generate benchmarking dataset
• heart_tissue_cells.RDS - Generated benchmarking dataset
• heart_barcodes.tsv.gz - Barcode list for raw data
• heart_genes.tsv.gz - Gene list for raw data
• heart_expression_matrix.mtx.gz - Expression matrix for raw data
• benchmark_dataset_heart_data_type_labels.csv - Generated benchmarking dataset (labels)
10.5522/04/22601260
skin_dataset Benchmarking dataset based on skin cells.

• benchmark_dataset_skin.csv - Resulting benchmarking dataset
• simulate_skin_labels_Rscript.R - R code to generate benchmarking dataset
• simulate_skin_labels_bash.sh - Bash script to generate benchmarking dataset
• skin_data_end_pipeline_1458110522.rds - Input dataset
10.5522/04/22607236
organoid_dataset Benchmarking dataset based on organoid samples.

• process_organoid_cells_data.R - Code to generate benchmarking dataset
• organoid_cells.RDS - Generated benchmarking dataset
• simulate_organoid_labels_Rscript.R - R code to simulate labels
• simulate_organoid_labels_bash.sh - Bash script to simulate labels
• benchmark_dataset_organoid_labels.csv - Generated benchmarking dataset
10.5522/04/22612576
mouse_dataset Benchmarking dataset based on mouse samples.

• process_mouse_cells.R - Code to generate benchmarking dataset
• mouse_gastrulation_data_regen.RDS - Generated benchmarking dataset
• simulate_mouse_labels_Rscript.R - R code to simulate labels
• simulate_mouse_labels_bash.sh - Bash script to simulate labels
• benchmark_dataset_mouse.csv - Generated benchmarking dataset (labels)
• simulate_mouse_pc1_Rscript.R - R script to simulate P(C1)s
• simulate_mouse_pc1_bash.sh - Bash script to simulate P(C1)s
• benchmark_dataset_mouse_pc1s_regen.csv - Generated benchmarking dataset (P(C1)s)
10.5522/04/22614004
discrete_clusters_dataset Benchmarking dataset based on simulated discrete clusters.

• cells_sim_discerete_clusters_gex_seed_*.rds - Generated benchmarking datasets
• generate_test_data_discrete_clusters_sim_milo_paper.R - R code to generate discrete cluster labels
• benchmark_dataset_sim_discrete_clusters.csv - Generated labels
10.5522/04/22616590
linear_trajectory_dataset Benchmarking dataset based on simulated linear trajectories.

• cells_sim_linear_traj_gex_seed_*.rds - Generated benchmarking datasets
• benchmark_dataset_sim_linear_traj.csv - Generated labels
• generate_test_data_linear_traj_sim_milo_paper.R - R code to generate linear trajectory labels
10.5522/04/22616611
branching_trajectory_dataset Benchmarking dataset based on simulated branching trajectories.

• cells_sim_branching_traj_gex_seed_*.rds - Generated benchmarking datasets
• benchmark_dataset_sim_branching_traj.csv - Generated labels
• generate_test_data_branching_traj_sim_milo_paper.R - R code to generate branching trajectory labels
10.5522/04/22619851
dawnn_trained_model Trained neural network model needed to run Dawnn.

• final_model_dawnn_rerun.h5 - Final trained Dawnn model
10.5522/04/22241017
dawnn_model_training

• train_final_model_regen_seed_123_job_sub.sh - Job submission script to train final selected model
• train_nn_regen_seed_123.py - Python script to train final selected model
10.5522/04/22633606
training_set_simulation Code and resulting data when simulating Dawnn's training set.

• autogen4_code.R - Code to generate training set
• labels_df.csv - Resulting generated training set
10.5522/04/22634200
model_evaluations Code and results from evaluating different models.

• nn_model_choice.py - Code for hyperparameter optimization
• model_evaluations_structure_all_nn_results.txt - Results from neural network hyperparameter optimization
• eval_rf_svm.py - Code to evaluate random forests and support vector machines
• svm_model_evaluations.txt - Results from evaluation of support vector machines
• rf_model_evaluations.txt - Results from evaluation of random forests
10.5522/04/22634416
benchmarking_code_and_results

• collect_results_all_sim_dat.R - Code to run and collect benchmarking for simulated datasets
• tpr_fdr_results_discrete_clusters_rerun.csv - Results from benchmarking on discrete clusters dataset
• tpr_fdr_results_linear_traj_rerun.csv - Results from benchmarking on linear trajectory dataset
• tpr_fdr_results_branch_traj_rerun.csv - Results from benchmarking on branching trajectory dataset
• collecting_results_mouse.sh - Code to run and collect benchmarking for mouse dataset
• tpr_fdr_results_mouse_regen.csv - Results from benchmarking on mouse dataset
• collecting_results_skin.sh - Code to run and collect benchmarking for skin dataset
• tpr_fdr_results_skin_regen.csv - Results from benchmarking on skin dataset
• collecting_results_organoid.sh - Code to run and collect benchmarking for organoid dataset
• tpr_fdr_results_organoid_regen.csv - Results from benchmarking on organoid dataset
• collecting_results_heart.sh - Code to run and collect benchmarking for heart dataset
• tpr_fdr_results_heart_regen.csv - Results from benchmarking on heart dataset
• benchmarking_liver_cirrhosis_analysis.R - Code to run methods on cirrhotic liver dataset
• liver_cirrhosis_results_rerun.csv - Results from running on cirrhotic liver dataset
10.5522/04/22634470
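
The `tpr_fdr_results_*.csv` files listed above hold true positive rate (TPR) and false discovery rate (FDR) results. For readers unfamiliar with these metrics, the sketch below computes both from their standard definitions, TPR = TP / (TP + FN) and FDR = FP / (TP + FP). It is only an illustration: the function name `tpr_fdr` and the toy inputs are hypothetical, the column layout of the CSV files is not described in this README, and the benchmarking scripts in the repository remain the authoritative implementation.

```python
def tpr_fdr(truth, called):
    """Compute (TPR, FDR) from two equal-length sequences of booleans.

    truth[i]  - whether item i is truly differentially abundant
    called[i] - whether a method called item i differentially abundant
    """
    pairs = list(zip(truth, called))
    tp = sum(1 for t, c in pairs if t and c)        # true positives
    fp = sum(1 for t, c in pairs if not t and c)    # false positives
    fn = sum(1 for t, c in pairs if t and not c)    # false negatives

    # TPR is undefined when there are no true items.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    # By convention here, FDR is 0 when nothing is called.
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return tpr, fdr


if __name__ == "__main__":
    # Toy data (hypothetical): 3 true items, 3 calls, 2 of them correct.
    truth = [True, True, True, False, False]
    called = [True, True, False, True, False]
    print(tpr_fdr(truth, called))
```

With the toy input, two of the three true items are called (TPR = 2/3) and one of the three calls is false (FDR = 1/3).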
