This is the official GitHub repository for the paper "Benchmarking a foundational cell model for post-perturbation RNAseq prediction". It is a fork of the scGPT repository.
Notebooks in notebooks/:
- bulk_models.ipynb: trains RF, Elastic Net, KNN Regressor and Train Mean with GO, scGPT, scFoundation and scElmo features (Figure 1 B - E, 2 D)
- data_analysis.ipynb: runs data analysis and generates Figure 2 A - C
- Tutorial_PerturbationAdamson.ipynb: trains scGPT on the Adamson et al. dataset
- Tutorial_PerturbationNorman.ipynb: trains scGPT on the Norman et al. dataset
- Tutorial_PerturbationReplogle.ipynb: trains scGPT on the Replogle et al. (K562) dataset
- Tutorial_PerturbationReplogleRPE1.ipynb: trains scGPT on the Replogle et al. (RPE1) dataset
- embedding_eval.ipynb: embedding analysis
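
The notebooks listed above are meant to be opened in Jupyter, but they could also be executed non-interactively. The sketch below only builds a jupyter nbconvert command and prints it. It assumes nbconvert is installed and that a kernel is registered under the name scgpt_yml, which this README does not confirm. nbconvert_command is a hypothetical helper, not part of this repository.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Directory name taken from the list above.
NOTEBOOK_DIR = Path("notebooks")


def nbconvert_command(notebook: str, kernel: str = "scgpt_yml") -> list[str]:
    """Build a command that executes one notebook in place, without a browser.

    `kernel` is an assumed Jupyter kernel name; adjust it to whatever
    `jupyter kernelspec list` reports on your machine.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        f"--ExecutePreprocessor.kernel_name={kernel}",
        str(NOTEBOOK_DIR / notebook),
    ]


if __name__ == "__main__":
    cmd = nbconvert_command("Tutorial_PerturbationAdamson.ipynb")
    # Print instead of running, so the command can be inspected first.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root once the kernel name has been verified.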
scFoundation training entry points in scFoundation/GEARS:
- train_adamson.py: trains on the Adamson et al. dataset
- train_norman.py: trains on the Norman et al. dataset
- train_replogle_rp1.py: trains on the Replogle et al. (RPE1) dataset
- train_replogle.py: trains on the Replogle et al. (K562) dataset
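
As a rough illustration of how one of these entry points might be launched from a script, the sketch below builds the command and working directory for a given file name. It assumes the scripts take no required command-line arguments and expect to be run from inside scFoundation/GEARS with the scFoundation conda environment active. Neither assumption is confirmed by this README, and training_command is a hypothetical helper.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Directory and file names are taken from the list above.
GEARS_DIR = Path("scFoundation") / "GEARS"
ENTRY_POINTS = (
    "train_adamson.py",
    "train_norman.py",
    "train_replogle_rp1.py",
    "train_replogle.py",
)


def training_command(script: str) -> tuple[list[str], Path]:
    """Return (command, working directory) for one training entry point.

    Assumes the script needs no arguments; check the script itself first.
    """
    if script not in ENTRY_POINTS:
        raise ValueError(f"unknown entry point: {script!r}")
    # sys.executable is the interpreter of the currently active environment.
    return [sys.executable, script], GEARS_DIR


if __name__ == "__main__":
    cmd, cwd = training_command("train_adamson.py")
    print("would run", cmd, "in", cwd)
```

To actually start a run, the pair can be handed to subprocess.run(cmd, cwd=cwd, check=True).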
To reproduce the results of the paper, follow these steps:
- Run git lfs pull to download the required data from Git Large File Storage (LFS). If Git LFS is not installed, please refer to the Git LFS installation guide.
- Run make setup to create the conda environment, install the IPython kernel, and unzip the Replogle dataset.
- Run the scGPT trainings:
  - Select the scgpt_yml conda environment as the Python kernel for the notebooks
  - Run the Tutorial notebooks to get the results of scGPT
- Run the scFoundation trainings:
  - Create the conda environment for scFoundation by running conda env create -f scFoundation/conda.yaml
  - Run conda activate scfoundation
  - Start the trainings via the entry points
- Run data_analysis.ipynb
- Run bulk_models.ipynb
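
The first step above depends on git lfs pull having replaced the small LFS pointer files with the real data. If later steps fail while loading data, one possible sanity check is to look for files that still begin with the Git LFS pointer header. The sketch below does that. find_unpulled is a hypothetical helper, and the choice to scan the whole working tree is an assumption, since this README does not say where the LFS-tracked files live.

```python
from __future__ import annotations

from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` still holds an LFS pointer instead of real data."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unpulled(root: Path) -> list[Path]:
    """List files under `root` (skipping .git) that are still LFS pointers."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    leftovers = find_unpulled(Path("."))
    if leftovers:
        print("Still LFS pointers (run git lfs pull):")
        for p in leftovers:
            print(" ", p)
    else:
        print("No LFS pointer files found.")
```

Run from the repository root, an empty result suggests the data was downloaded. It does not verify that the files are complete.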