Cell morphological representations of genes enhance prediction of drug targets

Setup

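Clone the data repository: git lfs clone https://github.com/jump-cellpainting/pilot-cpjump1-data. If you do not have git lfs, refer to the Git LFS installation instructions. Recent Git LFS releases mark git lfs clone as deprecated; with Git LFS installed, a plain git clone of the same URL also fetches the LFS files.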
Clone this repository: git clone https://github.com/nivedithasi/gene-embed
Symlink the CPJUMP data into gene-embed: cd gene-embed, then ln -s ../pilot-cpjump1-data/ . (the relative path assumes both repositories were cloned into the same parent directory).
Create a micromamba environment: micromamba env create -n gene-embed -f env.yaml. If you do not have micromamba set up, refer to this guide.
Activate the environment: micromamba activate gene-embed. Then install mkl: pip install mkl==2022.1.0.

Download existing experiments

cd gene-embed/code
Download gene_embed_experiment_runs.zip from Zenodo (link here).
Unzip the folder to obtain gene_embed_experiment_runs.
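Move the sub-folders into gene-embed/code: mv gene_embed_experiment_runs/* gene-embed/code. The destination is relative to your current directory; if you unzipped the archive inside gene-embed/code, use mv gene_embed_experiment_runs/* . instead.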
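
The unzip and move steps above can also be done from Python. The following is a minimal sketch, not part of the repository: the function name unpack_runs is hypothetical, and it assumes the archive contains a single top-level folder named gene_embed_experiment_runs, as described above. Unlike mv, it also removes the emptied top-level folder and refuses to overwrite existing sub-folders.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_runs(zip_path, code_dir, top_level="gene_embed_experiment_runs"):
    """Extract the archive into code_dir and move each run sub-folder up one level.

    Assumes the zip holds one top-level folder called `top_level`.
    """
    code_dir = Path(code_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(code_dir)

    extracted = code_dir / top_level
    moved = []
    for child in sorted(extracted.iterdir()):
        target = code_dir / child.name
        if target.exists():
            # Do not silently overwrite results from an earlier download or run.
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(child), str(target))
        moved.append(target)

    extracted.rmdir()  # the top-level folder is empty at this point
    return moved
```

A call such as unpack_runs("gene_embed_experiment_runs.zip", "gene-embed/code") would correspond to the manual steps, with both paths adjusted to wherever the archive and the repository actually live.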

Running experiments

Change into the code directory and activate the environment: cd gene-embed/code, then micromamba activate gene-embed.
Create a grid search by editing grid.py (edit the search space and output_dir folder path).
Create config files by running python grid.py. Afterwards, output_dir should contain one sub-folder per config.
Launch the experiments: bash run.sh output_dir.
All experiment results and the trained model weights are stored under output_dir (see test_scores.json, model.pt, etc.).
Find the best model on the validation set by running python find_best_val.py. This displays the test set performance of the best validation model for each experiment. Edit source_folders in this file to obtain results for additional experiments.
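
The grid-search and model-selection steps above can be sketched in Python. This is an illustration of the general pattern only, not the contents of grid.py or find_best_val.py. The function names, the search-space keys (lr, hidden_dim), the config_NNN folder naming, the config.json and val_scores.json file names, and the "score" key are all assumptions made for the example; only output_dir and test_scores.json come from the steps above. The scores written in the demo are dummy placeholders standing in for real training runs.

```python
import itertools
import json
import tempfile
from pathlib import Path


def write_grid_configs(search_space, output_dir):
    """Write one sub-folder with a config.json per hyperparameter combination."""
    keys = sorted(search_space)
    combos = itertools.product(*(search_space[k] for k in keys))
    run_dirs = []
    for i, values in enumerate(combos):
        run_dir = Path(output_dir) / f"config_{i:03d}"  # hypothetical naming scheme
        run_dir.mkdir(parents=True, exist_ok=True)
        config = dict(zip(keys, values))
        (run_dir / "config.json").write_text(json.dumps(config, indent=2))
        run_dirs.append(run_dir)
    return run_dirs


def find_best_val(output_dir, metric="score"):
    """Return (run_dir, test_scores) for the run with the highest validation metric."""
    best = None
    for run_dir in sorted(Path(output_dir).iterdir()):
        val_file = run_dir / "val_scores.json"    # assumed file name
        test_file = run_dir / "test_scores.json"  # file name from the steps above
        if not (val_file.is_file() and test_file.is_file()):
            continue  # skip runs that have not finished
        val = json.loads(val_file.read_text())[metric]
        if best is None or val > best[0]:
            best = (val, run_dir, json.loads(test_file.read_text()))
    if best is None:
        raise FileNotFoundError(f"no completed runs found under {output_dir}")
    return best[1], best[2]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        runs = write_grid_configs({"lr": [1e-3, 1e-4], "hidden_dim": [128, 256]}, tmp)
        # Stand-in for training: write dummy scores so the selection step has input.
        for i, run_dir in enumerate(runs):
            (run_dir / "val_scores.json").write_text(json.dumps({"score": i}))
            (run_dir / "test_scores.json").write_text(json.dumps({"score": i}))
        best_dir, test_scores = find_best_val(tmp)
        print(best_dir.name, test_scores)
```

Selecting on the validation score and only then reading the test score keeps the test set out of the model-selection loop, which is the point of the final step above.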