This project aims to make predictions of CERES based on functional and genomic features.
The file structure is as follows,
.
├── data # where raw data is stored
├── notebooks # jupyter/Rmd notebooks for analyses
├── out # outputs of analyses
├── src # source codes
│ ├── ceres_infer # source codes for ceres_infer package
│ ├── cwl # CWL files for deriving lossy gene sets
│ ├── figures.R # R script for generating figures in manuscript
│ ├── figures.py # python script for generating figures in manuscript
│ ├── data_source.py # python script consolidate all data source csv's into a single Excel
├── README.md
├── LICENSE # License info
├── environment.yml # conda environment
├── requirements.txt # package requirements
├── setup.py # build script for ceres_infer package
After cloning the repository, run the following commands to set up a new virtual environment and install the required packages.
conda env create -f environment.yml
conda activate cga_app
pip install -Ue . # installs the ceres_infer package
The repo here contains all notebooks and source codes needed to run the pipeline, build models, and analyze results.
The raw data need to be downloaded separately. For DepMap, the 19Q3 and 19Q4 releases were downloaded from the DepMap Portal (https://depmap.org/portal/download/). These were placed into the data/DepMap/19Q3/
and data/DepMap/19Q4/
folders.
For gene sets, Panther gene sets were downloaded from Enrichr libaries and paralog gene lists were retrieved from Ensembl gene trees using biomaRt in R. These gene sets were placed into the data/gene_sets/
folder.
Intermediate and final processed files and models are too big for storage on github. They are available upon request.
Refer to run01-preprocess_data.ipynb
in notebooks for preprocessing of data.
The pipeline for model building and inference can be accessed with the worfklow
class. Before calling this method, the data need to be preprocessed (see above), and the parameters need to be defined (see below).
from ceres_infer.session import workflow
params = {} # Define all the params here
wf = workflow(params)
wf.run_pipe()
Refer to run02-template.ipynb
in notebooks for templates. Including customizing the pipeline.
Additional analyses are in the notebooks
folder.
Codes for the generation of all figures in manuscript are in src/figures.py
and src/figures.R
. Data source Excel file is generated with src/data_source.py
.