Code related to the manuscript "Inferring signaling pathways with probabilistic programming" (Merrell & Gitter, 2020), *Bioinformatics*, 36(Supplement_2):i822–i830.
This repository contains the following:
- SSPS: a method that infers relationships between variables using time series data.
  - Modeling assumption: the time series data is generated by a Dynamic Bayesian Network (DBN). (A minimal illustrative sketch of this assumption is given below.)
  - Inference strategy: MCMC sampling over possible DBN structures.
  - Implementation: written in Julia, using the Gen probabilistic programming language.
- Analysis code:
  - simulation studies;
  - convergence analyses;
  - evaluation on experimental data;
  - a Snakefile for managing all of the analyses.
(If you plan to reproduce all of the analyses, then make sure you're on a host with access to plenty of CPUs. Ideally, you would have access to a cluster of some sort.)
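To make the modeling assumption above concrete, here is a minimal, hypothetical Julia sketch of the DBN data-generating process. It is an illustration only -- not the SSPS/Gen model itself -- and every name in it (`simulate_dbn`, `parents`, `B`, `sigma`) is made up for readability. SSPS treats the parent sets as unknown and uses MCMC to sample over them; this sketch simply simulates data for one fixed structure.

```julia
# Illustrative sketch only (NOT the SSPS implementation): under the DBN
# assumption, each variable at time t is generated from its parents'
# values at time t-1. All names here are hypothetical.
using LinearAlgebra

function simulate_dbn(parents::Vector{Vector{Int}},  # parent indices for each variable
                      B::Vector{Vector{Float64}},    # regression weights, one vector per variable
                      sigma::Float64,                # observation noise std. dev.
                      x0::Vector{Float64},           # initial state at t = 1
                      T::Int)                        # number of time points
    V = length(x0)
    X = zeros(T, V)
    X[1, :] = x0
    for t in 2:T, j in 1:V
        pa = parents[j]
        mean_j = isempty(pa) ? 0.0 : dot(B[j], X[t-1, pa])
        X[t, j] = mean_j + sigma * randn()
    end
    return X
end

# Example: three variables where 1 -> 2 and (1, 2) -> 3.
X = simulate_dbn([Int[], [1], [1, 2]],
                 [Float64[], [0.8], [0.5, -0.4]],
                 0.1, zeros(3), 20)
```

In the actual method, the parent sets are random variables: a prior over structures (informed by the user-supplied prior edge confidences) is combined with a likelihood of this general form, and MCMC explores the resulting posterior over DBN structures.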
To install SSPS and its Julia dependencies:

- Clone this repository:
  ```sh
  git clone [email protected]:gitter-lab/ssps.git
  ```
- Install Julia 1.6 (and all Julia dependencies):
  - Download the correct Julia binary here: https://julialang.org/downloads/. E.g., for Linux x86_64:
    ```sh
    $ wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.7-linux-x86_64.tar.gz
    $ tar -xvzf julia-1.6.7-linux-x86_64.tar.gz
    ```
  - Find additional installation instructions here: https://julialang.org/downloads/platform/.
  - Use Pkg -- Julia's package manager -- to install the project's Julia dependencies:
    ```sh
    $ cd ssps/SSPS
    $ julia --project=.
    ```
    The Julia REPL will start (printing its startup banner); then run:
    ```julia
    julia> using Pkg
    julia> Pkg.instantiate()
    julia> exit()
    ```
In order to reproduce the analyses, you will need some extra bits of software:

- We use Snakemake -- a Python package -- to manage the analysis workflow.
- We use some other Python packages to postprocess the results, produce plots, etc.
- Some of the baseline methods are implemented in R or MATLAB.

Hence, the analyses entail some extra setup:
- Install Python dependencies (using conda):
  - For the purposes of these instructions, we assume you have Anaconda3 or Miniconda3 installed and have access to the conda environment manager. (We recommend using Miniconda; find full installation instructions here.)
  - We recommend setting up a dedicated virtual environment for this project. The following will create a new environment named ssps and install the required Python packages:
    ```sh
    $ conda create -n ssps -c conda-forge pandas matplotlib numpy bioconda::snakemake-minimal
    $ conda activate ssps
    (ssps) $
    ```
  - If you plan to reproduce the analyses on a cluster, then install cookiecutter and the complete version of Snakemake:
    ```sh
    (ssps) $ conda install -c conda-forge cookiecutter bioconda::snakemake
    ```
    Then find the appropriate Snakemake profile from this list: https://github.com/Snakemake-Profiles/doc and install it using cookiecutter, e.g.:
    ```sh
    (ssps) $ cookiecutter https://github.com/Snakemake-Profiles/htcondor.git
    ```
    replacing the example URL with your desired profile.
- Install R packages.
- Check whether MATLAB is installed.
  - If you don't have MATLAB, then you won't be able to run the exact DBN inference method of Hill et al., 2012.
  - In that case, you'll need to comment out the `hill` method wherever it appears in `analysis_config.yaml` (see the hypothetical excerpt below).
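For illustration only, commenting out a method in a YAML configuration looks like the following. This is a hypothetical excerpt -- the actual keys and method names in `analysis_config.yaml` may differ -- so apply the same idea to the real file:

```yaml
# Hypothetical excerpt of analysis_config.yaml -- actual keys/entries may differ.
# If MATLAB is unavailable, prefix each `hill` entry with `#`:
methods:
  - ssps
  - funchisq
  # - hill
```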
After completing this additional setup, we are ready to run the analyses.

- Make any necessary modifications to the configuration file `analysis_config.yaml`. This file controls the space of hyperparameters and datasets explored in the analyses.
- Run the analyses using `snakemake`:
  - If you're running the analyses on your local host, simply move to the directory containing the `Snakefile` and call `snakemake`:
    ```sh
    (ssps) $ cd ssps
    (ssps) $ snakemake
    ```
    Since Julia is a just-in-time compiled language, some time will be devoted to compilation when you run SSPS for the first time. You may see some warnings in `stdout` -- this is normal.
  - If you're running the analyses on a cluster, call `snakemake` with the Snakemake profile you installed during setup. (You will probably need to edit the job submission parameters in the profile's `config.yaml` file.)
    ```sh
    (ssps) $ cd ssps
    (ssps) $ snakemake --profile YOUR_PROFILE_NAME
    ```
- Relax. It will take tens of thousands of CPU-hours to run all of the analyses.
Follow these steps to run SSPS on your dataset. You will need:

- a CSV file (tab separated) containing your time series data;
- a CSV file (comma separated) containing your prior edge confidences;
- optional: a JSON file containing a list of variable names (i.e., node names), as in the small example below.
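For instance, the optional node-name file is just a JSON array of strings; the names below are hypothetical:

```json
["EGFR", "MEK", "ERK", "AKT"]
```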
Then:

- Install the Python dependencies if you haven't already. Find detailed instructions above.
- `cd` to the `run_ssps` directory.
- Configure the parameters in `ssps_config.yaml` as appropriate.
- Run Snakemake:
  ```sh
  $ snakemake --cores 1
  ```
  Increase 1 to increase the maximum number of CPU cores to be used.
SSPS allows two levels of parallelism: (1) at the Markov chain level and (2) at the iteration level.

- Chain-level parallelism is provided via Snakemake. For example, Snakemake can run 4 chains simultaneously if you specify `--cores 4` at the command line: `$ snakemake --cores 4`. In essence, this just creates 4 instances of SSPS that run simultaneously.
- Iteration-level parallelism is provided by Julia's multi-threading features. The number of threads available to an SSPS instance is specified by an environment variable: `JULIA_NUM_THREADS`.
- The total number of CPUs used by your SSPS jobs is the product of Snakemake's `--cores` parameter and Julia's `JULIA_NUM_THREADS` environment variable. Concretely: if we run `snakemake --cores 2` and have `JULIA_NUM_THREADS=4`, then up to 8 CPUs may be used at one time by the SSPS jobs, as in the example below.
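For example, assuming a bash-like shell, that combination looks like this:

```sh
# 2 SSPS instances (Snakemake) x 4 Julia threads each = up to 8 CPUs in use at once
(ssps) $ export JULIA_NUM_THREADS=4
(ssps) $ snakemake --cores 2
```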
SSPS is available under the MIT License, Copyright © 2020 David Merrell.
The MATLAB code `dynamic_network_inference.m` has been modified from the original version, Copyright © 2012 Steven Hill and Sach Mukherjee.

The `dream-challenge` data is described in Hill et al., 2016 and is originally from Synapse.