(Same code structure as GitHub. If you wish to modify the code or change the input, just copy the capsule into a new capsule that you own on Code Ocean!)
(should take 5-10 minutes with proper system setup)
git clone https://github.com/timkartar/DeepPBS
We recommend installing via the conda package management tool. If you do not have conda, please refer to the conda installation instructions.
# gcc and cuda configs: gcc/12.3.0 cuda/12.2.1 (also works with 12.2 and 12.1)
conda create -n deeppbs_install python=3.10
conda init bash
conda activate deeppbs_install
# look here for other versions: https://pytorch.org/get-started/previous-versions/
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install torch_geometric
# look here for other versions: https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
pip install torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
pip install -U --no-cache-dir biopython==1.83 logomaker matplotlib==3.5.2 networkx pandas==1.4.4 pdb2pqr scipy==1.14.1 seaborn==0.13.2 freesasa==2.2.1
cd DeepPBS
pip install -e .
The preprocessing scripts depend on 3DNA and Curves. We have provided the required packages in dependencies/bin and show how to source them in run/process/proc_source.sh. However, please refer to x3dna-v2.3-linux-64bit/x3dna-v2.3/license.txt for fair usage of this version of the 3DNA software.
Note: The installation was tested on Linux systems with CUDA 11.3 and CUDA 11.6. You may have to adjust the PyTorch version number based on your system.
UPDATE (Feb 29, 2024): The latest version on GitHub is tested on CUDA 12.2, PyTorch 2.3 and PyG 2.5. The .yml file has been updated accordingly.
The project was developed on PyG 2.0.1. Later versions of PyG have been backwards compatible so far, but we cannot guarantee stability on all versions. For more information, refer to the installation pages for PyTorch and PyG.
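Because the notes above tie the install to specific PyTorch and PyG versions, it can help to check what actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the `TESTED` table simply restates the versions from the update note, and the function names are hypothetical helpers, not part of DeepPBS. Packages installed without Python package metadata may be reported as missing.

```python
from importlib import metadata

# Version series named in the update note above; a reference point, not a hard requirement.
TESTED = {"torch": "2.3", "torch-geometric": "2.5"}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_with_tested(tested=TESTED):
    """Map each package name to (installed version or None, tested series, matches?)."""
    report = {}
    for name, series in tested.items():
        found = installed_version(name)
        matches = found is not None and (found == series or found.startswith(series + "."))
        report[name] = (found, series, matches)
    return report


if __name__ == "__main__":
    for name, (found, series, ok) in compare_with_tested().items():
        status = "matches tested series" if ok else "differs or not installed"
        print(f"{name}: installed={found}, tested={series}.x -> {status}")
```

A mismatch does not necessarily mean the install is broken; it only flags that you are outside the combination the authors report testing.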
An example pipeline for processing and prediction is shown below:
- cd run/process/
- Put your PDB files containing the biological assemblies of interest into the pdb directory
- Run ls pdb > input.txt
- Run ./process_and_predict.sh (you can parallelize the steps in this script through multiple job submissions)
This will process the list of PDB files and put the processed npz files into the npz directory.
Note: As evident, you can parallelize this script, but in that case make sure you create a separate working directory for each job. Otherwise temporary files generated during processing may conflict.
Then it will make predictions using the DeepPBS ensemble and put the predictions in the output directory (in run/process).
Combined pre-processing and inference time for one biological assembly is on the order of seconds (e.g., about 15-20 seconds for PDB ID 5x6g).
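The note about separate working directories for parallel jobs can be sketched as follows. This is an illustration only: the `job_<i>` layout and the function name are assumptions, not something the repository prescribes, and each job directory would still need access to the processing scripts and to proc_source.sh before process_and_predict.sh can run there.

```python
import shutil
from pathlib import Path


def make_job_dirs(pdb_dir, jobs_root, n_jobs):
    """Distribute PDB files round-robin into job_<i>/pdb and write job_<i>/input.txt."""
    pdb_files = sorted(p for p in Path(pdb_dir).iterdir() if p.suffix == ".pdb")
    job_dirs = []
    # Never create more job directories than there are input files.
    for i in range(min(n_jobs, len(pdb_files))):
        chunk = pdb_files[i::n_jobs]
        job_dir = Path(jobs_root) / f"job_{i}"
        (job_dir / "pdb").mkdir(parents=True, exist_ok=True)
        for src in chunk:
            shutil.copy2(src, job_dir / "pdb" / src.name)
        # Same content that `ls pdb > input.txt` would produce for this chunk.
        (job_dir / "input.txt").write_text("".join(f"{p.name}\n" for p in chunk))
        job_dirs.append(job_dir)
    return job_dirs
```

For example, `make_job_dirs("pdb", "jobs", 4)` would create up to four directories under a hypothetical `jobs/` folder, each with its own pdb directory and input.txt, so that temporary files from different jobs do not collide.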
cd run/process
./vis_interpret.sh <pdb_name_without .pdb> (for example, ./vis_interpret.sh 5x6g)
This will compute and store the perturbation outcomes and other required information in run/plot_scripts/interpret_output
- You need a PyMOL executable for this step! Once it is installed, run the following:
- pymol (opens the PyMOL GUI)
- pip install matplotlib (in the PyMOL GUI command prompt)
- Close the PyMOL GUI
- pymol ../plot_scripts/vis_interpret.py ../plot_scripts/ 5x6g.pdb (run from the terminal)
This will open a pymol session for the visualization (screenshot below) and save a .psw file in run/plot_scripts/interpret_output
Simulation trajectories stored as PDB-format snapshots can be processed in a similar manner.
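If a trajectory is stored as a single multi-model PDB file rather than as separate snapshots, one way to obtain per-snapshot files is to split it on its MODEL/ENDMDL records. This is a sketch under that assumption; the function name and the `frame_0001.pdb` naming are hypothetical, and records outside the MODEL blocks (for example header lines) are dropped.

```python
from pathlib import Path


def split_models(trajectory_pdb, out_dir, prefix="frame"):
    """Write each MODEL...ENDMDL block of a multi-model PDB file to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, current = [], None
    with open(trajectory_pdb) as handle:
        for line in handle:
            record = line[:6].strip()  # PDB record name occupies columns 1-6
            if record == "MODEL":
                current = []  # start collecting a new snapshot
            elif record == "ENDMDL":
                if current is not None:
                    path = out_dir / f"{prefix}_{len(written) + 1:04d}.pdb"
                    path.write_text("".join(current) + "END\n")
                    written.append(path)
                current = None
            elif current is not None:
                current.append(line)
    return written
```

The resulting files could then be placed in the pdb directory and listed in input.txt like any other structure.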
Figshare link: https://doi.org/10.6084/m9.figshare.25678053
Download the data (data availability number 2), place it somewhere on your system, and configure its path in /run/config.json ("data_dir"). Also configure "output_path" as you wish.
Run ./submit_cross.sh. This will submit 5 cross-validation models to train simultaneously. Modify this script according to your needs.
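The two configuration entries mentioned above can also be set programmatically. The sketch below assumes that "data_dir" and "output_path" are top-level keys in the JSON file, which the text implies but does not state; the function name and the example paths are placeholders.

```python
import json
from pathlib import Path


def set_paths(config_file, data_dir, output_path):
    """Update "data_dir" and "output_path" in a JSON config, keeping all other keys."""
    config_file = Path(config_file)
    config = json.loads(config_file.read_text())
    config["data_dir"] = str(data_dir)        # where the downloaded data lives
    config["output_path"] = str(output_path)  # where training output should go
    config_file.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

For example, `set_paths("run/config.json", "/path/to/downloaded/data", "/path/to/output")` rewrites only those two entries. Keeping a backup of the original file before editing it is a reasonable precaution.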