forked from HannesStark/EquiBind
Commit `baf8244` (0 parents). This commit does not belong to any branch on this repository and may belong to a fork outside of the repository.

Showing 42 changed files with 38,679 additions and 0 deletions.
`.gitattributes` (new file, `@@ -0,0 +1,12 @@`):

```
*.ipynb linguist-vendored=false
*.ipynb linguist-detectable=false

/jupyter_notebooks linguist-vendored=false

jupyter_notebooks/** linguist-vendored

jupyter_notebooks/** linguist-vendored=false


jupyter_notebooks/* linguist-vendored
jupyter_notebooks/* linguist-vendored=false
```
`.gitignore` (new file, `@@ -0,0 +1,145 @@`):

```
renew.sh
tmux_renew.sh

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

.vscode/


*.zip

.idea/


#################### Project specific

# this ignores everything in data except for the file
!/data
/data/*
!/data/PDBBind_deepBSP_filtered/pdbbind_ids_without_overlap_with_casf.data
!/data/timesplit_test
!/data/timesplit_no_lig_overlap_train
!/data/timesplit_no_lig_overlap_val
!/data/timesplit_no_lig_or_rec_overlap_train
!/data/timesplit_no_lig_or_rec_overlap_val


cache

logs

# temporary files
temp/
bsub*
stderr*
stdout*

runs2
# this excludes everything in the runs directory except for that specific run
!/runs
/runs/*
!/runs/rigid_redocking
!/runs/flexible_self_docking
```
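The "Project specific" block above relies on `!` negation and on git's rule that the last matching pattern decides. Below is a minimal Python sketch of how those rules play out for paths under `data/`, assuming only root-anchored patterns and `*` matching within a single path segment. `is_ignored` is a hypothetical helper, not git's implementation; it also models git's documented rule that a file cannot be re-included if one of its parent directories is excluded.

```python
from fnmatch import fnmatchcase

# Root-anchored rules copied from the "Project specific" block (data/ part only).
RULES = [
    "!/data",
    "/data/*",
    "!/data/PDBBind_deepBSP_filtered/pdbbind_ids_without_overlap_with_casf.data",
    "!/data/timesplit_test",
    "!/data/timesplit_no_lig_overlap_train",
    "!/data/timesplit_no_lig_overlap_val",
]


def _matches(pattern: str, path: str) -> bool:
    """Match a root-anchored pattern segment by segment ("*" stays in one segment)."""
    pat = pattern.lstrip("/").split("/")
    parts = path.split("/")
    return len(pat) == len(parts) and all(
        fnmatchcase(seg, p) for seg, p in zip(parts, pat)
    )


def _excluded(path: str, rules) -> bool:
    """Apply the rules in order; the last matching rule decides."""
    excluded = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if _matches(pattern, path):
            excluded = not negated
    return excluded


def is_ignored(path: str, rules=RULES) -> bool:
    """A path is ignored if it, or any of its parent directories, is excluded."""
    parts = path.split("/")
    return any(
        _excluded("/".join(parts[:i]), rules) for i in range(1, len(parts) + 1)
    )


for p in ["data/timesplit_test", "data/PDBBind/example/example_protein.pdb"]:
    print(p, is_ignored(p))
```

Under this model, `data/timesplit_test` is kept while everything inside `data/PDBBind` is ignored. The nested file `data/PDBBind_deepBSP_filtered/pdbbind_ids_without_overlap_with_casf.data` is also reported as ignored, because its parent directory is matched by `/data/*`; git documents the same limitation, so such a file stays in the repository only if it is already tracked (for example, added with `git add -f`), since ignore rules do not affect tracked files. The `runs` rules work the same way.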
`README.md` (new file, `@@ -0,0 +1,123 @@`):

# EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

### [Paper on arXiv](https://arxiv.org/abs/2202.05146)

EquiBind is an SE(3)-equivariant geometric deep learning model that performs direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the ligand's bound pose and orientation. EquiBind is significantly faster than traditional and recent baselines while achieving better prediction quality.
If you have questions, don't hesitate to open an issue or ask me via email ([email protected]) or [social media](https://hannes-stark.com/). I am happy to hear from you!

![](.fig_intro.jpg)

![](.model2.jpg)

# Dataset

Our preprocessed data (see the dataset section in the paper's appendix) is available from [Zenodo](https://zenodo.org/record/6034088).
The files in `data` contain the names for the time-based data split. For the no-ligand-overlap split described in the main paper, these are:

1. train: `old_no_newL_train`
2. validation: `old_no_newL_val`
3. test: `new_names`

To train one of our models on this data:

1. Download it from [Zenodo](https://zenodo.org/record/6034088).
2. Unzip the archive and place it into `data` so that you have the path `data/PDBBind`.
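Before training, a quick check that the archive ended up where the steps above expect can save a failed run. This is a minimal sketch assuming only the `data/PDBBind` path named above; `check_dataset_layout` and `parse_split` are hypothetical helpers, and `parse_split` assumes that a split file lists one name per line, which the README does not state.

```python
from pathlib import Path


def check_dataset_layout(repo_root: str = ".") -> Path:
    """Return data/PDBBind under repo_root, or raise if it is missing."""
    pdbbind = Path(repo_root) / "data" / "PDBBind"
    if not pdbbind.is_dir():
        raise FileNotFoundError(f"expected the unzipped dataset at {pdbbind}")
    return pdbbind


def parse_split(text: str) -> list:
    """Parse the contents of a split file, assuming one name per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `parse_split(Path("data/old_no_newL_train").read_text())` would return the training names, provided the split file sits directly in `data` and uses that one-name-per-line layout.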

# Use the provided model weights to predict binding structures of your own protein-ligand pairs

## Step 1: What you need as input

You need ligand files in `.mol2`, `.sdf`, `.pdbqt`, or `.pdb` format and receptor files in `.pdb` format.
For each complex you want to predict, you need a directory containing its ligand file and its receptor file, like this:
```
my_data_folder
└───name1
│ name1_protein.pdb
│ name1_ligand.sdf
└───name2
│ name2_protein.pdb
│ name2_ligand.sdf
...
```
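Before running inference on many complexes, it can help to check the input folder against the layout in the tree above. This is a minimal sketch assuming the `<name>_protein.pdb` / `<name>_ligand.<ext>` naming shown there; whether `inference.py` strictly requires these suffixes is an assumption, and `check_complex` and `validate_input_folder` are hypothetical helpers.

```python
from pathlib import Path

# Ligand formats listed above; the receptor is always a .pdb file.
LIGAND_EXTS = (".mol2", ".sdf", ".pdbqt", ".pdb")


def check_complex(name: str, filenames: list) -> list:
    """Return the problems found for one complex directory (empty if it looks fine)."""
    problems = []
    if f"{name}_protein.pdb" not in filenames:
        problems.append(f"{name}: missing {name}_protein.pdb")
    ligands = [
        f"{name}_ligand{ext}" for ext in LIGAND_EXTS if f"{name}_ligand{ext}" in filenames
    ]
    if len(ligands) != 1:
        problems.append(f"{name}: expected exactly one ligand file, found {len(ligands)}")
    return problems


def validate_input_folder(root: str) -> list:
    """Check every subdirectory of root (e.g. my_data_folder)."""
    problems = []
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            problems += check_complex(sub.name, [f.name for f in sub.iterdir()])
    return problems
```

An empty list from `validate_input_folder("path_to/my_data_folder")` means every complex directory has one receptor and one ligand file named as in the example tree.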

## Step 2: Set Up the Environment

We will set up the environment using [Anaconda](https://docs.anaconda.com/anaconda/install/index.html). Clone the current repo:

    git clone https://github.com/HannesStark/EquiBind

Create a new environment with all required packages using `environment.yml` (this can take a while). While in the project directory, run:

    conda env create

Activate the environment:

    conda activate equibind

Here are the requirements themselves if you want to install them manually instead of using `environment.yml`:
````
python=3.7
pytorch=1.10
torchvision
cudatoolkit=10.2
torchaudio
dgl-cuda10.2
rdkit
openbabel
biopython
biopandas
pot
dgllife
joblib
pyaml
icecream
matplotlib
tensorboard
````
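After activating the environment, a quick way to confirm it is complete is to check that each package listed above can be found. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the mapping from package names to import names is an assumption (for example, POT imports as `ot` and Biopython as `Bio`), and `cudatoolkit` is left out because it is not a Python module.

```python
import importlib.util

# Assumed mapping from the package names listed above to their import names.
IMPORT_NAMES = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "dgl-cuda10.2": "dgl",
    "rdkit": "rdkit",
    "openbabel": "openbabel",
    "biopython": "Bio",
    "biopandas": "biopandas",
    "pot": "ot",
    "dgllife": "dgllife",
    "joblib": "joblib",
    "pyaml": "pyaml",
    "icecream": "icecream",
    "matplotlib": "matplotlib",
    "tensorboard": "tensorboard",
}


def missing_modules(import_names) -> list:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(IMPORT_NAMES.values())
    print("all packages found" if not missing else f"missing: {missing}")
```

`find_spec` only locates the modules without importing them, so the check is fast; a package that is found can still fail at import time, for example because of a CUDA mismatch.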

## Step 3: Predict Binding Structures!

In the config file `configs_clean/inference.yml`, set the path to your input data folder: `inference_path: path_to/my_data_folder`.
Then run:

    python inference.py --config=configs_clean/inference.yml
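Setting `inference_path` can also be scripted, which is handy when looping over several input folders. This is a minimal sketch assuming that `inference_path` is a top-level key on a single line of `configs_clean/inference.yml`; `set_top_level_key` is a hypothetical helper that edits the text line by line instead of parsing YAML.

```python
import re


def set_top_level_key(yaml_text: str, key: str, value: str) -> str:
    """Replace the value of a top-level `key: ...` line in YAML text.

    Plain line-based edit: the rest of the file (including comments) is
    left untouched, but nested keys, multi-line values, and a comment on
    the replaced line itself are not handled.
    """
    pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
    # A callable replacement avoids re interpreting backslashes in value.
    new_text, count = pattern.subn(lambda _m: f"{key}: {value}", yaml_text, count=1)
    if count == 0:
        raise KeyError(f"top-level key {key!r} not found")
    return new_text
```

Usage, assuming it is run from the repository root: `cfg = Path("configs_clean/inference.yml")` followed by `cfg.write_text(set_top_level_key(cfg.read_text(), "inference_path", "path_to/my_data_folder"))`.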

Done! :tada: \
Your results are saved as `.sdf` files in the directory specified in the config file under `output_directory: 'data/results/output'`, and as tensors at `runs/flexible_self_docking/predictions_RDKitFalse.pt`!
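To get a quick overview of the predicted `.sdf` files, one can count the records they contain, since each molecule in an SDF file ends with a `$$$$` line. This is a minimal sketch that does not parse the molecules; the helper names are hypothetical, and it searches the output directory recursively because the exact file layout inside it is not described here. For the coordinates themselves, a toolkit such as RDKit (`Chem.SDMolSupplier`), which is part of the environment, is the better tool; the `.pt` file can be opened with `torch.load`, though its contents are not documented here.

```python
from pathlib import Path


def count_sdf_records(sdf_text: str) -> int:
    """Count molecules in SDF text; each record is terminated by a "$$$$" line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


def summarize_results(output_dir: str = "data/results/output") -> dict:
    """Map each .sdf file below output_dir to the number of records it holds."""
    root = Path(output_dir)
    return {
        str(p.relative_to(root)): count_sdf_records(p.read_text())
        for p in sorted(root.rglob("*.sdf"))
    }
```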

# Reproducing paper numbers
Download the data and place it as described in the "Dataset" section above.
### Using the provided model weights
To predict binding structures using the provided model weights, run:

    python inference.py --config=configs_clean/inference_file_for_reproduce.yml

This will give you the results of *EquiBind-U* and then those of *EquiBind* after running the fast ligand point cloud fitting corrections. \
The numbers are slightly better than those reported in the paper; we will include the improved numbers in the next update of the paper.
### Training a model yourself and using those weights
To train the model yourself, run:

    python train.py --config=configs_clean/RDKitCoords_flexible_self_docking.yml

The model weights are saved in the `runs` directory. \
You can also start a TensorBoard server with `tensorboard --logdir=runs` and watch the model train. \
To evaluate the model on the test set, change the `run_dirs:` entry of the config file `configs_clean/inference_file_for_reproduce.yml` to point to the directory produced in `runs`.
Then you can run `python inference.py --config=configs_clean/inference_file_for_reproduce.yml` as above!
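To find the directory that a training run produced in `runs`, picking the most recently modified subdirectory is usually enough. This is a minimal sketch under that assumption; `latest_run_dir` is a hypothetical helper, and the format of the `run_dirs:` entry itself is not shown here, so the returned path still has to be entered into the config by hand.

```python
from pathlib import Path


def latest_run_dir(runs_root: str = "runs") -> Path:
    """Return the most recently modified subdirectory of runs_root.

    Assumes the newest directory is the training run to evaluate; the
    provided runs (e.g. flexible_self_docking) are also subdirectories of
    runs, so check the result before using it.
    """
    candidates = [p for p in Path(runs_root).iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no run directories in {runs_root}")
    return max(candidates, key=lambda p: p.stat().st_mtime)
```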
## Reference

:page_with_curl: Paper [on arXiv](https://arxiv.org/abs/2202.05146)
```
@misc{stark2022equibind,
  title={EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction},
  author={Hannes Stärk and Octavian-Eugen Ganea and Lagnajit Pattanaik and Regina Barzilay and Tommi Jaakkola},
  year={2022}
}
```