HannesStark committed Feb 11, 2022
0 parents commit baf8244
Showing 42 changed files with 38,679 additions and 0 deletions.
Binary file added .fig_intro.jpg
12 changes: 12 additions & 0 deletions .gitattributes
@@ -0,0 +1,12 @@
*.ipynb linguist-vendored=false
*.ipynb linguist-detectable=false

/jupyter_notebooks linguist-vendored=false

jupyter_notebooks/** linguist-vendored

jupyter_notebooks/** linguist-vendored=false


jupyter_notebooks/* linguist-vendored
jupyter_notebooks/* linguist-vendored=false
145 changes: 145 additions & 0 deletions .gitignore
@@ -0,0 +1,145 @@
renew.sh
tmux_renew.sh

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

.vscode/


*.zip

.idea/


#################### Project specific

# this ignores everything in data except for the files and directories listed below
!/data
/data/*
!/data/PDBBind_deepBSP_filtered/pdbbind_ids_without_overlap_with_casf.data
!/data/timesplit_test
!/data/timesplit_no_lig_overlap_train
!/data/timesplit_no_lig_overlap_val
!/data/timesplit_no_lig_or_rec_overlap_train
!/data/timesplit_no_lig_or_rec_overlap_val


cache

logs

# temporary files
temp/
bsub*
stderr*
stdout*

runs2
# this excludes everything in the runs directory except for the specific runs listed below
!/runs
/runs/*
!/runs/rigid_redocking
!/runs/flexible_self_docking
Binary file added .model2.jpg
123 changes: 123 additions & 0 deletions README.md
@@ -0,0 +1,123 @@

# EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

### [Paper on arXiv](https://arxiv.org/abs/2202.05146)

EquiBind is an
SE(3)-equivariant geometric deep learning model
that performs direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the
ligand’s bound pose and orientation. EquiBind
achieves significant speed-ups and better quality
compared to traditional and recent baselines.
If you have questions, don't hesitate to open an issue or ask me
via [[email protected]]([email protected])
or [social media](https://hannes-stark.com/). I am happy to hear from you!

![](.fig_intro.jpg)

![](.model2.jpg)

# Dataset

Our preprocessed data (see dataset section in the paper Appendix) is available from [zenodo](https://zenodo.org/record/6034088). \
The files in `data` contain the names for the time-based data split. For the no-ligand-overlap split described in the main paper, these are 1) train: `old_no_newL_train` 2) validation: `old_no_newL_val` 3) test: `new_names`

If you want to train one of our models with this data:
1. Download it from [zenodo](https://zenodo.org/record/6034088).
2. Unzip the directory and place it into `data` such that you have the path `data/PDBBind`.
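
After placing the data, a quick sanity check can confirm that the complexes named in a split are actually present. The following is a minimal sketch, not part of the repository. It assumes that a split file lists one complex name per line and that the unzipped `data/PDBBind` directory holds one subdirectory per complex; both are assumptions, and the helper names `read_split_names` and `missing_complexes` are hypothetical.

```python
from pathlib import Path


def read_split_names(split_file):
    """Read complex names from a split file, assuming one name per line."""
    with open(split_file) as handle:
        # Skip blank lines and strip surrounding whitespace.
        return [line.strip() for line in handle if line.strip()]


def missing_complexes(names, pdbbind_dir="data/PDBBind"):
    """Return the names that have no matching subdirectory in pdbbind_dir.

    Assumption: the unzipped dataset holds one subdirectory per complex name.
    """
    root = Path(pdbbind_dir)
    return [name for name in names if not (root / name).is_dir()]
```

For example, `missing_complexes(read_split_names(path_to_split_file))` should return an empty list if every complex of that split was unpacked into `data/PDBBind`.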


# Use provided model weights to predict binding structures of your own protein-ligand pairs

## Step 1: What you need as input

Ligand files in one of the formats ``.mol2``, ``.sdf``, ``.pdbqt``, or ``.pdb``. \
Receptor files in the ``.pdb`` format. \
For each complex you want to predict, you need a directory containing the ligand file and the receptor file, like this:
```
my_data_folder
└───name1
│ name1_protein.pdb
│ name1_ligand.sdf
└───name2
│ name2_protein.pdb
│ name2_ligand.sdf
...
```
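
Before running inference, it can help to check that every complex directory is complete. The following is a minimal sketch under stated assumptions: it takes the naming from the example above (`<name>_protein.pdb` and `<name>_ligand.<ext>`) as the convention, which the inference code may or may not require, and the helpers `check_complex_dir` and `check_data_folder` are hypothetical, not part of the repository.

```python
from pathlib import Path

# Ligand formats listed in Step 1; receptors are expected as .pdb files.
LIGAND_SUFFIXES = {".mol2", ".sdf", ".pdbqt", ".pdb"}


def check_complex_dir(complex_dir):
    """Return a list of problems for one complex directory (empty if none).

    Assumption: files follow the example naming, `<name>_protein.pdb` and
    `<name>_ligand.<ext>`.
    """
    files = [p for p in Path(complex_dir).iterdir() if p.is_file()]
    receptors = [p for p in files if p.name.endswith("_protein.pdb")]
    ligands = [
        p for p in files
        if p.stem.endswith("_ligand") and p.suffix in LIGAND_SUFFIXES
    ]
    problems = []
    if len(receptors) != 1:
        problems.append(f"expected 1 receptor file, found {len(receptors)}")
    if len(ligands) != 1:
        problems.append(f"expected 1 ligand file, found {len(ligands)}")
    return problems


def check_data_folder(data_folder):
    """Map each problematic complex directory name to its list of problems."""
    report = {}
    for complex_dir in sorted(Path(data_folder).iterdir()):
        if complex_dir.is_dir():
            problems = check_complex_dir(complex_dir)
            if problems:
                report[complex_dir.name] = problems
    return report
```

Calling `check_data_folder("my_data_folder")` returns an empty dictionary when every subdirectory contains exactly one receptor file and one ligand file.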

## Step 2: Setup Environment

We will set up the environment using [Anaconda](https://docs.anaconda.com/anaconda/install/index.html). Clone the
current repository:

    git clone https://github.com/HannesStark/EquiBind

Create a new environment with all required packages using `environment.yml` (this can take a while). While in the project directory, run:

    conda env create

Activate the environment:

    conda activate equibind

Here are the requirements themselves if you want to install them manually instead of using the `environment.yml`:
````
python=3.7
pytorch 1.10
torchvision
cudatoolkit=10.2
torchaudio
dgl-cuda10.2
rdkit
openbabel
biopython
rdkit
biopandas
pot
dgllife
joblib
pyaml
icecream
matplotlib
tensorboard
````

## Step 3: Predict Binding Structures!

In the config file `configs_clean/inference.yml`, set the path to your input data folder: `inference_path: path_to/my_data_folder`.
Then run:

    python inference.py --config=configs_clean/inference.yml

Done! :tada: \
Your results are saved as `.sdf` files in the directory specified
in the config file under ``output_directory: 'data/results/output'`` and as tensors at ``runs/flexible_self_docking/predictions_RDKitFalse.pt``!
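
When predicting for several input folders, the `inference_path` entry from Step 3 can also be set from a script instead of by hand. The following is a minimal sketch that uses only the standard library. It assumes `inference_path` is a top-level key on its own line in the config file; the helpers `set_inference_path` and `update_config_file` are hypothetical, not part of the repository.

```python
import re
from pathlib import Path


def set_inference_path(config_text, new_path):
    """Return config_text with the top-level `inference_path` entry replaced."""
    pattern = re.compile(r"^inference_path:.*$", flags=re.MULTILINE)
    # A function replacement keeps backslashes in new_path from being
    # interpreted as regex escape sequences.
    new_text, count = pattern.subn(
        lambda _match: f"inference_path: {new_path}", config_text
    )
    if count == 0:
        raise ValueError("no top-level inference_path entry found")
    return new_text


def update_config_file(config_file, new_path):
    """Rewrite config_file in place with the new inference path."""
    config_file = Path(config_file)
    config_file.write_text(set_inference_path(config_file.read_text(), new_path))
```

For example, `update_config_file("configs_clean/inference.yml", "path_to/my_data_folder")` rewrites only the `inference_path` line and leaves all other entries untouched.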

# Reproducing paper numbers
Download the data and place it as described in the "Dataset" section above.
### Using the provided model weights
To predict binding structures using the provided model weights, run:

    python inference.py --config=configs_clean/inference_file_for_reproduce.yml

This will give you the results of *EquiBind-U* and then those of *EquiBind* after running the fast ligand point cloud fitting corrections. \
The numbers are a bit better than what is reported in the paper. We will put the improved numbers into the next update of the paper.
### Training a model yourself and using those weights
To train the model yourself, run:

    python train.py --config=configs_clean/RDKitCoords_flexible_self_docking.yml

The model weights are saved in the `runs` directory.\
You can also start a tensorboard server ``tensorboard --logdir=runs`` and watch the model train. \
To evaluate the model on the test set, change the ``run_dirs:`` entry of the config file `inference_file_for_reproduce.yml` to point to the directory produced in `runs`.
Then you can run ``python inference.py --config=configs_clean/inference_file_for_reproduce.yml`` as above!
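
To find the directory to enter under ``run_dirs:``, one option is to pick the most recently modified subdirectory of `runs`. The following is a minimal sketch. It assumes that each training run writes into its own subdirectory of `runs` and that the newest one is the run you want; the helper `latest_run_dir` is hypothetical, not part of the repository.

```python
from pathlib import Path


def latest_run_dir(runs_root="runs"):
    """Return the most recently modified subdirectory of runs_root."""
    candidates = [p for p in Path(runs_root).iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no run directories found in {runs_root}")
    # Modification time is used as a proxy for "latest run".
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Note that the committed runs `rigid_redocking` and `flexible_self_docking` also live in `runs`, so check that the returned directory is the one your training produced.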
## Reference

:page_with_curl: Paper [on arXiv](https://arxiv.org/abs/2202.05146)
```
@misc{stark2022equibind,
title={EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction},
author={Hannes Stärk and Octavian-Eugen Ganea and Lagnajit Pattanaik and Regina Barzilay and Tommi Jaakkola},
year={2022}
}
```