The scripts in this repository perform template docking on the kinodata selection of kinase activity data mined from ChEMBL. They were used to generate the kinodata-3D dataset.
The template docking process is illustrated in this figure.
It consists of three main steps:
- For a given kinase-ligand pair, finding similar ligands whose binding pose is known empirically (a).
- Performing template docking using that similar known complex as a basis (b).
- Filtering docked complexes according to their estimated docking quality (c).
Step 2 makes use of the KinoML framework.
Step 1 consists of two sub-steps:
- finding the empirical template and
- downloading the complex structure from KLIFS.
These steps are performed by the script pipeline/klifs_template.py. Before calling this script, download the latest kinase activities as curated by kinodata.
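The idea behind the template search can be sketched as a nearest-neighbour lookup over ligand fingerprints. This is only an illustration, not the code in pipeline/klifs_template.py: it assumes Tanimoto similarity over sets of fingerprint bits, which may differ from the similarity measure the script actually uses, and the names `tanimoto`, `find_template` and the example fingerprints are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def find_template(
    query: FrozenSet[int],
    known_complexes: Dict[str, FrozenSet[int]],
) -> Optional[Tuple[str, float]]:
    """Return (complex_id, similarity) of the most similar known ligand.

    Returns None if there are no known complexes to compare against.
    """
    best: Optional[Tuple[str, float]] = None
    for complex_id, fingerprint in known_complexes.items():
        score = tanimoto(query, fingerprint)
        if best is None or score > best[1]:
            best = (complex_id, score)
    return best


# Hypothetical fingerprints given as sets of "on" bit indices (not KLIFS data).
known = {
    "complex_A": frozenset({1, 2, 3, 4}),
    "complex_B": frozenset({2, 3, 9}),
}
template = find_template(frozenset({1, 2, 3, 5}), known)
print(template)
```

In a real setting the fingerprints would come from a cheminformatics toolkit and the selected complex would then be downloaded from KLIFS.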
The template docking is done using pipeline/docking.py. To run the docking and monitor timeouts as well as memory usage, we use HTCondor. The corresponding job is defined in pipeline/docking.sub.
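For quick local tests without HTCondor, a per-job timeout can be approximated with the Python standard library. The sketch below is a stand-in technique, not what pipeline/docking.sub does: it only covers timeouts (not memory monitoring), the helper `run_with_timeout` is hypothetical, and the commands are placeholders rather than real invocations of the docking script.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(command: List[str], timeout_s: float) -> str:
    """Run a command and report 'ok', 'failed', or 'timeout'."""
    try:
        result = subprocess.run(command, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


# Placeholder commands; a real call would invoke the docking script instead.
quick = [sys.executable, "-c", "print('docked')"]
slow = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_with_timeout(quick, timeout_s=20.0))
print(run_with_timeout(slow, timeout_s=0.5))
```

A scheduler such as HTCondor remains the better choice for large runs, since it also handles queuing and resource limits across machines.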
The final filtering step mainly consists of annotating the docked complexes using a simple predictive model.
This model takes analytical docking output, i.e. the Posit probability and the Chemgauss4 score, as well as the template similarity as inputs.
It is trained using recent re-docking benchmark data by Schaller et al.
The code and the model can be found in notebooks/rmsd_prediction.ipynb and notebooks/simple_nn_model.pth, respectively.
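Once complexes are annotated, filtering reduces to a threshold on the predicted quality. The following sketch assumes that the annotation is a predicted RMSD in angstrom (as the notebook name suggests); the `DockedComplex` fields, the `filter_by_quality` helper, the 2.0 cutoff and all example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DockedComplex:
    complex_id: str
    posit_probability: float    # model input
    chemgauss4_score: float     # model input
    template_similarity: float  # model input
    predicted_rmsd: float       # assumed model output, in angstrom


def filter_by_quality(
    complexes: List[DockedComplex], max_rmsd: float = 2.0
) -> List[DockedComplex]:
    """Keep complexes whose predicted RMSD does not exceed the cutoff."""
    return [c for c in complexes if c.predicted_rmsd <= max_rmsd]


# Made-up example annotations, not taken from kinodata-3D.
annotated = [
    DockedComplex("pose_1", 0.9, -12.0, 0.8, predicted_rmsd=1.2),
    DockedComplex("pose_2", 0.3, -6.5, 0.4, predicted_rmsd=4.7),
]
kept = filter_by_quality(annotated)
print([c.complex_id for c in kept])
```

Keeping the annotation on every complex, rather than discarding poses early, lets downstream users choose their own cutoff.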
The environment can be set up using mamba or conda. To do this, run the following command (with conda, replace mamba with conda):
mamba env create -f env.yml