
Deep learning models to test the added value of using generated complex data for affinity prediction.


volkamerlab/kinodata-3D-affinity-prediction


Kinodata-3D dataset and models

This repository contains a PyTorch Geometric (PyG)-based interface to the Kinodata-3D dataset, along with the code used to train and evaluate the models presented in the Kinodata-3D publication.

Installation

We currently only support installation from source.

(1) Clone this repo

(2) Set up Python environment

Use mamba (or conda) to set up a Python environment,

mamba env create -f environment.yml
mamba activate kinodata

and install this package in editable/develop mode

pip install -e .

(3) Obtain raw data

The raw data (docked poses and kinase PDB files) can be obtained from Zenodo. After downloading the archives, extract them in the root directory of this repository:

cd PATH_TO_REPO
unzip ...
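If you prefer to script this step, the sketch below extracts every .zip archive in a download folder into the repository root, which mirrors running unzip from PATH_TO_REPO. The helper name extract_archives and the download location are hypothetical. The sketch also assumes the Zenodo archives are .zip files (the unzip command above suggests so). Archive names are not hard-coded because they are not listed here.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir: Path, repo_root: Path) -> list[Path]:
    """Extract every *.zip archive in download_dir into repo_root.

    Hypothetical helper, not part of this repository. Any .zip file found
    in download_dir is treated as a raw-data archive.
    Returns the archives that were extracted, in sorted order.
    """
    extracted = []
    for archive in sorted(download_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Same effect as running `unzip <archive>` inside repo_root.
            zf.extractall(repo_root)
        extracted.append(archive)
    return extracted
```

For example, `extract_archives(Path.home() / "Downloads", Path("PATH_TO_REPO"))` unpacks the downloaded archives next to the repository's own files.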

See the Kinodata-3D repo for more information and the code used to generate the raw data.

General usage

Reproducing results

(1) Acquire exact dataset and data split versions

If you intend to reproduce our results, we strongly recommend that you use our preprocessed version of the dataset and corresponding data splits.

(2) Model training and evaluation

You can use the shell script condor/train_generic.sh to train and test a model on one particular split in a single run. If you want to sync results to Weights & Biases, create a file named wandb_api_key in the root directory of this repository and paste your W&B API key into it. Otherwise, run wandb disabled in a terminal with the conda environment activated before training.
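The two W&B options above can be sketched in Python as follows. This assumes the training run picks up W&B settings from environment variables. WANDB_API_KEY and WANDB_MODE are standard wandb environment variables, but the helper configure_wandb is hypothetical. It does not claim to reproduce what condor/train_generic.sh actually does with the wandb_api_key file.

```python
import os
from pathlib import Path


def configure_wandb(repo_root: Path) -> bool:
    """Export W&B settings depending on whether repo_root/wandb_api_key exists.

    Hypothetical helper. Returns True if a non-empty API key was found
    (results can be synced), False if W&B was disabled instead.
    """
    key_file = repo_root / "wandb_api_key"
    key = key_file.read_text().strip() if key_file.is_file() else ""
    if key:
        # The wandb client reads its API key from WANDB_API_KEY.
        os.environ["WANDB_API_KEY"] = key
        return True
    # WANDB_MODE=disabled turns wandb logging calls into no-ops, which is
    # comparable in effect to running `wandb disabled` beforehand.
    os.environ["WANDB_MODE"] = "disabled"
    return False
```

Checking the key file before launching avoids a run that stalls waiting for an interactive W&B login.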

The script requires the following positional arguments:

  1. The base Python training script, one of "train_dti_baseline" or "train_sparse_transformer".
  2. The split type, one of "scaffold-k-fold", "random-k-fold", or "pocket-k-fold".
  3. The integer RMSD cutoff for the dataset, e.g. 2, 4, or 6 as used in the publication.
  4. A .yaml file that contains additional configuration parameters, e.g. model hyperparameters.
  5. The integer index of the cross-validation fold used for testing.
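The argument list above can be checked before a job is submitted. The sketch below validates the five positional arguments and builds the corresponding command line. The helper build_command is hypothetical, and the allowed values are copied from the list above. It assumes the default k = 5, so valid test folds are 0 to 4. The loop at the end prints one command per fold, which covers a full cross-validation run.

```python
import shlex

# Allowed values taken from the positional-argument list above.
SCRIPTS = ("train_dti_baseline", "train_sparse_transformer")
SPLITS = ("scaffold-k-fold", "random-k-fold", "pocket-k-fold")


def build_command(script: str, split: str, rmsd_cutoff: int,
                  config: str, test_fold: int, k: int = 5) -> list[str]:
    """Validate the five positional arguments and return the argv list.

    Hypothetical helper; k=5 is assumed to match the script's default.
    """
    if script not in SCRIPTS:
        raise ValueError(f"unknown script: {script!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split type: {split!r}")
    if not isinstance(rmsd_cutoff, int) or rmsd_cutoff <= 0:
        raise ValueError("RMSD cutoff must be a positive integer")
    if not config.endswith(".yaml"):
        raise ValueError("config must be a .yaml file")
    if not 0 <= test_fold < k:
        raise ValueError(f"test fold must be in [0, {k - 1}]")
    return ["./condor/train_generic.sh", script, split,
            str(rmsd_cutoff), config, str(test_fold)]


# One command per test fold, covering all k = 5 folds of one split.
for fold in range(5):
    print(shlex.join(build_command("train_dti_baseline", "scaffold-k-fold",
                                   2, "dti.yaml", fold)))
```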

For instance,

./condor/train_generic.sh train_dti_baseline scaffold-k-fold 2 dti.yaml 0

trains and tests the DTI baseline on the scaffold-based 5-fold split (k defaults to 5) of the dataset restricted to complexes with a predicted RMSD <= 2 Å. Folds 1-4 are used for training and fold 0 for testing.
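The fold assignment and the RMSD cutoff in this example can be sketched as follows. This is an illustration only; the repository's own data handling is not reproduced here. The field names id and predicted_rmsd and the three toy complexes are hypothetical values, not taken from the dataset. The cutoff is inclusive, matching "RMSD <= 2 Å" above. As a result, the dataset for a larger cutoff always contains the one for a smaller cutoff.

```python
def split_folds(k: int, test_fold: int) -> tuple[list[int], int]:
    """Return (training folds, test fold) for k-fold cross-validation."""
    if not 0 <= test_fold < k:
        raise ValueError(f"test_fold must be in [0, {k - 1}]")
    train = [i for i in range(k) if i != test_fold]
    return train, test_fold


def filter_by_rmsd(complexes: list[dict], cutoff: float) -> list[dict]:
    """Keep complexes whose predicted RMSD (in Å) is <= cutoff (inclusive)."""
    return [c for c in complexes if c["predicted_rmsd"] <= cutoff]


# Toy records for illustration only; field names are hypothetical.
toy = [
    {"id": "a", "predicted_rmsd": 1.2},
    {"id": "b", "predicted_rmsd": 2.0},
    {"id": "c", "predicted_rmsd": 3.5},
]
print(split_folds(5, 0))                          # ([1, 2, 3, 4], 0)
print([c["id"] for c in filter_by_rmsd(toy, 2)])  # ['a', 'b']
```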
