GitHub - volkamerlab/kinodata-3D-affinity-prediction: DL models to test added value of using generated complex data for affinity prediction

Kinodata-3D dataset and models

This repository contains a PyG-based (PyTorch Geometric) interface to the Kinodata-3D dataset, along with the code used to train and evaluate the models presented in the Kinodata-3D publication.

Installation

We currently only support installation from source.

(1) Clone this repo

(2) Set up Python environment

Use mamba (or conda) to set up a Python environment,

mamba env create -f environment.yml
mamba activate kinodata

and install this package in editable/develop mode

pip install -e .
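To confirm that the editable install is visible to the active environment, a quick import check can help. This is only a minimal sketch: it assumes the package is importable under the name `kinodata` (inferred from the `kinodata` directory in this repository), and `is_importable` is a hypothetical helper, not part of this codebase.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if the import system can locate the top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Assumption: `pip install -e .` exposes the package as "kinodata".
    name = "kinodata"
    status = "found" if is_importable(name) else "NOT found"
    print(f"{name}: {status} in the active environment")
```

If the package is reported as not found, check that the `kinodata` environment is activated and that `pip install -e .` was run from the repository root.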

(3) Obtain raw data

The raw data (docked poses and kinase PDB files) can be obtained from Zenodo. After downloading the archives, extract them in the root directory of this repository.

cd PATH_TO_REPO
unzip ...

See the Kinodata-3D repo for more information and the code used to generate the raw data.

General usage

Reproducing results

(1) Acquire exact dataset and data split versions

If you intend to reproduce our results, we strongly recommend that you use our preprocessed version of the dataset and corresponding data splits.

(2) Model training and evaluation

You can use the shell script condor/train_generic.sh to train and test a model in one run, on one particular split. If you want to sync results to Weights & Biases, create a file named wandb_api_key in the root directory of this repository and paste your wandb API key into it. Otherwise, run wandb disabled in a terminal with the conda environment activated before training.
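The key file can also be created programmatically, which avoids leaving the key in shell history. The following is a minimal sketch, not part of this repository: `write_wandb_key_file` is a hypothetical helper, and it assumes the training script expects the bare key with no surrounding whitespace.

```python
from pathlib import Path


def write_wandb_key_file(repo_root: Path, api_key: str) -> Path:
    """Write the wandb API key to <repo_root>/wandb_api_key and return the path."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    key_file = repo_root / "wandb_api_key"
    # Assumption: the file should contain only the key itself.
    key_file.write_text(key)
    # Restrict access to the current user (has limited effect on Windows).
    key_file.chmod(0o600)
    return key_file
```

For example, `write_wandb_key_file(Path("."), os.environ["WANDB_API_KEY"])` called from the repository root would copy the key from the `WANDB_API_KEY` environment variable, which the wandb client also reads (this call additionally needs `import os`).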

The script requires the following positional arguments, in this order:

1. Base Python script: one of "train_dti_baseline" or "train_sparse_transformer".
2. Split type: one of "scaffold-k-fold", "random-k-fold", or "pocket-k-fold".
3. Integer RMSD cutoff for the dataset, e.g. 2, 4, or 6, as used in the publication.
4. A .yaml file that contains additional configuration parameters, e.g. model hyperparameters.
5. The integer index of the cross-validation fold used for testing.

For instance,

./condor/train_generic.sh train_dti_baseline scaffold-k-fold 2 dti.yaml 0

trains and tests the DTI baseline on the scaffold-5-fold split (k defaults to 5) of the dataset containing all complexes with a predicted RMSD of at most 2 Å. Folds 1-4 are used for training and fold 0 for testing.
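A full cross-validation run repeats this command once per test fold. The sketch below assembles the five command lines and shows which folds remain for training in each case. It is illustrative only: `build_command`, `training_folds`, and `DRY_RUN` are hypothetical names, and it assumes that for every test fold all remaining folds are used for training, as described above for fold 0.

```python
import subprocess
from typing import List

# Allowed values as listed in the argument description above.
SCRIPTS = ("train_dti_baseline", "train_sparse_transformer")
SPLITS = ("scaffold-k-fold", "random-k-fold", "pocket-k-fold")
DRY_RUN = True  # set to False to actually launch the training runs


def build_command(script: str, split: str, rmsd_cutoff: int,
                  config: str, test_fold: int) -> List[str]:
    """Assemble the argument list for condor/train_generic.sh."""
    if script not in SCRIPTS:
        raise ValueError(f"unknown base script: {script}")
    if split not in SPLITS:
        raise ValueError(f"unknown split type: {split}")
    return ["./condor/train_generic.sh", script, split,
            str(rmsd_cutoff), config, str(test_fold)]


def training_folds(test_fold: int, k: int = 5) -> List[int]:
    """Assumption: every fold except the test fold is used for training."""
    return [fold for fold in range(k) if fold != test_fold]


if __name__ == "__main__":
    for fold in range(5):
        cmd = build_command("train_dti_baseline", "scaffold-k-fold", 2, "dti.yaml", fold)
        print(f"test fold {fold}, training folds {training_folds(fold)}: {' '.join(cmd)}")
        if not DRY_RUN:
            # Must be run from the repository root so the relative script path resolves.
            subprocess.run(cmd, check=True)
```

With `DRY_RUN` left at `True`, the script only prints the commands, which makes it easy to check the arguments before committing compute time.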

