Paper: https://www.biorxiv.org/content/10.1101/2024.12.07.627340v1
Drug repurposing presents a valuable strategy to expedite drug discovery by identifying new therapeutic uses for existing compounds, especially for diseases with limited treatment options. We propose a Generative AI-assisted Virtual Screening Pipeline that combines generative modeling, binding pocket prediction, and similarity-based searches within drug databases to achieve a generalizable and efficient approach to drug repurposing. Our pipeline enables blind screening of any protein target without requiring prior structural or functional knowledge, allowing it to adapt to a wide range of diseases, including emerging health threats and novel targets where information is scarce. By rapidly generating potential ligands and efficiently identifying and ranking drug candidates, our approach accelerates the drug discovery process, broadening the scope and impact of repurposing efforts and offering new possibilities for therapeutic development.
Overview of the Generative AI-assisted Drug Repurposing Pipeline. The pipeline consists of two phases: Phase 1
generates potential ligands using generative AI, and Phase 2 identifies promising drug candidates via similarity-based searches
within drug databases.
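To make the Phase 2 idea concrete, here is a minimal sketch of a similarity-based search: each database drug is scored by its best cosine similarity to any generated ligand, and the top candidates are returned. The function names, the use of cosine similarity, and the toy two-dimensional vectors are assumptions for illustration only; the actual pipeline derives its embeddings from the GNN models in this repository and may rank candidates differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_drugs(ligand_embeddings, drug_embeddings, top_k=3):
    """Rank drugs by their highest similarity to any generated ligand.

    ligand_embeddings: list of vectors (hypothetical Phase 1 output).
    drug_embeddings: dict mapping a drug ID to its vector (hypothetical).
    """
    if not ligand_embeddings:
        raise ValueError("at least one ligand embedding is required")
    scores = {
        drug_id: max(cosine_similarity(lig, vec) for lig in ligand_embeddings)
        for drug_id, vec in drug_embeddings.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy vectors only; real embeddings would come from a trained model.
    ligands = [[1.0, 0.0]]
    drugs = {"drug_a": [1.0, 0.0], "drug_b": [0.0, 1.0], "drug_c": [1.0, 1.0]}
    for drug_id, score in rank_drugs(ligands, drugs):
        print(f"{drug_id}: {score:.3f}")
```

Taking the maximum over ligands favors drugs that closely match at least one generated ligand; averaging over ligands would be an equally reasonable choice under different assumptions.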
Files should be placed in the following folder structure:
root
├── assets
│ ├── hiv # Output of DrugPipe for the HIV scenario
│ │ │── generation
│ │ │ │── generation.csv
│ │ │── generation_docking
│ │ │ │──...
│ │ │── preprocessed_data
│ │ │ │──...
│ │ │── remove_water
│ │ │ │── 2jle.pdb
│ │ │ │── 2jle.pdbqt
│ │── covid19 # Output of DrugPipe for the COVID19 scenario
│ │ │── generation
│ │ │ │── generation.csv
│ │ │── ...
│ │── admet
│ │ │── covid_preds.csv
│ │ │── hiv_preds.csv
│ │── 3d_structure
│ │ │── 2jle.pdb # protein structure
│ │ │── 2jle_pipeline.pdbqt # output ligand of DrugPipe
│ │ │── 2jle_qvinaw.pdbqt # qvina-w
│ │ │── 2jle_gt.pdb # gt ligand structure
│ │ │── ...
│ │── q_vinaw
│ │ │── qvinaw_2jle.csv # name ligand, q_vina_score and docking time of protein 2jle
│ │ │── ...
│ │── drug_similarity
│ │ │── similarity_ranks_gnns.txt # similarity of other drugs vs real drugs
│ │ │── ...
├── datasets
│ ├── drugbank.csv
│ ├── drugbank_conformation
│ │ ├── DB00114.sdf
│ │ ├── DB00116.sdf
│ │ ├── ...
├── search_dgi
├── diffusion_generate
├── e3gnn_utils.py
├── e3gnn.py
├── equiformer.py
├── gat.py
├── pipeline_gnn.py
├── utils.py
├── README.md
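Before running anything, it can help to confirm that the layout above is in place. The following is a small sketch of such a check; the helper name `missing_paths` and the selection of paths in `EXPECTED_PATHS` are illustrative assumptions taken from the tree, not a script shipped with the repository.

```python
from pathlib import Path

# A subset of the entries shown in the tree above (chosen for illustration).
EXPECTED_PATHS = [
    "assets",
    "datasets/drugbank.csv",
    "datasets/drugbank_conformation",
    "search_dgi",
    "diffusion_generate",
    "pipeline_gnn.py",
    "utils.py",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if everything is present.
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```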
Install the environment with the following command:
conda env create --name pipeline --file=pipeline.yml
- Conformer generation:
    cd datasets/
    python generate_conformation.py
- Run the pipeline with the GNN search methods on the dataset:
    python pipeline_gnn.py 0
- Run the GAT search method on the dataset:
    python gat.py
- Run the Equiformer search method on the dataset:
    python equiformer.py
- Run the E3GNN search method on the dataset:
    python e3gnn.py
- ADMET property prediction:
    bash admet.sh
- Real drug search:
    python similarity_drugs.py
- QVina-W docking:
    python qvina_w.py
- Molecular properties:
    python drug_properties.py
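The commands above can also be driven from a single script. The sketch below lists them in the order given and, by default, only prints what it would run. It rests on several assumptions: that the listed order is a sensible execution order, that every script except the conformer generator is run from the repository root, and that failures should stop the run. The names `STEPS` and `run_steps` are illustrative and not part of the repository.

```python
import subprocess
import sys

# (description, command, working directory), copied from the list above.
STEPS = [
    ("Conformer generation", ["python", "generate_conformation.py"], "datasets"),
    ("GNN pipeline", ["python", "pipeline_gnn.py", "0"], "."),
    ("GAT search", ["python", "gat.py"], "."),
    ("Equiformer search", ["python", "equiformer.py"], "."),
    ("E3GNN search", ["python", "e3gnn.py"], "."),
    ("ADMET prediction", ["bash", "admet.sh"], "."),
    ("Real drug search", ["python", "similarity_drugs.py"], "."),
    ("QVina-W docking", ["python", "qvina_w.py"], "."),
    ("Molecular properties", ["python", "drug_properties.py"], "."),
]


def run_steps(steps=STEPS, dry_run=True):
    """Run (or, with dry_run=True, only print) each step and return the plan."""
    planned = []
    for name, cmd, cwd in steps:
        # Use the current interpreter so the active conda environment is respected.
        resolved = [sys.executable] + cmd[1:] if cmd[0] == "python" else list(cmd)
        planned.append((name, resolved, cwd))
        if dry_run:
            print(f"[dry run] {name}: (cwd={cwd}) {' '.join(resolved)}")
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(resolved, cwd=cwd, check=True)
    return planned


if __name__ == "__main__":
    run_steps(dry_run=True)
```

Calling `run_steps(dry_run=False)` would execute the commands for real; individual steps can be selected by passing a slice of `STEPS`.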
The results are organized in the assets folder with the following subdirectories:
- hiv, covid19: Contain the outputs generated by DrugPipe for the HIV and COVID-19 scenarios.
- admet: Includes ADMET property data for compounds related to the HIV and COVID-19 scenarios.
- 3d_structure: Provides 3D molecular structures used for visualization in the case studies.
- q_vinaw: Stores the results produced by the q_vinaw method for the case studies.
- drug_similarity: Contains drug similarity comparisons between real drugs and others, based on various search algorithms.
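As an example of working with these outputs, the sketch below ranks ligands from a docking results table by score. Vina-style scores are estimated binding energies in kcal/mol, so lower (more negative) values indicate stronger predicted binding. The column names `ligand` and `q_vina_score` are assumptions; check the header of the actual CSV files in q_vinaw and adjust the arguments accordingly. The sample rows are made up for illustration.

```python
import csv
import io


def best_docked(csv_text, name_column="ligand", score_column="q_vina_score", top_k=5):
    """Return the top_k (name, score) pairs with the lowest docking scores."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row[name_column], float(row[score_column])) for row in reader]
    rows.sort(key=lambda pair: pair[1])  # most negative score first
    return rows[:top_k]


if __name__ == "__main__":
    # Made-up values; replace with the contents of a real results file, e.g.
    # open("assets/q_vinaw/qvinaw_2jle.csv").read()
    sample = "ligand,q_vina_score,time\nA,-7.1,12.0\nB,-9.3,15.5\nC,-5.0,9.1\n"
    for name, score in best_docked(sample, top_k=2):
        print(f"{name}: {score} kcal/mol")
```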
@article {Pham2024.12.07.627340,
author = {Pham, Phuc and Nguyen, Viet Thanh Duy and Cho, Kyu Hong and Hy, Truong Son},
title = {Generative AI-assisted Virtual Screening Pipeline for Generalizable and Efficient Drug Repurposing},
elocation-id = {2024.12.07.627340},
year = {2024},
doi = {10.1101/2024.12.07.627340},
publisher = {Cold Spring Harbor Laboratory},
abstract = {Drug repurposing presents a valuable strategy to expedite drug discovery by identifying new therapeutic uses for existing compounds, especially for diseases with limited treatment options. We propose a Generative AI-assisted Virtual Screening Pipeline that combines generative modeling, binding pocket prediction, and similarity-based searches within drug databases to achieve a generalizable and efficient approach to drug repurposing. Our pipeline enables blind screening of any protein target without requiring prior structural or functional knowledge, allowing it to adapt to a wide range of diseases, including emerging health threats and novel targets where information is scarce. By rapidly generating potential ligands and efficiently identifying and ranking drug candidates, our approach accelerates the drug discovery process, broadening the scope and impact of repurposing efforts and offering new possibilities for therapeutic development. Detailed results and implementation can be accessed at https://github.com/HySonLab/DrugPipe. Competing Interest Statement: The authors have declared no competing interest.},
URL = {https://www.biorxiv.org/content/early/2024/12/11/2024.12.07.627340},
eprint = {https://www.biorxiv.org/content/early/2024/12/11/2024.12.07.627340.full.pdf},
journal = {bioRxiv}
}
@inproceedings{
velickovic2018deep,
title="{Deep Graph Infomax}",
author={Petar Veli{\v{c}}kovi{\'{c}} and William Fedus and William L. Hamilton and Pietro Li{\`{o}} and Yoshua Bengio and R Devon Hjelm},
booktitle={International Conference on Learning Representations},
year={2019},
url={https://openreview.net/forum?id=rklz9iAcKQ},
}
@misc{Gordić2020PyTorchGAT,
author = {Gordić, Aleksa},
title = {pytorch-GAT},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/gordicaleksa/pytorch-GAT}},
}
@inproceedings{
liao2023equiformer,
title={Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs},
author={Yi-Lun Liao and Tess Smidt},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=KwmPfARgOTD}
}
@article{brandstetter2021geometric,
title={Geometric and Physical Quantities improve E(3) Equivariant Message Passing},
author={Johannes Brandstetter and Rob Hesselink and Elise van der Pol and Erik Bekkers and Max Welling},
year={2021},
eprint={2110.02905},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@article{doi:10.1021/acs.jcim.1c00203,
author = {Eberhardt, Jerome and Santos-Martins, Diogo and Tillack, Andreas F. and Forli, Stefano},
title = {AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings},
journal = {Journal of Chemical Information and Modeling},
volume = {61},
number = {8},
pages = {3891-3898},
year = {2021},
doi = {10.1021/acs.jcim.1c00203},
note ={PMID: 34278794},
}
@article{swanson2024admet,
title={ADMET-AI: a machine learning ADMET platform for evaluation of large-scale chemical libraries},
author={Swanson, Kyle and Walther, Parker and Leitz, Jeremy and Mukherjee, Souhrid and Wu, Joseph C and Shivnaraine, Rabindra V and Zou, James},
journal={Bioinformatics},
volume={40},
number={7},
pages={btae416},
year={2024},
publisher={Oxford University Press}
}
@article{hassan2017protein,
title={Protein-ligand blind docking using QuickVina-W with inter-process spatio-temporal integration},
author={Hassan, Nafisa M and Alhossary, Amr A and Mu, Yuguang and Kwoh, Chee-Keong},
journal={Scientific reports},
volume={7},
number={1},
pages={15451},
year={2017},
publisher={Nature Publishing Group UK London}
}