idrblab/LEDAP

 
 


Large Language Model-Based Natural Language Encoding Could Be All You Need for Drug Biomedical Association Prediction

Hanyu Zhang, Yuan Zhou, Zhichao Zhang, Huaicheng Sun, Ziqi Pan, Minjie Mou, Wei Zhang, Qing Ye, Tingjun Hou, Honglin Li * , Chang-Yu Hsieh * and Feng Zhu *

Graphical Abstract

[Figure: graphical abstract of LEDAP]

Dependencies

  • LEDAP should be deployed on Linux with Python 3.8.
  • Main requirements: python==3.8.8, pytorch==1.10.1, xgboost==2.0.3, scikit-learn==0.24.1, optuna==2.10.0.
  • requirements.txt is provided; install the environment dependencies with pip install -r requirements.txt.
  • To use a GPU, install the GPU version of pytorch.
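After installation, the pinned versions listed above can be checked from Python. The following is a minimal sketch, not part of LEDAP: the helper name check_pins is hypothetical, and it assumes pytorch is installed under its PyPI distribution name torch.

```python
from importlib import metadata

# Pins copied from the Dependencies list above.
# Assumption: "pytorch" is installed under its PyPI distribution name "torch".
PINNED = {
    "torch": "1.10.1",
    "xgboost": "2.0.3",
    "scikit-learn": "0.24.1",
    "optuna": "2.10.0",
}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # GPU wheels may carry a local suffix such as "+cu113"; compare the public part only.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pin; the Python interpreter version itself is not checked here.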

Install

  1. Download the source code of LEDAP.
  2. Deploy LEDAP on Linux.
  3. The LEDAP tree includes the following directories:
 |- DDA
    |- bashes
    |- data
    |- rf
 |- DDI
    |- bashes
    |- data
    |- xbgoost
 |- DSA
    |- bashes
    |- data
    |- xbgoost
 |- paper
    |- materials
 |- representations
    |- llama_2-7b
 |- README.md
 |- requirements.txt
 |- LICENSE
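To confirm that a download is complete, the tree above can be compared against the local checkout. This is a small sketch under the assumption that the directory names are exactly as printed in the tree; the function name missing_dirs is hypothetical.

```python
from pathlib import Path

# Directory names copied verbatim from the tree above.
EXPECTED_DIRS = [
    "DDA/bashes", "DDA/data", "DDA/rf",
    "DDI/bashes", "DDI/data", "DDI/xbgoost",
    "DSA/bashes", "DSA/data", "DSA/xbgoost",
    "paper/materials",
    "representations/llama_2-7b",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the top of the LEDAP checkout.
    for d in missing_dirs("."):
        print("missing:", d)
```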

Usage

1. Prepare feature representation for bio-entities using Large Language Models (Llama 2 in this study)

1.1 Collect textual descriptions for each bio-entity type according to its respective requirements.
1.2 Preprocess the bio-text and transform it into features following the Llama 2 release.
1.3 Place the representation data in ./representations/llama_2-7b/, following the format of the provided examples.

Note: the prepared LLM-based representations used in this study were originally shared on Google Drive. The associated account has been unexpectedly deactivated by Google, and we are working to fix the issue; please wait. Alternative download: MEGA.
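Step 1.2 does not state how token-level model outputs are reduced to one vector per bio-entity. Masked mean pooling is one common choice, and the sketch below illustrates it on plain Python lists. It is an assumption for illustration only, not the authors' confirmed procedure, and the function name mean_pool is hypothetical.

```python
def mean_pool(token_vectors, mask=None):
    """Average token-level vectors into one fixed-length vector.

    token_vectors: list of equal-length lists of floats, one per token.
    mask: optional list of 0/1 flags; tokens flagged 0 (e.g. padding) are skipped.
    """
    if mask is None:
        mask = [1] * len(token_vectors)
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("no tokens left after masking")
    dim = len(kept[0])
    # Average each dimension over the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


if __name__ == "__main__":
    # Toy 2-dimensional "hidden states" for three tokens; the last one is padding.
    states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
    print(mean_pool(states, mask=[1, 1, 0]))
```

The output length equals the hidden size of the model, so every description maps to a vector of the same dimension regardless of its token count.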

2. Use LLM-based representations to analyze drug biomedical associations

2.1 Switch to the directory of the task to investigate (cd ./DDA for drug-disease association, cd ./DDI for drug-drug interaction, cd ./DSA for drug-side effect association), or create a new directory for additional research modeled on these examples.
2.2 Place the data to be predicted in ./data, following the format of the provided examples.
2.3 Switch to ./bashes, modify the bash files according to the recorded guidance, then execute the following command:
sh run.sh	# Run the model for DBA prediction
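For readers constructing a new task directory, it may help to see how per-entity representations can be turned into per-pair inputs for a classifier. The sketch below concatenates the two entity vectors of each pair. Concatenation is an assumption (this README does not state how LEDAP combines them), and the identifiers and toy values are hypothetical.

```python
def build_pair_features(pairs, left_emb, right_emb):
    """Build one feature row per (left_id, right_id, label) triple.

    Each row is the left entity vector followed by the right entity vector.
    """
    features, labels = [], []
    for left_id, right_id, label in pairs:
        features.append(list(left_emb[left_id]) + list(right_emb[right_id]))
        labels.append(label)
    return features, labels


if __name__ == "__main__":
    # Hypothetical toy embeddings; real vectors would come from ./representations.
    drug_emb = {"drugA": [0.1, 0.2], "drugB": [0.3, 0.4]}
    disease_emb = {"diseaseX": [0.5, 0.6]}
    pairs = [("drugA", "diseaseX", 1), ("drugB", "diseaseX", 0)]
    X, y = build_pair_features(pairs, drug_emb, disease_emb)
    print(X, y)
```

The resulting rows and labels have the shape expected by tabular classifiers such as those in scikit-learn or xgboost, both of which appear in the requirements.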

Citation and Disclaimer

The manuscript is published in Analytical Chemistry.

Please cite: Zhang H, Zhou Y, Zhang Z, Sun H, Pan Z, Mou M, Zhang W, Ye Q, Hou T, Li H, Hsieh CY, Zhu F. Large Language Model-Based Natural Language Encoding Could Be All You Need for Drug Biomedical Association Prediction. Anal. Chem. 96(30), 12395–12403.

Should you have any questions, please contact Dr. Zhang at [email protected]
