Positive unlabeled learning-based enzyme promiscuity prediction

This repository contains the positive unlabeled learning-based enzyme promiscuity prediction (PU-EPP) model as described in the paper "Deep Learning Enables Rapid Identification of Mycotoxin-degrading Enzymes".

Requirements

Deploying PU-EPP requires a Linux workstation with GPUs. Training the final PU-EPP model took about 2 weeks on 5 NVIDIA Tesla V100 GPUs.

Installation

Dependencies

The code has been tested in the following environment:

Package	Version
Python	3.9.12
PyTorch	1.12.0
CUDA	11.6.1
RDKit	2022.3.5
Gensim	4.1.2
Scikit-learn	1.1.3

Install dependencies

Install Anaconda first, then create and activate the conda environment and start Jupyter:

conda env create -f PU_EPP_environment.yml
conda activate PU_EPP
pip install --upgrade pip
pip install jupyter
jupyter notebook
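
Once the environment is active, it can help to confirm that the installed packages match the tested versions listed above. The following is a minimal sketch using only the standard library, not part of the repository. The distribution names (`torch`, `rdkit`, `gensim`, `scikit-learn`) are assumptions about how each package is registered in the environment, and the comparison is an exact string match, so a build suffix or a zero-padded version string will be reported as a mismatch even when the package is fine.

```python
from importlib import metadata

# Versions from the tested environment in this README. The distribution
# names used as keys are assumptions, not taken from the repository.
TESTED_VERSIONS = {
    "torch": "1.12.0",
    "rdkit": "2022.3.5",
    "gensim": "4.1.2",
    "scikit-learn": "1.1.3",
}


def compare_versions(tested, lookup=metadata.version):
    """Return {package: (tested, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in tested.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in compare_versions(TESTED_VERSIONS).items():
        print(f"{name}: tested with {wanted}, found {found or 'not installed'}")
```

An empty report means every listed package matches the tested version exactly; a mismatch is a hint to investigate, not proof that PU-EPP will fail.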

Training

Run train.ipynb, setting --class CFG to your own configuration.

Testing

Run test.ipynb, setting --class CFG to your own configuration.

Predicting

Screening functional enzymes for a substrate from a .fasta file

To load the PU-EPP model and make predictions from a .fasta file, put example1.fasta (the .fasta file of candidate enzymes) in the data folder. Then run predict.ipynb and specify:

--PreCFG.useFasteFile = True
--PreCFG.fasteFile Path to the .fasta file of candidate enzymes
--PreCFG.compound The molecular structure of the substrate in simplified molecular input line entry system (SMILES) format

After a few minutes of calculation, you will find the result in the result folder with the name example1_result.csv.
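
Conceptually, screening one substrate against a .fasta file means pairing every candidate sequence with the same SMILES string. The sketch below illustrates that idea and can be used to sanity-check a candidate file before running the notebook. It is not the repository's own loader: `read_fasta` and `pair_with_substrate` are hypothetical helper names, and the sequences and SMILES are placeholder data.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def pair_with_substrate(records, smiles):
    """Build one (SMILES, sequence) pair per candidate enzyme."""
    return [(smiles, sequence) for _, sequence in records]


# Placeholder data: two made-up sequences and ethanol ("CCO") as substrate.
example = [">enzyme_A", "MKTAYIAK", "QRQISFVK", ">enzyme_B", "MSLLTEVE"]
records = list(read_fasta(example))
pairs = pair_with_substrate(records, "CCO")
print(records)
print(pairs)
```

To check a real file, pass an open file handle instead of the list, for example `read_fasta(open("data/example1.fasta"))`.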

Predicting the probabilities of enzyme-substrate pairs

To load PU-EPP and make predictions for enzyme-substrate pairs, put the example2.csv file (data of enzyme-substrate pairs) in the data folder.

example2.csv

Substrate	Enzyme
SMILES1	SEQ1
SMILES2	SEQ2

Run predict.ipynb and specify:

--PreCFG.useFasteFile = False
--PreCFG.csvFile Path to the .csv file of enzyme-substrate pairs

You will find the result in the result folder with the name example2_result.csv.
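
A pairs file in the layout shown above can be produced with Python's `csv` module. This is a minimal sketch under two assumptions that the README does not confirm: that the file is comma-delimited and that the header names are exactly `Substrate` and `Enzyme`. The SMILES strings and sequences are placeholders.

```python
import csv
import io


def write_pairs(handle, pairs):
    """Write (SMILES, sequence) pairs with a Substrate/Enzyme header row."""
    writer = csv.writer(handle)
    writer.writerow(["Substrate", "Enzyme"])
    writer.writerows(pairs)


# Placeholder pairs; an in-memory buffer stands in for data/example2.csv.
buffer = io.StringIO()
write_pairs(buffer, [("CCO", "MKTAYIAK"), ("c1ccccc1O", "MSLLTEVE")])
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the target with `open(path, "w", newline="")` so the `csv` module controls the line endings.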

Fine-tuning

To fine-tune PU-EPP on a new dataset, put the example3_train.csv and example3_test.csv files (labeled enzyme-substrate pairs) in the data folder.

example3_train.csv or example3_test.csv

Substrate	Enzyme	Label
SMILES1	SEQ1	1
SMILES2	SEQ2	0

Label 1 stands for positive, and 0 stands for negative or unlabeled.

Run finetuning.ipynb and specify:

--CFG.traindata_path Path to the .csv file of the training set
--CFG.testdata_path Path to the .csv file of the test set
--CFG.modelsave_file_suffix The suffix of the model name to save
--CFG.result_file_suffix The suffix of the log file name to save

The fine-tuned model is saved in the model/model_funetuning folder, with --CFG.modelsave_file_suffix appended to its file name.
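
Because fine-tuning relies on the Substrate, Enzyme, and Label columns, it may be worth checking the training and test files before starting a run. The sketch below is one possible check, not part of the repository. `check_finetuning_rows` is a hypothetical helper, and it assumes comma-delimited files with exactly those header names.

```python
import csv
import io

REQUIRED_COLUMNS = ["Substrate", "Enzyme", "Label"]


def check_finetuning_rows(handle):
    """Return a list of problems found in a fine-tuning CSV; empty means OK."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in fieldnames]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        if row["Label"] not in ("0", "1"):
            problems.append(f"row {row_number}: label {row['Label']!r} is not 0 or 1")
        if not row["Substrate"] or not row["Enzyme"]:
            problems.append(f"row {row_number}: empty substrate or enzyme")
    return problems


# Placeholder data: one well-formed file and one with an invalid label.
good = "Substrate,Enzyme,Label\nCCO,MKTAYIAK,1\nc1ccccc1O,MSLLTEVE,0\n"
bad = "Substrate,Enzyme,Label\nCCO,MKTAYIAK,2\n"
print(check_finetuning_rows(io.StringIO(good)))
print(check_finetuning_rows(io.StringIO(bad)))
```

For real files, pass `open(path, newline="")` for each of the paths given in --CFG.traindata_path and --CFG.testdata_path.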

Assistance

Researchers who do not have the hardware to deploy PU-EPP can send their data to us ([email protected] or [email protected]) in one of the following formats. We will run the calculation and send the results back.

a substrate and a list of candidate enzymes
an enzyme and a list of candidate substrates
a list of enzyme-substrate pairs

Link to other repositories

Zenodo, https://doi.org/10.5281/zenodo.7813738

chriscui0823/PU-EPP
