Tautomerization plays a critical role in numerous chemical and biological processes, impacting factors such as molecular stability, reactivity, biological activity, and ADME-Tox properties. Many drug-like molecules exist in multiple tautomeric states in aqueous solutions and complicating drug discovery. Predicting these tautomeric ratios and identifying the predominant species rapidly and accurately is crucial for computational drug discovery. In this study, we introduce sPhysNet-Taut, a deep learning model fine-tuned with experimental data leveraging the Siamese network, built upon a pre-trained model. This model predicts tautomer ratios in aqueous solutions using MMFF94-optimized geometries directly. We demonstrate how to correct the limitation of the pre-trained model for predicting molecular internal energies and solvent effects by fine-tuning using experimental data. This work not only provides a useful deep learning model for predicting tautomer ratios, but also provides a protocol for modeling pairwise data. To facilitate user-friendliness, we developed a readily accessible tool to predict stable tautomeric states in aqueous solutions, enumerating all possible tautomeric states and ranking them using the sPhysNet-Taut model.
- Python 3.10.13
- numpy 1.26.3
- RDKit 2024.03.3
- scipy
- pandas 2.1.4
- pytorch 2.2.0
- pytorch geometric 2.4.0
- torch-scatter 2.1.2
- torch-sparse 0.6.18
- torch-cluster 1.6.3
- torch-spline-conv 1.2.2
- treelib 1.6.1
- mols2grid 2.0.0
usage: predict_tautomer.py [-h] [--smi SMI]
[--low_energy_tautomer_cutoff LOW_ENERGY_TAUTOMER_CUTOFF]
[--num_confs NUM_CONFS]
[--ionization IONIZATION] [--ph PH] [--tph TPH]
[--output OUTPUT]
To calculate low-energy tautomeric states for small molecules by a deep learning model.
options:
-h, --help show this help message and exit
--smi SMI the molecular smiles
--low_energy_tautomer_cutoff LOW_ENERGY_TAUTOMER_CUTOFF
the energy cutoff for low energy
--num_confs NUM_CONFS
the number of conformation for solvation energy
prediction
--ionization IONIZATION
determine to generate ionization states by predicted pKa
using the given pH
--ph PH the target pH for protonation states generation
--tph TPH pH tolerance for protonation states generation
--output OUTPUT the output SDF file name
These is a example for the tautomeric states prediction:
python predict_tautomer.py --smi "Cc1c2c([nH]n1)OC(=C([C@@]2(c3cc(cc(c3)N4CCCC4)C(F)(F)F)C(C)C)C#N)N"
Alternatively, you can run it via Jupyter Notebook; the results will be displayed using Mols2Grid with energies. Additionally, we provide a free and user-friendly web server for the community, which you can access at https://yzhang.hpc.nyu.edu/tautomer.
If you use this project in your research, please cite the following papers:
Our manuscript is currently under review.
A limitation of our model is that its performance may decrease with increasing molecular size. Since the maximum number of heavy atoms in our experimental dataset is only 22, the model may not perform well with larger molecules.