_ __
| | / _|
___ ___ | |__ ___ ___ ___ _ __ | |_
/ __|/ _ \| '_ \ / _ \ / __|/ _ \ | '_ \ | _|
| (__| __/| |_) || __/| (__| (_) || | | || |
\___|\___||_.__/ \___| \___|\___/ |_| |_||_|
The cebeconf package is a set of machine-learning models for predicting 1s core-electron binding energies (CEBEs) of C, O, N, and F (CONF) atoms in organic molecules (Ref-1).
- Models were trained on 12880 small organic molecules from the bigQM7ω dataset (Ref-2).
- The target property (1s core-electron binding energies) was calculated using the strongly constrained and appropriately normed (SCAN) meta-GGA DFT method with a large, Tight-full numeric atom-centered orbital (NAO) basis set implemented in FHI-aims.
- These calculations were performed using ωB97XD/def2TZVP geometries presented in the bigQM7ω dataset.
- For delta learning, the baseline energies were assigned based on Mulliken occupations. The data can be found in Baseline_files.
- Two example files (UFF-PBE: ethane and propane) are also provided in the home folder, showing the output from the Mulliken.out file of FHI-aims.
- To facilitate rapid application of the ML models, training was done using baseline geometries of the bigQM7ω molecules determined with the universal force field (UFF). These geometries are also provided at https://moldis-group.github.io/bigQM7w/
- So, for new predictions, the ML models require geometries quickly determined with UFF.
- ML models were trained with kernel ridge regression (KRR) using the atomic Coulomb matrix (ACM) representation.
- For technical details, see Ref-1 and its Supporting Information.
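The difference between direct learning and delta learning with kernel ridge regression can be sketched as follows. This is a minimal illustration on made-up data, not the package's implementation: the Gaussian kernel, the values of `sigma` and `lam`, and the random stand-in descriptors are all assumptions, and the actual kernel, hyperparameters, and ACM descriptor are described in Ref-1.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def krr_fit(X_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression coefficients."""
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(y_train)), y_train)

def krr_predict(X_new, X_train, alpha, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Made-up stand-ins (NOT bigQM7ω data): descriptors, baseline and target energies in eV.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
baseline = 290.0 + X[:, 0]
target = baseline + 0.3 * np.sin(X[:, 1])

X_train, X_test = X[:40], X[40:]
sigma, lam = 1.0, 1e-6  # assumed hyperparameters, for illustration only

# Direct learning: regress the target itself (centered on the training mean).
mean = target[:40].mean()
alpha_direct = krr_fit(X_train, target[:40] - mean, sigma, lam)
direct_pred = mean + krr_predict(X_test, X_train, alpha_direct, sigma)

# Delta learning: regress (target - baseline), then add the baseline back.
alpha_delta = krr_fit(X_train, (target - baseline)[:40], sigma, lam)
delta_pred = baseline[40:] + krr_predict(X_test, X_train, alpha_delta, sigma)

print("direct MAE (eV):", np.mean(np.abs(direct_pred - target[40:])))
print("delta  MAE (eV):", np.mean(np.abs(delta_pred - target[40:])))
```

In the delta setting the model only has to learn the correction on top of the baseline, which is why a baseline energy must be available for every atom whose binding energy is predicted.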
- Install dependencies: numpy, pandas
- Download and install the package
git clone git@github.com:moldis-group/cebeconf.git
pip3 install -e cebeconf
- Install from PyPI
pip3 install cebeconf
- Create an XYZ file at the UFF level (see below to learn how to do this)
- Run the ML model in python3 (example in the cebeconf/test folder)
from cebeconf import calc_be
calc_be('test.xyz','direct', 'ACM')
- Suppose 'test.xyz' contains the following geometry (the last molecule in the bigQM7ω dataset)
18
bigQM7w_UFF_012883
C 1.03070 -0.07680 0.06770
C 2.53800 -0.21440 -0.12550
C 2.99750 -0.46340 -1.49170
N 3.09380 0.90540 -0.90860
C 4.47940 1.20090 -0.50870
C 5.01760 2.53370 -1.00430
C 4.47490 2.41010 0.41050
H 0.59860 -1.07330 0.29480
H 0.52630 0.33730 -0.83250
H 0.83500 0.60170 0.92380
H 3.17550 -0.57150 0.71420
H 2.25180 -0.44020 -2.31440
H 3.99580 -0.93590 -1.63370
H 5.09800 0.43550 0.01500
H 4.34280 2.85880 -1.82600
H 6.09080 2.33310 -1.20820
H 3.60210 3.09770 0.43410
H 5.35240 2.60380 1.06330
- Running the code generates the following output
...
+--------------+
| User inputs: |
+--------------+
Reading coordinates from: test.xyz
Predicting 1s CEBEs using direct ML with the ACM descriptor
+--------------+
| Prediction: |
+--------------+
1 C 1.03070000 -0.07680000 0.06770000 290.81 eV
2 C 2.53800000 -0.21440000 -0.12550000 291.83 eV
3 C 2.99750000 -0.46340000 -1.49170000 291.90 eV
...
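If the printed output is captured as text (for example, by redirecting it to a file), the prediction lines can be collected for further processing. The sketch below is not part of cebeconf: `parse_predictions` is a hypothetical helper, and the regular expression assumes every prediction line follows the layout shown above (index, element, x, y, z, energy, "eV"). Lines that do not match, including those elided with "..." above, are skipped.

```python
import re

# One prediction line: index, element symbol, x, y, z, energy, then the unit "eV".
LINE_RE = re.compile(
    r"^\s*(\d+)\s+([A-Z][a-z]?)"
    r"\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"
    r"\s+(-?\d+\.\d+)\s+eV\s*$"
)

def parse_predictions(text):
    """Return one dict per line that matches the assumed prediction layout."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # banner lines, blank lines, or unknown formats
        idx, element, x, y, z, energy = match.groups()
        rows.append(
            {
                "index": int(idx),
                "element": element,
                "xyz": (float(x), float(y), float(z)),
                "cebe_eV": float(energy),
            }
        )
    return rows

# The three prediction lines shown in the output above.
sample = """\
1 C 1.03070000 -0.07680000 0.06770000 290.81 eV
2 C 2.53800000 -0.21440000 -0.12550000 291.83 eV
3 C 2.99750000 -0.46340000 -1.49170000 291.90 eV
"""

for row in parse_predictions(sample):
    print(row["index"], row["element"], row["cebe_eV"])
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(...)` if a table is more convenient.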
Write the SMILES string of the molecule (for example, c1ccccc1 for benzene) to a file.
echo 'c1ccccc1' > benzene.smi
Generate an initial geometry using Open Babel. If you already have an initial geometry from another source, you can skip this step and the previous one.
obabel -oxyz benzene.smi > benzene.xyz --gen3d
Relax tightly using UFF.
obminimize -oxyz -sd -ff UFF -c 1e-8 benzene.xyz > benzene_UFF.xyz
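Before passing the relaxed geometry to the ML model, it can help to check that the XYZ file is well formed. The following is a minimal sketch, not part of cebeconf: `parse_xyz` and `count_target_atoms` are hypothetical helpers, and the check only assumes the standard XYZ layout (atom count, comment line, then one "symbol x y z" line per atom).

```python
def parse_xyz(text):
    """Parse XYZ-format text into (comment, symbols, coords), validating the atom count."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("XYZ text needs an atom-count line and a comment line")
    n_atoms = int(lines[0].split()[0])
    comment = lines[1].strip()
    atom_lines = [ln for ln in lines[2:] if ln.strip()]
    if len(atom_lines) != n_atoms:
        raise ValueError(f"header declares {n_atoms} atoms, found {len(atom_lines)}")
    symbols, coords = [], []
    for ln in atom_lines:
        parts = ln.split()
        if len(parts) < 4:
            raise ValueError(f"malformed atom line: {ln!r}")
        symbols.append(parts[0])
        coords.append(tuple(float(v) for v in parts[1:4]))
    return comment, symbols, coords

def count_target_atoms(symbols, targets=("C", "O", "N", "F")):
    """Count the atoms for which a 1s CEBE prediction is expected."""
    return sum(1 for s in symbols if s in targets)

# Three atoms taken from the test.xyz example; truncated for illustration only.
example = """\
3
illustrative fragment, not a full molecule
C 1.03070 -0.07680 0.06770
C 2.53800 -0.21440 -0.12550
N 3.09380 0.90540 -0.90860
"""

comment, symbols, coords = parse_xyz(example)
print(len(symbols), "atoms;", count_target_atoms(symbols), "C/O/N/F atoms")
```

For a file on disk, the same check could be run as `parse_xyz(open('benzene_UFF.xyz').read())`.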
[Ref-1] Chemical Space-Informed Machine Learning Models for Rapid Predictions of X-ray Photoelectron Spectra of Organic Molecules
Susmita Tripathy, Surajit Das, Shweta Jindal, Raghunathan Ramakrishnan
Mach. Learn.: Sci. Technol. 5 (2024) 045023.
[Ref-2] The Resolution-vs.-Accuracy Dilemma in Machine Learning Modeling of Electronic Excitation Spectra
Prakriti Kayastha, Sabyasachi Chakraborty, Raghunathan Ramakrishnan
Digital Discov. 1 (2022) 689-702.