Machine-Learning-Based Interatomic Potentials for Catalysis: a Universal Catalytic Large Atomic Model
Our pre-trained models are available in the following configurations, hosted on Google Drive:
Model | Training strategy | Download | Val. force MAE (meV/Å), metal systems | Val. energy MAE (meV/atom), metal systems |
---|---|---|---|---|
GemNet-OC | Fine-tuned from GemNet-OC-S2EFS-OC20+OC22 for 5 epochs | best_checkpoint_GemnetOC.pt, config | 34.5 | 4.05 |
EquiformerV2 | Fine-tuned from eq2_121M_e4_f100_oc22_s2ef.pt for 2 epochs | checkpoint_eqV2.pt, config | 26.0 | 32.5 |
DPA2 | Fine-tuned from DPA2_medium_28_10M_beta4.pt for 2000000 steps | model.ckpt-2000000.pt, config | 154 | 484 |
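For readers comparing their own validation runs with the table, the two metrics can be computed roughly as follows. This is a minimal sketch that assumes the common convention (energy error normalized per atom, force error averaged over every Cartesian component); the table does not state its exact definition, and the function names are illustrative only.

```python
def energy_mae_per_atom(e_pred, e_ref, n_atoms):
    """Mean absolute energy error per atom over frames (eV in, meV/atom out)."""
    errors = [abs(p - r) / n for p, r, n in zip(e_pred, e_ref, n_atoms)]
    return 1000.0 * sum(errors) / len(errors)


def force_mae(f_pred, f_ref):
    """Mean absolute error over all force components (eV/Å in, meV/Å out).

    f_pred and f_ref are lists of frames; each frame is a list of
    [fx, fy, fz] entries, one per atom.
    """
    diffs = [
        abs(p - r)
        for frame_p, frame_r in zip(f_pred, f_ref)
        for atom_p, atom_r in zip(frame_p, frame_r)
        for p, r in zip(atom_p, atom_r)
    ]
    return 1000.0 * sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Two toy frames with 2 and 4 atoms (made-up numbers, for illustration only).
    print(energy_mae_per_atom([-10.0, -20.0], [-10.02, -19.96], [2, 4]))
    print(force_mae([[[0.1, 0.0, 0.0]]], [[[0.0, 0.0, 0.03]]]))
```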
The generation directory generates structures, including bulk and slab structures for VASP calculations, as well as initial adsorption structures. A set of commonly used small adsorbates is preset, and 1, 2, or 4 molecules can be placed on each slab model.
The vaspworkflow directory manages VASP tasks and workflows and collects the resulting data for dpdata. It can run high-throughput optimization and molecular dynamics (MD) jobs on pre-generated structures, check the convergence of the SCF steps in optimization and MD runs, and perform high-throughput conversion to LMDB or NPY format for further training.
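As an illustration of the SCF convergence check, the sketch below scans the text of a VASP OSZICAR file and flags ionic steps whose electronic loop ran up to the NELM limit. It is a heuristic written for this README, not the code used by flow.py: the function name is hypothetical, and a step that happens to converge exactly on its last allowed iteration would also be flagged.

```python
import re

# Electronic-step lines look like "DAV:   3   -0.2E+02 ..." (or "RMM:   3 ...").
ELEC = re.compile(r"^\s*[A-Za-z]+\s*:\s+(\d+)\s")
# Ionic-step lines look like "   2 F= -.21E+02 ..." (optimization) or "   2 T= ..." (MD).
IONIC = re.compile(r"^\s*(\d+)\s+[TF]=")


def unconverged_ionic_steps(oszicar_text, nelm=60):
    """Return ionic step numbers whose SCF loop used at least `nelm` iterations.

    nelm should match the NELM tag of the run (the VASP default is 60).
    """
    flagged = []
    n_elec = 0
    for line in oszicar_text.splitlines():
        m = ELEC.match(line)
        if m:
            n_elec = int(m.group(1))  # index of the latest electronic step
            continue
        m = IONIC.match(line)
        if m:
            if n_elec >= nelm:
                flagged.append(int(m.group(1)))
            n_elec = 0  # reset for the next ionic step
    return flagged


# Shortened, made-up OSZICAR excerpt; a real file has more columns per line.
SAMPLE = """\
       N       E                     dE
DAV:   1     0.1E+03    0.1E+03
DAV:   2    -0.2E+02   -0.1E+03
   1 F= -.20E+02 E0= -.20E+02  d E =-.2E+02
DAV:   1    -0.2E+02   -0.1E-01
DAV:   2    -0.2E+02   -0.1E-02
DAV:   3    -0.2E+02   -0.1E-03
   2 F= -.21E+02 E0= -.21E+02  d E =-.1E+01
"""

if __name__ == "__main__":
    print(unconverged_ionic_steps(SAMPLE, nelm=3))
```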
The postworkflow directory handles model-accelerated structure optimization, transition state search, and catalytic reaction network construction. Optimization and transition state search use a local fine-tuning method: a loop of labeling, fine-tuning, and inference that accelerates the optimization and MD process. It can also construct reaction networks automatically, generating possible intermediates and transition state structures.
The train directory contains the pretrained CLAM model used by the post workflow, together with its training, fine-tuning, and checkpoint files.
The scripts directory contains utility scripts for generating cluster structures and converting file formats.
The structure_db directory contains the initial structure files and the POSCAR files used in VASP optimization and MD calculations to generate datasets.
Ensure that your system has the following software installed:
- Python 3 (version > 3.10)
- ASE (version > 3.22)
- Pymatgen (version > 2023.3.23)
- DeePMD-kit (version > 3.0.0a1)
- dpdata (version > 0.2.18)
- fairchem
- VASP (version > 5.4.4)
- VASPKIT
- SLURM (for job scheduling)
- tqdm (for progress bars)
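A quick way to see which of the Python prerequisites are present is sketched below. The PyPI distribution names in PACKAGES (in particular deepmd-kit and fairchem-core) are assumptions and may need adjusting for your environment; VASP, VASPKIT, and SLURM are not Python packages and have to be checked separately.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names; adjust to match your environment.
PACKAGES = ["ase", "pymatgen", "deepmd-kit", "dpdata", "fairchem-core", "tqdm"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ver in installed_versions().items():
        print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```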
- Clone the repository:
git clone https://github.com/lalaheihaihei/catalyticLAM.git
- Enter the folder:
cd catalyticLAM
Navigate to the generation directory and run the appropriate script to generate the desired structures:
- get-bulk.py: Generates bulk structures.
- get-slab.py: Generates slab structures.
- element_list.json: Metal and alloy elements for bulk generation.
- material.json: Information for database generation.
- molecule.json: Molecular structures database.
cd generation
python get-bulk.py --api-key Your-Api-Key --bulktype metal --elementNumber 1 --task search --ificsd
python get-bulk.py --plot --api-key Your-Api-Key --min-lw 10.0 --task generate
python get-slab.py --plot --api-key Your-Api-Key --molecule-type CO --up-down UUD --element Au --type type1
python get-slab.py --plot --api-key Your-Api-Key --molecule-type all --up-down UUUUDDDD --element Pd --type type3
See README.md for detailed usage.
Navigate to the vaspworkflow directory, edit the input file as needed, and run flow.py:
- flow.py: Main workflow script for managing VASP calculations and data processing.
- input: Input parameter file.
- POSCAR: Directory containing various POSCAR files and their corresponding VASP calculation results.
- structure_db: Stores the structure database.
- utils: Contains configuration files and scripts required for VASP calculations.
cd vaspworkflow
nohup python flow.py POSCAR opt &
python flow.py POSCAR optcheck
nohup python flow.py POSCAR md &
python flow.py POSCAR mdcheck
python flow.py POSCAR dpdata
python flow.py POSCAR plot
See README.md for detailed usage.
Navigate to the postworkflow directory, prepare input files, and run the relevant scripts:
- flowopt.py: Workflow script for structure optimization.
- flowts.py: Workflow script for transition state search.
- POSCAR: Initial structure file.
- utils: Contains configuration files and scripts for optimization and transition state search.
cd postworkflow
cd optdp    # or: cd optoc
nohup python ./flowopt.py --num_iterations 3 --steps_per_iteration 200 --fixed_atoms 0 --iffinal true --fmax 0.1 &
cd tsdp    # or: cd tsoc
nohup python ./flowts.py POSCARis POSCARfs ./frozen_model.pth OUTCARis OUTCARfs &
See README.md for detailed usage.
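A transition state search between an initial state and a final state usually starts from intermediate geometries between the two endpoints. The sketch below shows plain linear interpolation of Cartesian coordinates as one common way to build such a starting path. Whether flowts.py does this internally is not stated here; the function name is hypothetical, and periodic boundary conditions (minimum-image wrapping) are ignored.

```python
def interpolate_images(initial, final, n_images):
    """Return n_images geometries evenly spaced between initial and final.

    initial and final are lists of (x, y, z) tuples with identical atom order;
    the endpoints themselves are not included in the result.
    """
    if len(initial) != len(final):
        raise ValueError("initial and final states must have the same number of atoms")
    images = []
    for k in range(1, n_images + 1):
        t = k / (n_images + 1)  # fractional position along the path
        images.append([
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(initial, final)
        ])
    return images


if __name__ == "__main__":
    # One atom moving 4 Å along x, with three intermediate images.
    for image in interpolate_images([(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)], 3):
        print(image)
```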
Navigate to the postworkflow/RNET directory, prepare input files, and run the relevant scripts:
- RNet.py: Generate the reaction network diagram.
- MakeSlab.py: Construct all possible structures for intermediates adsorbed on metal surfaces.
- plot_all.py: Plot the energy changes and the MAE of the energy differences.
cd postworkflow/RNET
python RNet.py 1 2 --layout spring
python MakeSlab.py --element Pt --max-index 1
See README.md for detailed usage.
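To give a feel for what a reaction network is as a data structure, the toy below enumerates single-carbon intermediates by composition and links each one to the species reached by removing one H or one O atom. It is bookkeeping only, written for illustration: it is not how RNet.py works, it does no valence or connectivity checks, and all names are hypothetical.

```python
def _part(symbol, count):
    """Format one element of a formula, e.g. ('H', 3) -> 'H3', ('O', 1) -> 'O'."""
    if count == 0:
        return ""
    return symbol if count == 1 else f"{symbol}{count}"


def label(n_h, n_o):
    """Composition label of a one-carbon species, e.g. (4, 1) -> 'CH4O'."""
    return "C" + _part("H", n_h) + _part("O", n_o)


def build_network(max_h=4, max_o=1):
    """Map each species to its one-step products as (product, removed atom)."""
    network = {}
    for n_h in range(max_h + 1):
        for n_o in range(max_o + 1):
            steps = []
            if n_h > 0:
                steps.append((label(n_h - 1, n_o), "H"))  # lose one hydrogen
            if n_o > 0:
                steps.append((label(n_h, n_o - 1), "O"))  # lose one oxygen
            network[label(n_h, n_o)] = steps
    return network


if __name__ == "__main__":
    net = build_network()
    for species, steps in net.items():
        print(species, "->", steps)
```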
Navigate to the train directory, edit the input files, and run the training or fine-tuning jobs. Details of CLAM are in README.md.
dp --pt train input.json > out
# At present, <head> must be one of: oc22, qm, metal
dp --pt train --finetune model.ckpt.10000000.pt --model-branch <head> finetune.json > out
python main.py --mode train --config-yml finetune1.yml --print-every 1000 >> out
python main.py --mode train --config-yml finetune1.yml --checkpoint gnoc_oc22_oc20_all_s2ef.pt --print-every 1000 >> out
See README.md for detailed usage.
For more information, refer to the official DeePMD-kit and fairchem websites.
Navigate to the scripts directory and run the appropriate script to generate the cluster structures or convert file formats.
- cif2pos.py: Convert a CIF file to POSCAR.
- get-cluster.py: Generate the structures of metal clusters in xyz format.
- json2cif.py: Convert a JSON file to a CIF file.
- xyz2pos.py: Convert an XYZ file to POSCAR.
- sim_model.py: Delete the unnecessary keys in checkpoint files (oc22).
- cal_nframes.py: Calculate the number of frames in a dataset in dp (deepmd-kit) format.
- make_test.py: Make a test dataset in lmdb format.
See README.md for more details.
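As an example of what a format conversion such as XYZ to POSCAR involves, here is a minimal standalone sketch. It is not the code in xyz2pos.py: the function name and the cubic `box` length are assumptions made for illustration, and the coordinates are written as given, without centering the molecule in the cell.

```python
def xyz_to_poscar(xyz_text, box=15.0):
    """Convert XYZ-format text into POSCAR-format text with a cubic cell."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])
    comment = lines[1].strip() or "converted from xyz"
    atoms = []
    for line in lines[2:2 + n_atoms]:
        sym, x, y, z = line.split()[:4]
        atoms.append((sym, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("atom count in header does not match the coordinate lines")

    # POSCAR lists atoms grouped by species; keep the order of first appearance.
    species = list(dict.fromkeys(sym for sym, *_ in atoms))

    out = [comment, "1.0"]  # comment line and universal scaling factor
    for i in range(3):
        vec = [box if i == j else 0.0 for j in range(3)]
        out.append(" ".join(f"{v:12.6f}" for v in vec))
    out.append(" ".join(species))
    out.append(" ".join(str(sum(1 for a in atoms if a[0] == s)) for s in species))
    out.append("Cartesian")
    for s in species:
        for sym, x, y, z in atoms:
            if sym == s:
                out.append(f"{x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    water = "3\nwater\nH 0.757 0.586 0.0\nO 0.0 0.0 0.0\nH -0.757 0.586 0.0\n"
    print(xyz_to_poscar(water))
```

Grouping by species matters because a POSCAR file stores one count per element rather than a symbol per atom, so the atom order in the output can differ from the XYZ input.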
Navigate to the structure_db directory, where you will find compressed archives containing the initial structures.
- 2D.tgz: All 6351 POSCAR files of 2D materials for VASP calculations.
- 2D-raw.tgz: The initial JSON file containing the information on the 2D materials, and the corresponding CIF files.
- bulk.tgz: The POSCAR files of metals and alloys for VASP calculations.
- cluster.tgz: The POSCAR files of clusters for VASP calculations.
- cluster-raw.tgz: The initial xyz files of clusters.
- molecule.tgz: All POSCAR files of molecules for VASP calculations.
- molecule-raw.tgz: The initial xyz files of molecules.
- slab.tgz: The POSCAR files of slabs for VASP calculations.
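To check an archive without unpacking it, for example to confirm how many structure files it holds, something like the sketch below can be used. The function name is hypothetical, and the assumption that the relevant files have "POSCAR" in their file name is a guess about the archive layout; adjust `name_contains` as needed.

```python
import tarfile


def count_archive_files(archive_path, name_contains="POSCAR"):
    """Count regular files in a .tgz archive whose file name contains a substring."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sum(
            1
            for member in tar.getmembers()
            if member.isfile() and name_contains in member.name.rsplit("/", 1)[-1]
        )


if __name__ == "__main__":
    import sys

    # Usage: python count_poscar.py 2D.tgz
    if len(sys.argv) > 1:
        print(count_archive_files(sys.argv[1]))
```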
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
- ASE: https://wiki.fysik.dtu.dk/ase/
- VASP: https://www.vasp.at/
- DeePMD-kit: https://github.com/deepmodeling/deepmd-kit
- dpdata: https://github.com/deepmodeling/dpdata
- fairchem: https://fair-chem.github.io/index.html
- VASPKIT: https://vaspkit.com/
- Pymatgen: https://pymatgen.org/
- SLURM: https://slurm.schedmd.com/
Please cite the work below if you find this repository helpful.
Wu Z, Zhou L, Hou P, Liu Y, Guo T, Liu J-C. Catalytic Large Atomic Model (CLAM): A Machine-Learning-Based Interatomic Potential Universal Model. ChemRxiv. 2024; doi:10.26434/chemrxiv-2024-2xzct