This project provides a script to manage VASP calculations for optimizing atomic structures (OPT), running ab initio molecular dynamics (AIMD), and processing results into datasets for machine learning purposes. The script supports submitting VASP jobs, checking SCF convergence, running AIMD simulations, and converting data for DeePMD or Graph Neural Network (GNN) training. Additionally, it includes functionality for visualizing the occurrence of elements in the dataset.
- Prerequisites
- File Structure
- Usage
- Workflow Description
- Detailed Instructions
- Notes
- License
- Acknowledgements
- Python 3 (version > 3.10)
- ASE (version > 3.22)
- DeePMD-kit (version > 3.0.0a1)
- dpdata (version > 0.2.18)
- VASP and VASPKIT
- SLURM (for job scheduling)
- tqdm (version > 4.65.0, for progress bars)

Additional requirements for writing an LMDB dataset:
- torch (version > 2.3.1)
- torch_geometric
- torch_scatter
- lmdb (version > 1.5.1)
- fairchem
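To check which of the Python packages listed above are installed, a small helper built on the standard library's `importlib.metadata` can report their versions. This is a minimal sketch, not part of the project. The distribution names are assumptions: pip names can differ from import names, and fairchem may be installed as `fairchem-core`. The helper only reports versions and does not compare them against the minimums above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed pip distribution names; adjust them to match your environment.
REQUIRED = ["ase", "deepmd-kit", "dpdata", "tqdm"]
OPTIONAL_LMDB = ["torch", "torch_geometric", "torch_scatter", "lmdb", "fairchem-core"]


def installed_versions(names):
    """Return {distribution name: installed version string, or None if missing}."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ver in installed_versions(REQUIRED + OPTIONAL_LMDB).items():
        print(f"{name:16s} {ver or 'NOT INSTALLED'}")
```

VASP, VASPKIT and SLURM are external programs, so they are not covered by this check.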
├── flow.py # Main script for managing VASP calculations
├── input # File containing input parameters
├── utils
│ ├── plot_weight.py # Script for plotting element weights
│ ├── INCAR_opt # INCAR file for structural optimization
│ ├── INCAR_md # INCAR file for AIMD calculations
│ └── sub.vasp # VASP submission script
└── structure_db # Directory containing POSCAR files
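The prefix argument selects which files in structure_db are processed. The sketch below shows one way to do this selection. It assumes each structure is a single file (for example POSCAR_001, POSCAR_002) stored directly in structure_db. The function name `find_structures` is hypothetical and is not taken from flow.py.

```python
from pathlib import Path


def find_structures(structure_db_path, prefix):
    """Return sorted paths of files in structure_db whose names start with prefix.

    Assumption: one structure per file, stored directly in structure_db
    (subdirectories are ignored).
    """
    db = Path(structure_db_path)
    return sorted(p for p in db.iterdir() if p.is_file() and p.name.startswith(prefix))
```

Sorting keeps the order of submission deterministic between runs.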
- prefix: Prefix of the structure files to process, usually POSCAR.
- operation: The operation to perform. Choices are:
  - opt: Submit VASP optimization jobs
  - md: Submit VASP AIMD jobs
  - optcheck: Check SCF convergence for OPT jobs
  - mdcheck: Check SCF convergence for AIMD jobs
  - dpdata: Process data into npy or lmdb format for DeePMD or GNN training
  - plot: Plot element weights in the dataset
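The two positional arguments correspond closely to Python's `argparse`. The sketch below is a minimal parser that accepts the same prefix and operation choices. It assumes flow.py takes exactly these two positionals; the real script's parser may be implemented differently.

```python
import argparse

# Operation names as listed above.
OPERATIONS = ["opt", "md", "optcheck", "mdcheck", "dpdata", "plot"]


def build_parser():
    parser = argparse.ArgumentParser(
        description="Manage VASP OPT/AIMD jobs and build ML datasets")
    parser.add_argument("prefix",
                        help="Prefix of the structure files to process, usually POSCAR")
    parser.add_argument("operation", choices=OPERATIONS,
                        help="Operation to perform")
    return parser
```

For example, `build_parser().parse_args(["POSCAR", "opt"])` returns a namespace with `prefix="POSCAR"` and `operation="opt"`. If the operation is not in the list, argparse prints an error and exits.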
python flow.py PREFIX opt
The flow.py script reads parameters from a file named input in the same directory. The file should contain the following parameters:
work_path: /path/to/work/
opt_INCAR_path: /path/to/work/utils/INCAR_opt
md_INCAR_path: /path/to/work/utils/INCAR_md
vasp_sub_path: /path/to/work/utils/sub.vasp
structre_db_path: /path/to/work/structure_db/
max_jobs: 25 # Maximum number of jobs in the SLURM queue
sleep_time: 20 # Interval between checks of the SLURM queue
user_name: Your_username
step_data: 10 # Frame interval for extracting data with dpdata
test_size: 0.1 # Proportion of test dataset
dataset_prefix: /path/to/work/data
dataset_fmt: npy_single or npy_mix or lmdb
plot_name: element_weights.png
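The input file uses a simple `key: value  # comment` layout, so a short parser can load it. This is a minimal sketch that assumes one parameter per line, the first `:` as the separator, and `#` as the start of a trailing comment. The function names and the choice of which keys are numeric are assumptions based on the example file above.

```python
def parse_input(text):
    """Parse 'key: value  # comment' lines into a dict of strings."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        key, sep, value = line.partition(":")  # split on the first ':' only
        if not sep:
            raise ValueError(f"Malformed line (expected 'key: value'): {raw!r}")
        params[key.strip()] = value.strip()
    return params


# Assumed numeric parameters, taken from the example input file.
INT_KEYS = ("max_jobs", "sleep_time", "step_data")
FLOAT_KEYS = ("test_size",)


def typed_params(params):
    """Return a copy of params with numeric values converted."""
    out = dict(params)
    for key in INT_KEYS:
        if key in out:
            out[key] = int(out[key])
    for key in FLOAT_KEYS:
        if key in out:
            out[key] = float(out[key])
    return out
```

Values are kept as strings until `typed_params` converts them, so paths and format names pass through unchanged.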
- VASP OPT Batch Submission:
  - Submits VASP optimization jobs in batches.
  - The initial optimization does not need to reach full structural convergence; only SCF convergence is required.
- Check SCF Convergence for OPT:
  - Checks SCF convergence for the VASP optimization jobs.
- Run AIMD Simulations:
  - Runs ab initio molecular dynamics simulations starting from the SCF-converged OPT results to generate training data.
- Check SCF Convergence for AIMD:
  - Checks SCF convergence for the AIMD simulations and logs the number of SCF-converged steps.
- Process Data for DeePMD or GNN Training:
  - Converts VASP results into npy or lmdb format for training neural network potentials. Note that only AIMD data are collected, sampled at the frame interval set by step_data.
- Plot Element Weights:
  - Plots the occurrence of elements in the dataset as a colored periodic table.
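One common way to check SCF convergence is to read VASP's OSZICAR. An ionic step is counted as SCF-converged if its electronic loop stopped before reaching NELM iterations (VASP's default NELM is 60). The sketch below uses this heuristic as an illustration and is not necessarily the method used in flow.py. It has one known limitation: a step that converges at exactly NELM iterations is counted as unconverged. The regular expressions assume the usual OSZICAR layout, where electronic lines begin with DAV:, RMM:, etc., and each ionic step ends with a line containing F= (OPT) or T= (MD).

```python
import re

ELEC_RE = re.compile(r"^\s*[A-Z]{2,3}\s*:\s*(\d+)")  # e.g. "DAV:   7 ..." or "RMM:  12 ..."
IONIC_RE = re.compile(r"^\s*\d+\s+(?:F=|T=)")         # e.g. "   1 F= ..." (OPT) or "   1 T= ..." (MD)


def scf_iterations_per_ionic_step(oszicar_text):
    """Return the number of SCF iterations used in each completed ionic step."""
    counts, last_iter = [], 0
    for line in oszicar_text.splitlines():
        m = ELEC_RE.match(line)
        if m:
            last_iter = int(m.group(1))
        elif IONIC_RE.match(line):
            counts.append(last_iter)
            last_iter = 0
    return counts


def count_scf_converged(oszicar_text, nelm=60):
    """AIMD check: count ionic steps whose SCF loop stopped before NELM."""
    return sum(1 for n in scf_iterations_per_ionic_step(oszicar_text) if n < nelm)


def opt_scf_converged(oszicar_text, nelm=60):
    """OPT check: treat the run as SCF-converged if its last ionic step stopped before NELM."""
    counts = scf_iterations_per_ionic_step(oszicar_text)
    return bool(counts) and counts[-1] < nelm
```

Pass the NELM value from your own INCAR_opt or INCAR_md if it differs from the default.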
Ensure all prerequisites are installed and properly configured.
Create a file named input in the project directory with the required parameters.
Use the command-line interface to perform various operations. For example, to submit optimization jobs, use:
python flow.py POSCAR opt
Here, POSCAR is the prefix of the structure file names in structure_db.
If structure_db contains many POSCAR files, a run can take a long time, so it is usually best to run the script in the background with nohup so that it keeps running after you log out:
nohup python flow.py POSCAR opt &
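Long runs are expected because of the max_jobs and sleep_time parameters. These parameters suggest that the script submits new jobs only when the queue has free slots, and checks the queue again after waiting. The sketch below shows this throttling pattern. The function names are hypothetical. It assumes that `squeue -h -u USER` prints one line per queued job (`-h` suppresses the header), and that each job is submitted by running sbatch on sub.vasp inside the job directory. The counter, submitter and sleep function are passed in as parameters so the loop can be tested without SLURM.

```python
import subprocess
import time


def count_user_jobs(user_name):
    """Count the user's jobs in the SLURM queue (squeue -h omits the header line)."""
    out = subprocess.run(["squeue", "-h", "-u", user_name],
                         capture_output=True, text=True, check=True).stdout
    return len([line for line in out.splitlines() if line.strip()])


def sbatch_in(job_dir, script="sub.vasp"):
    """Submit the job script from inside its job directory."""
    subprocess.run(["sbatch", script], cwd=job_dir, check=True)


def submit_all(job_dirs, submit, max_jobs, sleep_time, count_jobs, sleep=time.sleep):
    """Submit each job directory, waiting while the queue already holds max_jobs jobs."""
    for job_dir in job_dirs:
        while count_jobs() >= max_jobs:
            sleep(sleep_time)
        submit(job_dir)
```

On a cluster, a call might look like `submit_all(dirs, sbatch_in, 25, 20, lambda: count_user_jobs("Your_username"))`.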
After opt and optcheck have finished, submit the AIMD jobs:
nohup python flow.py POSCAR md &
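In VASP, an AIMD run usually starts from the relaxed structure that the optimization writes to CONTCAR. The sketch below sets up an AIMD directory in this way. The directory layout and the function name `prepare_md_dir` are hypothetical. The sketch assumes the OPT directory holds CONTCAR (plus KPOINTS and POTCAR if present), and that INCAR_md and sub.vasp are copied in from the paths given in the input file.

```python
import shutil
from pathlib import Path


def prepare_md_dir(opt_dir, md_dir, md_incar_path, vasp_sub_path):
    """Set up an AIMD directory starting from a finished OPT run (hypothetical layout)."""
    opt_dir, md_dir = Path(opt_dir), Path(md_dir)
    md_dir.mkdir(parents=True, exist_ok=True)
    # The relaxed structure written by VASP (CONTCAR) becomes the MD starting POSCAR.
    shutil.copy(opt_dir / "CONTCAR", md_dir / "POSCAR")
    shutil.copy(md_incar_path, md_dir / "INCAR")
    shutil.copy(vasp_sub_path, md_dir / Path(vasp_sub_path).name)
    # Reuse k-points and pseudopotentials from the OPT run when they exist.
    for name in ("KPOINTS", "POTCAR"):
        if (opt_dir / name).exists():
            shutil.copy(opt_dir / name, md_dir / name)
    return md_dir
```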
Use the optcheck or mdcheck operations to verify SCF convergence for OPT and MD jobs, respectively.
python flow.py POSCAR optcheck
python flow.py POSCAR mdcheck
After the VASP simulations have finished, convert the results into a format suitable for subsequent training:
python flow.py POSCAR dpdata
Set the dataset_prefix and dataset_fmt parameters in the input file to control where the dataset is written and which output format is used (npy_single, npy_mix, or lmdb).
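The step_data and test_size parameters together determine which frames are kept and how they are divided between training and test sets. The sketch below shows one possible way to do this: first keep every step_data-th frame, then randomly move a test_size fraction of those frames into the test set. The random, seeded split and the function name `split_frames` are assumptions; flow.py may split the data differently.

```python
import random


def split_frames(n_frames, step_data, test_size, seed=0):
    """Pick every step_data-th frame, then split those frames into train/test index lists."""
    frames = list(range(0, n_frames, step_data))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(frames)
    n_test = int(round(test_size * len(frames)))
    test, train = sorted(frames[:n_test]), sorted(frames[n_test:])
    return train, test
```

With dpdata, these index lists could then be passed to `LabeledSystem.sub_system(...)`. The resulting subsets could be written out with `to_deepmd_npy(...)`, for example, to produce the npy output.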
Visualize the occurrence of elements in your dataset:
python flow.py POSCAR plot
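Before any colors can be drawn, the element weights have to be counted across the dataset. The sketch below shows this counting step. It assumes each structure is given as a list of chemical symbols, which could for example be obtained with `ase.io.read(path).get_chemical_symbols()`. It is not documented whether flow.py counts each element once per structure or once per atom, so the sketch supports both options. The function names are hypothetical, and the periodic-table plotting itself is not shown.

```python
from collections import Counter


def element_counts(structures, per_atom=False):
    """Count element occurrences across structures given as lists of chemical symbols.

    per_atom=False counts each element once per structure that contains it;
    per_atom=True counts every atom.
    """
    counts = Counter()
    for symbols in structures:
        counts.update(symbols if per_atom else set(symbols))
    return counts


def element_weights(counts):
    """Normalise counts to fractions that could drive a periodic-table colour map."""
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()} if total else {}
```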
- Ensure the VASP, VASPKIT and DeePMD-kit executables are accessible in your environment.
- Customize the INCAR_opt, INCAR_md, KPOINTS, and POTCAR files and the submission script (sub.vasp) as needed for your specific system and requirements.
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
- ASE: https://wiki.fysik.dtu.dk/ase/
- VASP: https://www.vasp.at/
- DeePMD-kit: https://github.com/deepmodeling/deepmd-kit
- dpdata: https://github.com/deepmodeling/dpdata
- VASPKIT: https://vaspkit.com/
- fairchem: https://github.com/FAIR-Chem/fairchem
- SLURM: https://slurm.schedmd.com/