RuntimeError: No system available during deepks model training. #68

Open
Shangguanying992 opened this issue Jan 23, 2024 · 1 comment

Shangguanying992 commented Jan 23, 2024

While running the deepks model, I encountered the following error during the iteration process:
err.iter:
#data_train/group.00 no system.raw, infer meta from data
#data_train/group.00 reset batch size to 0
#ignore empty dataset: data_train/group.00
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ljcgroup/.conda/envs/deepks/lib/python3.11/site-packages/deepks/model/train.py", line 303, in <module>
    cli()
  File "/home/ljcgroup/.conda/envs/deepks/lib/python3.11/site-packages/deepks/main.py", line 71, in train_cli
    main(**argdict)
  File "/home/ljcgroup/.conda/envs/deepks/lib/python3.11/site-packages/deepks/model/train.py", line 270, in main
    g_reader = GroupReader(train_paths, **data_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ljcgroup/.conda/envs/deepks/lib/python3.11/site-packages/deepks/model/reader.py", line 207, in __init__
    raise RuntimeError("No system is available")
RuntimeError: No system is available

Upon inspecting the iter.00/00.scf/log.data file, I noticed that none of the systems converged, so there was no data available for training. The content of log.data is as follows:
Training:
  Convergence:
    0 / 200 = 0.00000
  Energy:
    ME: 4.768164229534069
    MAE: 8.715617244519633
    MARE: 7.972705559173027
  Force:
    MAE: 1.0896364501030267
Testing:
  Convergence:
    0 / 200 = 0.00000
  Energy:
    ME: 4.768164229534069
    MAE: 8.715617244519633
    MARE: 7.972705559173027
  Force:
    MAE: 1.0896364501030267
I have verified that the system configurations are reasonable. Another user has also run into the same problem. Any guidance on resolving this issue would be appreciated.

Environment Information: The water_single example is functioning correctly.
Thank you for your assistance!

Collaborator

y1xiaoc commented Jan 24, 2024

The main problem is that no configuration converged. I would try training with a smaller learning rate and fewer steps to see whether the convergence rate can exceed 0.9. If that rate is too low, it is hard to learn anything from the unconverged data.
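
To check the convergence rate after each SCF step without reading log.data by hand, something like the following sketch could be used. It assumes log.data has the layout quoted in this issue (a `Training:` or `Testing:` header, then `Convergence:`, then a line such as `0 / 200 = 0.00000`). The function names and the `SAMPLE` string are made up for illustration and are not part of deepks, and the 0.9 default is simply the value suggested above, not a setting read from the package.

```python
import re

# Header, then "Convergence:", then "<converged> / <total>"; whitespace
# (including newlines and indentation) between the pieces is tolerated.
_PATTERN = re.compile(r"(Training|Testing):\s*Convergence:\s*(\d+)\s*/\s*(\d+)")


def convergence_rates(log_text):
    """Return {"Training": rate, "Testing": rate} for the sections found."""
    rates = {}
    for name, n_conv, n_total in _PATTERN.findall(log_text):
        total = int(n_total)
        rates[name] = int(n_conv) / total if total else 0.0
    return rates


def enough_converged(log_text, threshold=0.9):
    """True only if at least one section is found and all exceed the threshold."""
    rates = convergence_rates(log_text)
    return bool(rates) and all(rate > threshold for rate in rates.values())


# Abbreviated copy of the log.data content from this issue (hypothetical name).
SAMPLE = """Training:
  Convergence:
    0 / 200 = 0.00000
  Energy:
    ME: 4.768164229534069
Testing:
  Convergence:
    0 / 200 = 0.00000
"""

if __name__ == "__main__":
    print(convergence_rates(SAMPLE))
    print(enough_converged(SAMPLE))
```

For a real run, the text would be read from the file instead, for example `open("iter.00/00.scf/log.data").read()`. If `enough_converged` returns False for the first iteration, the training step has little or nothing to learn from, which matches the "No system is available" error reported here.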
