how to set group_size in run_train_config? #191

Closed
scott-5 opened this issue Feb 18, 2024 · 2 comments · Fixed by #193

Comments

scott-5 commented Feb 18, 2024

As described in the docs, step_configs/run_train_config/template_slice_config can be set, but doing so raises the following error:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/dp/bin/dpgen2", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/entrypoint/main.py", line 329, in main
    submit_concurrent_learning(
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/entrypoint/submit.py", line 621, in submit_concurrent_learning
    dpgen_step, finetune_step = workflow_concurrent_learning(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/entrypoint/submit.py", line 399, in workflow_concurrent_learning
    concurrent_learning_op = make_concurrent_learning_op(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/entrypoint/submit.py", line 142, in make_concurrent_learning_op
    prep_run_train_op = PrepRunDPTrain(
                        ^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/superop/prep_run_dp_train.py", line 187, in __init__
    self = _prep_run_dp_train(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/dp/lib/python3.11/site-packages/dpgen2/superop/prep_run_dp_train.py", line 240, in _prep_run_dp_train
    prep_train = Step(
                 ^^^^^
TypeError: Step.__init__() got an unexpected keyword argument 'template_slice_config'
@scott-5 scott-5 changed the title how to train 4 model in one node? how to set group_size in run_train_config? Feb 18, 2024
zjgemi (Collaborator) commented Feb 21, 2024

The issue of run_train_config missing template_slice_config has indeed been resolved in the PR. Also, please note that the exception you mentioned is raised when the prep_train step is instantiated. Make sure that template_slice_config is passed to run_train_config rather than prep_train_config. The prep_train step is a non-sliced step and is not supposed to receive template_slice_config.
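
For illustration, here is a minimal sketch of where the key would sit, written as a Python dict that mirrors the step_configs section of the submission config. The group_size value of 4, the empty prep_train_config, and the helper misplaced_slice_configs are assumptions made for this example and are not part of dpgen2; the exact schema should be checked against the dpgen2 docs for your version.

```python
import json

# Illustrative fragment of the "step_configs" section (values are assumptions).
# template_slice_config lives under run_train_config, not prep_train_config.
config = {
    "step_configs": {
        "prep_train_config": {},
        "run_train_config": {
            "template_slice_config": {"group_size": 4},
        },
    }
}

# Step configs treated as non-sliced in this sketch (hypothetical list).
NON_SLICED_STEPS = ("prep_train_config",)


def misplaced_slice_configs(cfg):
    """Return the non-sliced step configs that carry template_slice_config."""
    steps = cfg.get("step_configs", {})
    return [
        name
        for name in NON_SLICED_STEPS
        if "template_slice_config" in steps.get(name, {})
    ]


if __name__ == "__main__":
    print(json.dumps(config, indent=2))
    print("misplaced:", misplaced_slice_configs(config))
```

Running the check before submitting would flag a config that puts template_slice_config under prep_train_config, which is the placement that leads to the TypeError in the traceback above.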

@wanghan-iapcm wanghan-iapcm linked a pull request Feb 21, 2024 that will close this issue
scott-5 (Author) commented Feb 21, 2024

Okay. Thank you so much.
