Commit 95901cb

Fix qwen rl kl coeff (#10530)
1 parent 93e58f2 commit 95901cb

3 files changed: 3 additions and 3 deletions

llm/config/qwen/grpo_32b_argument.yaml

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ ignore_save_lr_and_optim: true # Whether to ignore saving learning rate and opti
 disable_tqdm: true # Whether to disable tqdm progress bar
 
 # RL args
-kl_coeff: 0.0 # KL coefficient
+kl_coeff: 0.001 # KL coefficient for PPO and Reinforce++
 kl_loss_coeff: 0.001 # KL loss coefficient
 pg_loss_coeff: 1.0 # Policy gradient loss coefficient
 entropy_coeff: 0.0 # Entropy coefficient

llm/config/qwen/grpo_argument.yaml

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ ignore_save_lr_and_optim: true # Whether to ignore saving learning rate and opti
 disable_tqdm: true # Whether to disable tqdm progress bar
 
 # RL args
-kl_coeff: 0.0 # KL coefficient
+kl_coeff: 0.001 # KL coefficient for PPO and Reinforce++
 kl_loss_coeff: 0.001 # KL loss coefficient
 pg_loss_coeff: 1.0 # Policy gradient loss coefficient
 entropy_coeff: 0.0 # Entropy coefficient

llm/config/qwen/reinforce_plus_plus_argument.yaml

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ ignore_save_lr_and_optim: true # Whether to ignore saving learning rate and opti
 disable_tqdm: true # Whether to disable tqdm progress bar
 
 # RL args
-kl_coeff: 0.0 # KL coefficient
+kl_coeff: 0.001 # KL coefficient for PPO and Reinforce++
 kl_loss_coeff: 0.000 # KL loss coefficient
 pg_loss_coeff: 1.0 # Policy gradient loss coefficient
 entropy_coeff: 0.0 # Entropy coefficient
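
Note on the two KL settings: the diff does not show how the trainer consumes `kl_coeff` and `kl_loss_coeff`, so the following is only a minimal sketch under a common RLHF convention, not the repository's implementation. It assumes `kl_coeff` scales a per-token KL penalty subtracted from the reward (the PPO and Reinforce++ style named in the new comment) and `kl_loss_coeff` scales a separate KL term added to the loss. The function names `shaped_rewards` and `total_loss` and the simple log-ratio KL estimate are hypothetical.

```python
from typing import List


def shaped_rewards(
    rewards: List[float],
    logprobs: List[float],
    ref_logprobs: List[float],
    kl_coeff: float,
) -> List[float]:
    """Subtract a per-token KL penalty from the rewards (assumed convention).

    The KL estimate here is the plain log-ratio log pi(a) - log pi_ref(a);
    real trainers may use a different estimator.
    """
    return [
        r - kl_coeff * (lp - ref_lp)
        for r, lp, ref_lp in zip(rewards, logprobs, ref_logprobs)
    ]


def total_loss(
    pg_loss: float,
    kl_loss: float,
    pg_loss_coeff: float,
    kl_loss_coeff: float,
) -> float:
    """Weighted sum of policy-gradient loss and KL loss (assumed convention)."""
    return pg_loss_coeff * pg_loss + kl_loss_coeff * kl_loss


# Toy values, made up for illustration only.
rewards = [0.0, 1.0]
logprobs = [-1.0, -2.0]
ref_logprobs = [-1.5, -2.0]

before = shaped_rewards(rewards, logprobs, ref_logprobs, kl_coeff=0.0)
after = shaped_rewards(rewards, logprobs, ref_logprobs, kl_coeff=0.001)
```

Under these assumptions, the old value `kl_coeff: 0.0` leaves the rewards untouched, so any reward-side KL penalty would be switched off, while `0.001` applies a small penalty wherever the policy drifts from the reference. In the same reading, `kl_loss_coeff: 0.000` in `reinforce_plus_plus_argument.yaml` means the loss-side term contributes nothing there.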
