
warmup LR schedulers start from LR=0 #34754

Open

cfhammill opened this issue Nov 15, 2024 · 1 comment

cfhammill commented Nov 15, 2024

System Info

transformers commit: 52ea4aa (main at time of writing)
The rest of the environment is not relevant to this issue.

Who can help?

trainer: @muellerzr @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run any example with a warmup scheduler and observe that the effective LR is 0 for the first step, unnecessarily wasting compute. See the similar discussion of this issue in torchtune: pytorch/torchtune#2010. See the code at

return float(current_step) / float(max(1, num_warmup_steps))

and evaluate it for step 0: it returns an LR factor of 0, so the weights are not updated.
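
A minimal sketch of the behavior (the parameter, optimizer, and hyperparameter values here are arbitrary, not from the original report):

import torch
from transformers import get_linear_schedule_with_warmup

param = torch.nn.Parameter(torch.ones(1))
optimizer = torch.optim.AdamW([param], lr=1e-3)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=100
)

print(scheduler.get_last_lr())  # [0.0] -- the warmup factor at step 0 is 0/10

loss = (param ** 2).sum()
loss.backward()
optimizer.step()     # effective LR is 0, so this step is a no-op
scheduler.step()
print(param.item())  # still 1.0: the first optimizer step changed nothing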

Expected behavior

I expect every optimizer step to adjust the weights of my model unless there is a good reason not to.

cfhammill added the bug label Nov 15, 2024

cfhammill commented:

The proposed solution in the torchtune discussion is to add a min_lr argument with the default set to a small number (e.g. 1e-7) and return max(min_lr, computed_lr).
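
A sketch of what that could look like for the linear warmup schedule; the function name and the min_lr_ratio argument are hypothetical, not an existing transformers API:

from torch.optim.lr_scheduler import LambdaLR

def get_linear_schedule_with_warmup_and_floor(
    optimizer, num_warmup_steps, num_training_steps, min_lr_ratio=1e-7, last_epoch=-1
):
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            # Same linear warmup as today: factor ramps from 0 toward 1.
            factor = float(current_step) / float(max(1, num_warmup_steps))
        else:
            # Linear decay from 1 back toward 0 over the remaining steps.
            factor = float(num_training_steps - current_step) / float(
                max(1, num_training_steps - num_warmup_steps)
            )
        # Clamp so the multiplier never evaluates to exactly 0, in particular at step 0.
        return max(min_lr_ratio, factor)

    return LambdaLR(optimizer, lr_lambda, last_epoch)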
