
warmup LR schedulers start from LR=0 #34754

Open

cfhammill opened this issue Nov 15, 2024 · 1 comment

cfhammill commented Nov 15, 2024

System Info

transformers commit: 52ea4aa (main at time of writing)
The rest of the environment is not relevant to this issue.

Who can help?

trainer: @muellerzr @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run any example with a warmup scheduler and observe that the effective LR is 0 for the first step, unnecessarily wasting compute. See the similar discussion of this issue in torchtune: pytorch/torchtune#2010. See the code at

return float(current_step) / float(max(1, num_warmup_steps))

and evaluate it for step 0: it returns an LR factor of 0, so the weights are not updated.
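
A minimal sketch of the behavior (the parameter, optimizer, and hyperparameter values here are arbitrary, not from the original report):

import torch
from transformers import get_linear_schedule_with_warmup

param = torch.nn.Parameter(torch.ones(1))
optimizer = torch.optim.AdamW([param], lr=1e-3)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=100
)

print(scheduler.get_last_lr())  # [0.0] -- the warmup factor at step 0 is 0/10

loss = (param ** 2).sum()
loss.backward()
optimizer.step()     # effective LR is 0, so this step is a no-op
scheduler.step()
print(param.item())  # still 1.0: the first optimizer step changed nothing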

Expected behavior

I expect every optimizer step to adjust the weights of my model unless there is a good reason not to.

cfhammill added the bug label Nov 15, 2024

cfhammill commented:

The proposed solution in the torchtune discussion is to add a min_lr argument with the default set to a small number (e.g. 1e-7) and return max(min_lr, computed_lr).
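
A sketch of what that could look like for the linear warmup schedule; the function name and the min_lr_ratio argument are hypothetical, not an existing transformers API:

from torch.optim.lr_scheduler import LambdaLR

def get_linear_schedule_with_warmup_and_floor(
    optimizer, num_warmup_steps, num_training_steps, min_lr_ratio=1e-7, last_epoch=-1
):
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            # Same linear warmup as today: factor ramps from 0 toward 1.
            factor = float(current_step) / float(max(1, num_warmup_steps))
        else:
            # Linear decay from 1 back toward 0 over the remaining steps.
            factor = float(num_training_steps - current_step) / float(
                max(1, num_training_steps - num_warmup_steps)
            )
        # Clamp so the multiplier never evaluates to exactly 0, in particular at step 0.
        return max(min_lr_ratio, factor)

    return LambdaLR(optimizer, lr_lambda, last_epoch)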
