I use Qwen2.5 as the LM of a vision-language model for SFT, but I find that, under the same environment and command, the loss at the same iteration differs from run to run. My seed is fixed. Is this normal? If not, how can I troubleshoot this instability?
Description
Steps to reproduce
This happens to Qwen2.5-xB-Instruct-xxx and xxx.
The badcase can be reproduced with the following steps:
...
...
The following example input & output can be used:
system: ...
user: ...
...
Expected results
The results are expected to be ...
Attempts to fix
I have tried several ways to fix this, including:
adjusting the sampling parameters, but ...
prompt engineering, but ...
Anything else helpful for investigation
I find that this problem also happens to ...
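As a rough illustration of how the mismatch can be observed (not from the original report; the file names and helpers below are hypothetical), the per-iteration losses of two runs launched with the same command can be dumped and then diffed:

```python
# Hypothetical helpers: dump per-iteration losses from each run, then diff two runs.
import json

def dump_losses(losses, path):
    with open(path, "w") as f:
        json.dump([float(x) for x in losses], f)

def compare_runs(path_a, path_b, atol=1e-6):
    with open(path_a) as fa, open(path_b) as fb:
        a, b = json.load(fa), json.load(fb)
    for step, (la, lb) in enumerate(zip(a, b)):
        if abs(la - lb) > atol:
            print(f"step {step}: {la:.6f} vs {lb:.6f} (diff {abs(la - lb):.2e})")

# compare_runs("run_a_losses.json", "run_b_losses.json")
```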
If by unstable you mean slight variations in the loss across different runs: that is normal, because there are sources of randomness other than the pseudo-random number generators, which are what the random seeds control. See https://pytorch.org/docs/stable/notes/randomness.html for reference.
If by unstable you mean that the loss fluctuates a lot: that is not expected, and there are many things that could cause it.
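For reference, a generic sketch of how the controllable sources can be pinned down in PyTorch, following the linked randomness notes (not specific to Qwen2.5; the helper name and seed value are just for illustration):

```python
import os
import random

import numpy as np
import torch

# Set before any CUDA work so cuBLAS uses a deterministic workspace configuration.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything(42)

# Raise an error on ops that only have non-deterministic kernels, and disable
# the cuDNN autotuner, which may pick different algorithms from run to run.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False
```

Even with these settings, some kernels have no deterministic variant and will raise, and DataLoader workers carry their own RNG state (`worker_init_fn` / `generator`); the linked note covers both cases.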
@jklj077 Thanks for your reply😀.
In my research field, small models such as BART and T5 are commonly used.
When I plug these language models into my code, the losses across different runs do not change (identical values), so I believe the random seeds in my code are fixed properly.
However, when I switch the LM to Qwen, the loss is identical only in the first iteration. In later iterations, with a small lr (1e-5) the loss matches for some iterations while the rest deviate by roughly 0.01 to 0.1; with a larger lr (3e-4), all but the first few iterations differ, often by more than 0.1.
What could be causing this?
My code trains with DeepSpeed bf16 and does not use the Trainer from the transformers library.
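A minimal way to localize where the divergence starts (sketch only; `model` and `batch` are placeholders, and the output is assumed to expose `.loss` the way Hugging Face transformers models do when labels are provided) is to run the same forward/backward twice on one batch, before the optimizer or DeepSpeed touch anything:

```python
# Sketch: run the same forward/backward twice on one batch and compare bit-for-bit.
import torch

def loss_and_grad(model, batch, seed=0):
    torch.manual_seed(seed)                  # same dropout draws on both calls
    model.zero_grad(set_to_none=True)
    out = model(**batch)
    out.loss.backward()
    grads = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
    return out.loss.item(), grads.abs().sum().item()

# l1, g1 = loss_and_grad(model, batch)
# l2, g2 = loss_and_grad(model, batch)
# Identical numbers point at the optimizer / DeepSpeed / data side;
# different numbers mean the forward-backward itself is non-deterministic.
```

The pattern described above (identical first step, drift that grows with the learning rate) is also what one would expect if tiny numerical differences are being amplified through the weight updates rather than introduced by the data pipeline, which a check like this can help confirm or rule out.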
Model Series
Qwen2.5
What are the models used?
Qwen2.5-0.5B-Instruct
What is the scenario where the problem happened?
Training Qwen2.5-0.5B-Instruct with the transformers library as the LM of a vision-language model.
Is this badcase known and can it be solved using available techniques?
Information about environment