[Badcase]: loss unstable #1074

Solo4working opened this issue Nov 12, 2024 · 4 comments
Model Series


What are the models used?


What is the scenario where the problem happened?

train Qwen2.5-0.5B-Instruct in transformers library for vision language model

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

Package                      Version
---------------------------- ------------
absl-py                      2.0.0
accelerate                   0.29.2
annotated-types              0.6.0
antlr4-python3-runtime       4.9.3
anyio                        4.2.0
appdirs                      1.4.4
argon2-cffi                  23.1.0
argon2-cffi-bindings         21.2.0
arrow                        1.3.0
asttokens                    2.4.1
astunparse                   1.6.3
async-lru                    2.0.4
attrdict                     2.0.1
attrs                        23.1.0
Babel                        2.14.0
beautifulsoup4               4.12.2
bitsandbytes                 0.43.1
bleach                       6.1.0
BLEURT                       0.0.2
blis                         0.7.11
cachetools                   5.3.2
catalogue                    2.0.10
certifi                      2023.11.17
cffi                         1.16.0
charset-normalizer           3.3.2
click                        8.1.7
cloudpathlib                 0.16.0
colorama                     0.4.6
comm                         0.2.0
confection                   0.1.4
ctcdecode                    1.0.3
cycler                       0.12.1
cymem                        2.0.8
de-core-news-sm              3.7.0
debugpy                      1.8.0
decorator                    4.4.2
decord                       0.6.0
deepspeed                    0.13.1
defusedxml                   0.7.1
dill                         0.3.8
docker-pycreds               0.4.0
einops                       0.5.0
et_xmlfile                   2.0.0
exceptiongroup               1.2.0
executing                    2.0.1
fastjsonschema               2.19.0
filelock                     3.13.1
flash-attn                   2.5.7
flatbuffers                  1.12
fonttools                    4.50.0
fqdn                         1.5.1
fsspec                       2023.12.2
ftfy                         6.3.1
gast                         0.4.0
gitdb                        4.0.11
GitPython                    3.1.43
google-auth                  2.25.2
google-auth-oauthlib         0.4.6
google-pasta                 0.2.0
grpcio                       1.60.0
h11                          0.14.0
h5py                         3.10.0
hjson                        3.1.0
hpargparse                   0.14.0
hpman                        0.13.0
httpcore                     1.0.5
httpx                        0.27.0
huggingface-hub              0.22.2
hydra-core                   1.3.2
idna                         3.6
imageio                      2.34.0
imageio-ffmpeg               0.5.1
importlib-metadata           7.0.0
ipykernel                    6.27.1
ipython                      8.18.1
ipython-genutils             0.2.0
isoduration                  20.11.0
jedi                         0.19.1
Jinja2                       3.1.2
joblib                       1.3.2
json5                        0.9.14
jsonpointer                  2.4
jsonschema                   4.20.0
jsonschema-specifications    2023.11.2
jupyter_client               8.6.0
jupyter_core                 5.5.1
jupyter-events               0.9.0
jupyter-lsp                  2.2.1
jupyter_server               2.12.1
jupyter_server_terminals     0.5.0
jupyterlab                   4.0.9
jupyterlab_pygments          0.3.0
jupyterlab_server            2.25.2
keras                        2.9.0
Keras-Preprocessing          1.1.2
kiwisolver                   1.4.5
langcodes                    3.3.0
libclang                     18.1.1
lmdb                         1.3.0
loguru                       0.7.0
lxml                         4.9.1
Markdown                     3.5.1
markdown-it-py               3.0.0
MarkupSafe                   2.1.3
matplotlib                   3.5.3
matplotlib-inline            0.1.6
mdurl                        0.1.2
mistune                      3.0.2
moviepy                      1.0.3
mpmath                       1.3.0
murmurhash                   1.0.10
nbclassic                    1.0.0
nbclient                     0.9.0
nbconvert                    7.13.0
nbformat                     5.9.2
nest-asyncio                 1.5.8
networkx                     3.2.1
nose                         1.3.7
notebook                     6.5.4
notebook_shim                0.2.3
numpy                        1.22.4
nvidia-ml-py                 12.535.133
nvitop                       1.3.2
oauthlib                     3.2.2
omegaconf                    2.3.0
openpyxl                     3.1.5
opt-einsum                   3.3.0
ordereddict                  1.1
overrides                    7.4.0
packaging                    23.2
pandas                       1.3.5
pandocfilters                1.5.0
parso                        0.8.3
pathlib                      1.0.1
pathtools                    0.1.2
peft                         0.10.0
pexpect                      4.9.0
Pillow                       10.1.0
pip                          23.3.2
platformdirs                 4.1.0
portalocker                  2.8.2
preshed                      3.0.9
proglog                      0.1.10
prometheus-client            0.19.0
promise                      2.3
prompt-toolkit               3.0.43
protobuf                     3.19.6
psutil                       5.9.8
ptyprocess                   0.7.0
pure-eval                    0.2.2
py-cpuinfo                   9.0.0
pyasn1                       0.5.1
pyasn1-modules               0.3.0
pycparser                    2.21
pydantic                     2.6.4
pydantic_core                2.16.3
Pygments                     2.17.2
PyJWT                        2.8.0
pynvml                       11.5.3
pyparsing                    3.1.2
python-dateutil              2.8.2
python-json-logger           2.0.7
pytz                         2024.1
PyWavelets                   1.6.0
PyYAML                       6.0.1
pyzmq                        25.1.2
referencing                  0.32.0
reFILE                       0.4.1
regex                        2023.12.25
requests                     2.31.0
requests-oauthlib            1.3.1
rfc3339-validator            0.1.4
rfc3986-validator            0.1.1
rich                         13.7.1
rouge                        1.0.1
rpds-py                      0.15.2
rsa                          4.9
sacrebleu                    2.2.0
sacremoses                   0.1.1
safetensors                  0.4.2
scikit-image                 0.19.3
scikit-learn                 1.0.2
scipy                        1.7.3
seaborn                      0.12.1
Send2Trash                   1.8.2
sentencepiece                0.1.97
sentry-sdk                   1.44.0
setproctitle                 1.3.3
setuptools                   58.1.0
shortuuid                    1.0.13
six                          1.16.0
smart-open                   6.0.0
smmap                        5.0.1
sniffio                      1.3.0
soupsieve                    2.5
spacy                        3.7.4
spacy-legacy                 3.0.12
spacy-loggers                1.0.5
spacy-pkuseg                 0.0.33
srsly                        2.4.8
stack-data                   0.6.3
sympy                        1.12
tabulate                     0.9.0
tensorboard                  2.9.1
tensorboard-data-server      0.6.1
tensorboard-plugin-wit       1.8.1
tensorflow                   2.9.1
tensorflow-estimator         2.9.0
tensorflow-io-gcs-filesystem 0.36.0
termcolor                    2.4.0
terminado                    0.18.0
textpruner                   1.1.post2
tf-slim                      1.1.0
thinc                        8.2.3
threadpoolctl                3.4.0
tifffile                     2024.2.12
timm                         0.8.10.dev0
tinycss2                     1.2.1
tokenizers                   0.19.1
tomli                        2.0.1
torch                        2.1.1+cu121
torchaudio                   2.1.1+cu121
torchvision                  0.16.1+cu121
tornado                      6.4
tqdm                         4.65.0
traitlets                    5.14.0
transformers                 4.40.0
triton                       2.1.0
typer                        0.9.4
typing_extensions            4.9.0
uri-template                 1.3.0
urllib3                      2.1.0
vidaug                       1.5
wandb                        0.16.5
wasabi                       1.1.2
wcwidth                      0.2.12
weasel                       0.3.4
webcolors                    1.13
webencodings                 0.5.1
websocket-client             1.7.0
Werkzeug                     3.0.1
wheel                        0.43.0
wrapt                        1.16.0
zh-core-web-sm               3.7.0
zhipuai                      2.0.1
zipp                         3.17.0

I use Qwen2.5 as the LM of a vision language model and perform SFT. With the same environment and command, the loss at the same iteration differs from run to run, even though my seed is fixed. Is this normal? If not, how can I troubleshoot this instability?


jklj077 commented Nov 14, 2024

If by unstable you mean slight variations in losses across different runs, that is normal: random seeds only control the pseudo-random number generator, and there are other sources of randomness. See for reference.

If by unstable you mean that the loss fluctuates a lot, that is not expected, and there are many things that can cause it.

Solo4working commented Nov 15, 2024

@jklj077 Thanks for your reply 😀.
In my research field, small models such as BART and T5 are commonly used.
When I plug these language models into my code, the losses are identical across runs, so I believe the random seeds in my code are fixed correctly.
However, when I switch to Qwen, the loss is identical only in the first iteration. With a small lr (1e-5), the loss matches in some later iterations, while others deviate by about 0.01~0.1. With a larger lr (3e-4), the losses differ in every iteration except the first few, probably with a deviation of more than 0.1.
What causes this?
My code trains with deepspeed in bf16 and does not use Trainer from the transformers library.
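
One way to pin down where two runs start to diverge is to log the per-iteration loss of each run and compare the logs. A minimal sketch, assuming the losses have already been collected into plain Python lists; the function name `first_divergence` and the sample values are hypothetical and not taken from this issue:

```python
from typing import Optional, Sequence


def first_divergence(
    run_a: Sequence[float], run_b: Sequence[float], tol: float = 0.0
) -> Optional[int]:
    """Return the first iteration index where the two loss logs differ by more than tol.

    Returns None if the overlapping part of the logs agrees within tol.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if abs(a - b) > tol:
            return step
    return None


# Hypothetical loss logs from two runs started with the same seed.
losses_run_1 = [2.31, 2.05, 1.88, 1.75]
losses_run_2 = [2.31, 2.05, 1.91, 1.70]

print(first_divergence(losses_run_1, losses_run_2))            # 2
print(first_divergence(losses_run_1, losses_run_2, tol=0.04))  # 3
```

If the first steps match exactly and later ones drift, the initial weights and the first batch are likely identical, so the difference probably enters somewhere in the computation or the parameter update.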

jklj077 commented Nov 19, 2024

Random seeds only control the pseudo-random number generator; there are other sources of randomness. See for reference. The background info on accuracy problems with floating-point numbers can be found at

Since you are using bfloat16, which has reduced precision, the error resulting from nondeterministic algorithms will be more prominent.
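
Floating-point addition is not associative, so accumulating the same values in a different order can give a different result, and the effect grows as precision shrinks. A minimal pure-Python sketch of this: `to_bf16` is a hypothetical helper that emulates bfloat16 rounding (round to nearest, ties to even) by keeping the top 16 bits of the float32 encoding. It is only an illustration, not how any GPU kernel is implemented, and it ignores NaN and overflow handling.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (emulated; ignores NaN/inf/overflow)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that bfloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_sum(values):
    """Accumulate left to right, rounding to bfloat16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + to_bf16(v))
    return acc


values = [256.0, 1.0, 1.0]

# In float64 the order does not matter for these small integers.
print(sum(values), sum(reversed(values)))            # 258.0 258.0

# In emulated bfloat16, 256 + 1 rounds back to 256, so the order changes the result.
print(bf16_sum(values), bf16_sum(reversed(values)))  # 256.0 258.0
```

A kernel whose accumulation order varies between runs can therefore return slightly different values for identical inputs, and those differences can compound over training iterations.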

Thanks for your reply. I think I found a way to deal with it.

In transformers, the Qwen series models use the sdpa attention implementation by default, while the T5 series models mostly use the eager attention implementation by default.

When I switch Qwen's attention implementation back to 'eager', the loss is identical across runs (no randomness). If the sdpa or flash_attn_2 attention implementation is used, transformers.enable_full_determinism needs to be called to remove this randomness.
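
The workaround described above might look roughly like the following. This is a sketch under assumptions, not the code used in this issue: the helper names `attn_kwargs` and `load_model`, the default seed, and the model id in the usage comment are all illustrative. It relies on the `attn_implementation` argument of `from_pretrained` and on `transformers.enable_full_determinism`, where the flash attention option is spelled `"flash_attention_2"`.

```python
ATTN_IMPLEMENTATIONS = ("eager", "sdpa", "flash_attention_2")


def attn_kwargs(attn_implementation: str = "eager") -> dict:
    """Build the from_pretrained keyword that selects the attention implementation."""
    if attn_implementation not in ATTN_IMPLEMENTATIONS:
        raise ValueError(f"unknown attn_implementation: {attn_implementation!r}")
    return {"attn_implementation": attn_implementation}


def load_model(model_name: str, seed: int = 42, attn_implementation: str = "eager"):
    """Load a causal LM in bf16 with the chosen attention implementation (hypothetical helper)."""
    # Imported lazily so attn_kwargs can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, enable_full_determinism, set_seed

    if attn_implementation == "eager":
        # Plain seeding was reported as sufficient with eager attention.
        set_seed(seed)
    else:
        # Seeds the RNGs and additionally requests deterministic algorithms from PyTorch.
        enable_full_determinism(seed)

    return AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, **attn_kwargs(attn_implementation)
    )


# Usage (downloads weights; the Hub id below is an assumption):
# model = load_model("Qwen/Qwen2.5-0.5B-Instruct", attn_implementation="eager")
print(attn_kwargs("eager"))
```

Full determinism is documented as hurting performance, so it may be better suited to debugging runs than to long training jobs.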

This may only be a temporary solution, but I need it because reproducibility is very important to me.

