
Is bfloat16 required? #761

Closed
@awakenlee180

Description


Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

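Hello, is it possible to fine-tune without using bfloat16? I set bfloat16 to false in the fine-tuning script, but I still get bfloat16-related errors at runtime. My GPUs are 32 GB V100s, so I would like to switch to float16. Is that feasible? It does not seem to be, because after replacing every occurrence of bfloat16 in the code with float16, I get a different error: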

Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 34, in do_one_step
    data = pin_memory(data, device)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 60, in pin_memory
    return type(data)({k: pin_memory(sample, device) for k, sample in data.items()})  # type: ignore[call-arg]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 60, in <dictcomp>
    return type(data)({k: pin_memory(sample, device) for k, sample in data.items()})  # type: ignore[call-arg]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 55, in pin_memory
    return data.pin_memory(device)
RuntimeError: cannot pin 'torch.cuda.HalfTensor' only dense CPU tensors can be pinned
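The error message points at the mechanism: with `pin_memory=True`, the PyTorch `DataLoader` pins each batch in a background thread by calling `.pin_memory()` on it, and only dense CPU tensors can be pinned. A `torch.cuda.HalfTensor` therefore means some sample was already on the GPU when it left the dataset. Below is a minimal sketch, assuming (not confirmed by the traceback alone) that a `.cuda()` or `.to("cuda")` call ended up in the data pipeline during the bfloat16-to-float16 replacement. It casts to float16 on the CPU inside the dataset and moves tensors to the GPU only in the training step. `ToyImageDataset`, `make_loader`, and `to_device` are hypothetical names, not InternVL code.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyImageDataset(Dataset):
    """Hypothetical stand-in for the fine-tuning dataset."""

    def __init__(self, n: int = 4, dtype: torch.dtype = torch.float16):
        self.n = n
        self.dtype = dtype

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, idx: int) -> dict:
        pixel_values = torch.rand(3, 8, 8)  # float32, on the CPU
        # Cast on the CPU only. Calling .cuda() here would hand the
        # pin-memory thread a CUDA tensor and trigger
        # "cannot pin 'torch.cuda.HalfTensor'".
        return {"pixel_values": pixel_values.to(self.dtype),
                "labels": torch.tensor(idx)}


def make_loader(dtype: torch.dtype = torch.float16) -> DataLoader:
    # Pinning is only meaningful when a CUDA device is present.
    return DataLoader(ToyImageDataset(dtype=dtype), batch_size=2,
                      pin_memory=torch.cuda.is_available())


def to_device(batch: dict, device: torch.device) -> dict:
    # Move the collated (and possibly pinned) CPU batch in the training step.
    return {k: v.to(device, non_blocking=True) for k, v in batch.items()}


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for batch in make_loader():
        batch = to_device(batch, device)
        print(batch["pixel_values"].dtype, batch["pixel_values"].device)
```

The design choice is to keep dtype conversion and device placement separate: the dataset may change the dtype, but only the training loop (or the Trainer) moves data to the GPU.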

Reproduction

GPUS=2 PER_DEVICE_BATCH_SIZE=1 sh shell/internvl2.0/2nd_finetune/internvl2_1b_qwen2_0_5b_dynamic_res_2nd_finetune_lora.sh

Environment

Python 3.10
2 x V100 (32 GB)
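The V100 is a Volta card with CUDA compute capability 7.0, and native bfloat16 arithmetic only arrives with Ampere (compute capability 8.0), which is why float16 is the usual mixed-precision choice on this hardware. A small sketch of choosing the Hugging Face `TrainingArguments` precision flags (`bf16` / `fp16`) from the compute capability; the helper name `precision_flags` is hypothetical, and whether the InternVL fine-tuning script forwards exactly these flags is an assumption.

```python
def precision_flags(capability: tuple[int, int]) -> dict[str, bool]:
    """Map a CUDA compute capability (major, minor) to mixed-precision flags.

    Heuristic: native bfloat16 support starts with Ampere (major >= 8);
    older cards such as the V100 (7, 0) fall back to float16.
    """
    major, _minor = capability
    use_bf16 = major >= 8
    return {"bf16": use_bf16, "fp16": not use_bf16}


if __name__ == "__main__":
    for name, cap in [("V100", (7, 0)), ("A100", (8, 0))]:
        print(name, precision_flags(cap))
```

On the actual machine, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple to pass in.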

