Training error when full-parameter fine-tuning only the visual.merger of Qwen2-VL-7B-Instruct with all other model parameters frozen #5472
Labels
pending
This problem is yet to be addressed
Comments
I found that finetuning_type: full does not actually train 100% of the parameters either.
Perhaps that is because freeze_vision_tower defaults to true?
Indeed. I tried adding the vision_tower parameters as well, but training hangs partway through; only the LLM part can be fine-tuned.
With freeze_vision_tower set to true, I found the results on my own dataset are lower than with it set to false.
@wjx-sudo Same problem here: when training llm-lora + merger without streaming, it hangs after just one step. Did you manage to solve it?
Waiting here for a kind soul to post a solution.
LLaMA-Factory's training support for the ViT and the connector really does not seem well developed; it looks like it simply is not supported.
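A workaround hinted at in the comments above is to bypass the built-in freezing flags and mark the merger trainable by hand before training starts. The sketch below is only illustrative and assumes the Hugging Face Qwen2-VL layout, where the projector's parameters are named visual.merger.*; the wrapper object LLaMA-Factory hands to the trainer may expose a different prefix.

# Sketch: train only the multimodal projector (merger) of Qwen2-VL, freeze the rest.
# Assumption: Hugging Face naming, i.e. projector parameters live under "visual.merger".
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "/Qwen2-VL-7B-Instruct", torch_dtype="auto"
)
for name, param in model.named_parameters():
    # Keep gradients only for the merger; everything else stays frozen.
    param.requires_grad = "visual.merger" in name

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable tensors, e.g. {trainable[:3]}")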
Reminder
System Info
### model
model_name_or_path: /Qwen2-VL-7B-Instruct
### method
stage: sft
do_train: true
finetuning_type: full
train_mm_proj_only: true  # train only the multimodal projector
deepspeed: examples/deepspeed/ds_z2_config.json
### dataset
dataset: mllm_demo,identity
template: qwen2_vl
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/qwen2_vl-7b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
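Before launching a run with this configuration, it may help to confirm that the combination of flags actually leaves at least one parameter trainable, since the traceback below indicates that the backward pass found nothing requiring gradients. assert_has_trainable_params is a hypothetical helper, not part of LLaMA-Factory; one could call it on the model right before trainer.train().

# Sketch: fail fast with a clearer message if the freezing logic left nothing trainable.
def assert_has_trainable_params(model):
    trainable = [name for name, p in model.named_parameters() if p.requires_grad]
    if not trainable:
        raise RuntimeError(
            "No parameter requires grad; check train_mm_proj_only / freeze_vision_tower."
        )
    print(f"Trainable tensors: {len(trainable)}; first few: {trainable[:5]}")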
Reproduction
File "/root/anaconda3/envs/llamafactory-w/lib/python3.8/site-packages/accelerate/accelerator.py", line 2143, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/root/anaconda3/envs/llamafactory-w/lib/python3.8/site-packages/accelerate/utils/deepspeed.py", line 166, in backward
self.engine.backward(loss, **kwargs)
File "/root/anaconda3/envs/llamafactory-w/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/root/anaconda3/envs/llamafactory-w/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1976, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/root/anaconda3/envs/llamafactory-w/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2051, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/root/anaconda3/envs/llamafactory-w/lib/python3.8/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/root/anaconda3/envs/llamafactory-w/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
torch.autograd.backward(
File "/root/anaconda3/envs/llamafactory-w/lib/python3.8/site-packages/torch/autograd/init.py", line 266, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
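For reference, this RuntimeError is what PyTorch raises whenever backward() is called on a loss that is not connected to any tensor with requires_grad=True, which is consistent with every model parameter ending up frozen. A minimal standalone reproduction:

import torch

x = torch.randn(4, 8)                       # plain activations, no grad tracking
w = torch.randn(8, 1, requires_grad=False)  # a "frozen" parameter
loss = (x @ w).sum()
loss.backward()  # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn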
Expected behavior
No response
Others
No response