---
title: Fine-tuning Qwen2.5 VL-7B-Instruct
date: 2025-04-10
tags:
- LLM
- finetune
categories:
- AI
- Model Training
---

## Introduction

Notes on the main steps and commands used to train a LoRA adapter for the Tongyi Qianwen Qwen2.5 VL-7B-Instruct model.

## Environment Setup

Rent an AutoDL server and connect to it over ssh.

### Hardware

- GPU: H20-NVLink (96GB) * 1
- CPU: 20 vCPU Intel(R) Xeon(R) Platinum 8457C
- RAM: 200GB
- Cost: ¥7.98/hour

### Software Dependencies

- PyTorch 2.5.1
- Python 3.12 (ubuntu22.04)
- CUDA 12.4

### ssh Configuration

Set up port forwarding and ssh key login, selectively forwarding the ports used by swift/tensorboard/vllm/ollama to the local machine:

```text
Host autodl
  HostName xxxx
  User root
  Port xxxxx
  IdentityFile ~/.ssh/xxxxx
  LocalForward 0.0.0.0:7860 localhost:7860
  LocalForward 0.0.0.0:6006 localhost:6006
  LocalForward 0.0.0.0:8000 localhost:8000
  LocalForward 0.0.0.0:11434 localhost:11434
```

### Software Installation

#### modelscope

First install tmux and modelscope, then immediately start downloading the base model inside a tmux session, since the download takes quite a while.

```shell
apt update && apt install -y tmux

# run the following inside tmux
pip install modelscope
modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct
```

#### ms-swift

Install swift, the training tool, and pre-install flash attention 2 as well, since the training run depends on it. The flash-attn installation may stall; you may need a proxy to reach GitHub.

```shell
pip install ms-swift
pip install flash-attn --no-build-isolation
# once everything is ready, launch the WebUI
swift web-ui --lang zh
```

## Data Preparation

### Data Preprocessing

The data only needs to be a single jsonl file containing the conversations, in the following format:

```json
{"messages": [{"role": "user", "content": "<image>请描述这个图片"}, {"role": "assistant", "content": "这个图片描述的是……"}], "images": "image-path.jpg"}
```

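A minimal sketch of generating such a file from a list of image/answer pairs; the sample data below is purely hypothetical, only the jsonl structure matters:

```python
import json

# hypothetical (image path, description) pairs; replace with your own data
samples = [
    ("output/cat.jpg", "这个图片描述的是一只猫……"),
    ("output/dog.jpg", "这个图片描述的是一只狗……"),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for image_path, answer in samples:
        record = {
            "messages": [
                {"role": "user", "content": "<image>请描述这个图片"},
                {"role": "assistant", "content": answer},
            ],
            "images": image_path,
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```
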
The images may need to be resized and cropped first; this can be done with ffmpeg:

```shell
find input -name "*.jpg" -exec bash -c 'ffmpeg -i "$1" -vf "scale=iw*0.5:ih*0.5,crop=500:500:0:0" -y "output/$(basename "$1")"' _ {} \;
```

### Data Annotation

This step was skipped here. If needed, Label Studio can be used.

## Training Configuration

### LoRA Parameters

Mostly swift's defaults (an equivalent peft LoraConfig is sketched after this list for reference):

- rank: 8
- alpha: 32
- dropout: 0.05
- target modules: all-linear

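These values are passed through swift, but conceptually they correspond to roughly the following peft LoraConfig; this is only an illustrative sketch, not the config swift builds internally:

```python
from peft import LoraConfig

# illustrative only: roughly what the swift defaults above mean in peft terms
lora_config = LoraConfig(
    r=8,                          # rank
    lora_alpha=32,                # alpha
    lora_dropout=0.05,            # dropout
    target_modules="all-linear",  # apply LoRA to every linear layer
    task_type="CAUSAL_LM",
)
```
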
### Training Hyperparameters

With batch size 1, training used 89GB of VRAM; I did not try whether batch size 2 would fit. With gradient accumulation of 16, the effective batch size is 16.

- batch size: 1
- learning rate: 1e-4
- epochs: 1000
- warmup steps: 0
- gradient accumulation: 16
- save_steps: 100

### Command Line

```shell
swift sft \
    --torch_dtype bfloat16 \
    --model Qwen/Qwen2.5-VL-7B-Instruct \
    --model_type qwen2_5_vl \
    --template qwen2_5_vl \
    --system 'You are a helpful assistant.' \
    --dataset /root/dataset/train.jsonl \
    --max_length 1024 \
    --init_weights True \
    --learning_rate 1e-4 \
    --num_train_epochs 1000 \
    --attn_impl flash_attn \
    --gradient_accumulation_steps 16 \
    --eval_steps 500 \
    --save_steps 100 \
    --report_to tensorboard \
    --add_version False \
    --output_dir /root/output/v0-20250411-021343 \
    --logging_dir /root/output/v0-20250411-021343/runs \
    --ignore_args_error True
```

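Since the ssh config above forwards port 6006, training can be monitored by starting TensorBoard on the server and opening it locally; a minimal sketch, assuming the logging directory from the command above:

```shell
tensorboard --logdir /root/output/v0-20250411-021343/runs --host 0.0.0.0 --port 6006
# then open http://localhost:6006 on the local machine
```
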
## Evaluation

### vllm

The original plan was to serve the model with vllm and call it from Page Assist, but that was abandoned due to other issues; the command to start vllm is recorded here anyway.

```shell
vllm serve \
    ~/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \
    --enable-lora \
    --lora-modules lora=/path/to/checkpoint
```

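For reference, a minimal sketch of querying the resulting OpenAI-compatible endpoint, assuming the default port 8000 (forwarded in the ssh config above) and the `lora` adapter name registered via `--lora-modules`; the image URL is a placeholder:

```shell
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "lora",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/test.jpg"}},
                {"type": "text", "text": "请描述这个图片"}
            ]
        }],
        "max_tokens": 128
    }'
```
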
### transformers

It seems that transformers cannot load the qwen-vl LoRA via peft; wrapping the model with the LoRA raises an error. Only the calling code without the LoRA is given here:

```python
import os

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# expand ~ manually, since from_pretrained does not do it
model_id_or_path = os.path.expanduser('~/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct')

# default: Load the model on the available device(s)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id_or_path, torch_dtype="auto", device_map="auto"
)

# default processor
processor = AutoProcessor.from_pretrained(model_id_or_path)

# example messages; replace with your own (the image path is a placeholder)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "image-path.jpg"},
            {"type": "text", "text": "请描述这个图片"},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

### ms-swift

Reference code: https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo.py

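A rough sketch along the lines of that demo, assuming the `PtEngine`/`InferRequest`/`RequestConfig` API from `swift.llm` and an `adapters` argument for loading the trained LoRA; the checkpoint path and image path are placeholders:

```python
from swift.llm import InferRequest, PtEngine, RequestConfig

# assumed: load the base model together with the trained LoRA adapter (path is a placeholder)
engine = PtEngine('Qwen/Qwen2.5-VL-7B-Instruct',
                  adapters=['/root/output/v0-20250411-021343/checkpoint-xxx'])

# same <image> tag + images convention as in the training jsonl
infer_request = InferRequest(
    messages=[{'role': 'user', 'content': '<image>请描述这个图片'}],
    images=['image-path.jpg'],
)
request_config = RequestConfig(max_tokens=128, temperature=0)

resp_list = engine.infer([infer_request], request_config)
print(resp_list[0].choices[0].message.content)
```
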
## Lessons Learned

1. The training set was too small (5 positive and 5 negative samples), so generalization was poor.
2. Speed was 115s/it; about 6.5 hours covered 200 epochs.
3. Training accuracy peaked at roughly 20+ epochs, and the training loss flattened out after 175+ epochs.
