---
title: Fine-tuning Qwen2.5 VL-7B-Instruct
date: 2025-04-10
tags:
  - LLM
  - finetune
categories:
  - AI
  - Model Training
---

## Introduction

This post records the main steps and commands used to LoRA fine-tune the Tongyi Qianwen Qwen2.5 VL-7B-Instruct model.

## Environment Setup

Rent an AutoDL server and connect to it over SSH.

### Hardware

- GPU: 1× H20-NVLink (96 GB)
- CPU: 20 vCPU Intel(R) Xeon(R) Platinum 8457C
- RAM: 200 GB
- Cost: ¥7.98/hour

### Software Dependencies

- PyTorch 2.5.1
- Python 3.12 (Ubuntu 22.04)
- CUDA 12.4

### SSH Configuration

Set up port forwarding and SSH key login, selectively forwarding the swift / tensorboard / vllm / ollama ports to the local machine:

```text
Host autodl
  HostName xxxx
  User root
  Port xxxxx
  IdentityFile ~/.ssh/xxxxx
  LocalForward 0.0.0.0:7860 localhost:7860
  LocalForward 0.0.0.0:6006 localhost:6006
  LocalForward 0.0.0.0:8000 localhost:8000
  LocalForward 0.0.0.0:11434 localhost:11434
```
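
With `ssh autodl` connected, the services started later on the server (swift web-ui on 7860, tensorboard on 6006, vllm on 8000, ollama on 11434) are reachable on the same ports at localhost.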

### Software Installation

#### modelscope

First install tmux and modelscope, then immediately start a tmux session and kick off the base model download inside it; this takes quite a while.

```shell
apt update && apt install -y tmux

# inside tmux
pip install modelscope
modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct
```
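
By default modelscope caches the weights under `~/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct`, which is the path used for vllm and transformers below.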

#### ms-swift

Install the training tool swift. Training depends on FlashAttention-2, so install it ahead of time as well. The flash-attn install may hang; a proxy may be needed.

```shell
pip install ms-swift
pip install flash-attn --no-build-isolation
# once everything is ready, launch the WebUI
swift web-ui --lang zh
```
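
The web-ui is a Gradio app and serves on port 7860 by default, which is why 7860 is forwarded in the SSH config above.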

## Data Preparation

### Preprocessing

The data only needs a single jsonl file containing the conversations, in the following format:

```json
{"messages": [{"role": "user", "content": "<image>请描述这个图片"}, {"role": "assistant", "content": "这个图片描述的是……"}], "images": "image-path.jpg"}
```
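
If the captions live in a simple CSV (one image path and one description per row, a hypothetical layout for illustration), a small script can emit the jsonl in the format above:

```python
# Minimal sketch: build train.jsonl from a hypothetical captions.csv
# with columns "image" and "caption" (adjust the names to your data).
import csv
import json

with open("captions.csv", newline="", encoding="utf-8") as f_in, \
        open("train.jsonl", "w", encoding="utf-8") as f_out:
    for row in csv.DictReader(f_in):
        record = {
            "messages": [
                {"role": "user", "content": "<image>请描述这个图片"},
                {"role": "assistant", "content": row["caption"]},
            ],
            "images": row["image"],
        }
        f_out.write(json.dumps(record, ensure_ascii=False) + "\n")
```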

The images may need to be resized and cropped; ffmpeg takes care of that:

```shell
find input -name "*.jpg" -exec bash -c 'ffmpeg -i {} -vf "scale=iw*0.5:ih*0.5,crop=500:500:0:0" -y output/$(basename {})' \;
```
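
The filter first scales each image to 50% of its original size, then crops a 500×500 region starting from the top-left corner, writing the results into `output/`.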

### Data Annotation

This step was skipped. If annotation is needed, Label Studio is an option.

## Training Configuration

### LoRA Parameters

These are essentially swift's defaults (the corresponding CLI flags are sketched after the list):

- rank: 8
- alpha: 32
- dropout: 0.05
- target modules: all-linear
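
For reference, passing these values explicitly instead of relying on the defaults would look roughly like this; the flag names are my assumption based on current ms-swift releases, so verify them with `swift sft --help`.

```shell
# Same values as the defaults listed above, passed explicitly.
# Flag names assumed from current ms-swift releases; verify with `swift sft --help`.
swift sft \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --dataset /root/dataset/train.jsonl \
  --lora_rank 8 \
  --lora_alpha 32 \
  --lora_dropout 0.05 \
  --target_modules all-linear
```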

### Training Hyperparameters

With a batch size of 1, training used 89 GB of VRAM; I did not try whether batch size 2 would fit.

- batch size: 1
- learning rate: 1e-4
- epochs: 1000
- warmup steps: 0
- gradient accumulation: 16
- save_steps: 100
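
With batch size 1 and gradient accumulation 16, the effective batch size is 16 samples per optimizer step.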

### Command Line

```shell
swift sft \
  --torch_dtype bfloat16 \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --model_type qwen2_5_vl \
  --template qwen2_5_vl \
  --system 'You are a helpful assistant.' \
  --dataset /root/dataset/train.jsonl \
  --max_length 1024 \
  --init_weights True \
  --learning_rate 1e-4 \
  --num_train_epochs 1000 \
  --attn_impl flash_attn \
  --gradient_accumulation_steps 16 \
  --eval_steps 500 \
  --save_steps 100 \
  --report_to tensorboard \
  --add_version False \
  --output_dir /root/output/v0-20250411-021343 \
  --logging_dir /root/output/v0-20250411-021343/runs \
  --ignore_args_error True
```
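
Since `--report_to tensorboard` is set, the training curves can be watched locally through the forwarded port 6006:

```shell
tensorboard --logdir /root/output/v0-20250411-021343/runs --port 6006
```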

## Evaluation

### vllm

The original plan was to serve the model with vllm and call it from Page Assist, but other issues came up and that route was abandoned; the vllm launch command is recorded here anyway.

```shell
vllm serve \
  ~/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \
  --enable-lora \
  --lora-modules lora=/path/to/checkpoint
```
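
Once the server is up (default port 8000, already forwarded), a quick smoke test against the OpenAI-compatible endpoint could look like the following; the image URL is a placeholder, and `lora` is the adapter name registered via `--lora-modules`:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lora",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/test.jpg"}},
        {"type": "text", "text": "请描述这个图片"}
      ]
    }]
  }'
```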

### transformers

With transformers, loading the Qwen-VL LoRA via peft did not seem to work; the step that wraps the model with the LoRA raises an error. Only the inference code without the LoRA is given here:

```python
import os

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# expand ~ so from_pretrained gets an absolute local path
model_id_or_path = os.path.expanduser(
    '~/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct'
)

# default: Load the model on the available device(s)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id_or_path, torch_dtype="auto", device_map="auto"
)

# default processor
processor = AutoProcessor.from_pretrained(model_id_or_path)

# example conversation; replace the image path with your own
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "image-path.jpg"},
            {"type": "text", "text": "请描述这个图片"},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
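
For completeness, the LoRA-wrapping step that failed in my case was the standard peft load, roughly as below (the checkpoint path is a placeholder); whether this works may depend on the peft/transformers versions.

```python
from peft import PeftModel

# standard peft adapter loading; this is the step that raised an error for me
model = PeftModel.from_pretrained(model, "/path/to/checkpoint")
```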

### ms-swift

Reference code: https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo.py
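
The CLI route is also an option; something along these lines should load the base model together with the LoRA checkpoint (flag names vary between ms-swift versions, so check `swift infer --help`):

```shell
# `--adapters` is an assumption based on current ms-swift; older versions used different flag names
swift infer \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --adapters /path/to/checkpoint \
  --stream true
```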

## Lessons Learned

1. The training set was too small (only 5 positive and 5 negative samples), so generalization is poor.
2. Speed was about 115 s/it; roughly 200 epochs took close to 6.5 hours.
3. Training accuracy peaked after roughly 20+ epochs, and after 175+ epochs the training loss had largely flattened out.