[Badcase]: With the same fine-tuning data, Qwen1.5 14B is about 20% more accurate than Qwen2.5 14B. What is the reason? #1016

Jayc-Z · 2024-10-15T05:53:21Z

Model Series

Qwen2.5

What are the models used?

qwen2.5-14B-Instruct

What is the scenario where the problem happened?

LoRA fine-tuning of Qwen2.5-14B-Instruct performs poorly

Is this badcase known and can it be solved using available techniques?

I have followed the GitHub README.
I have checked the Qwen documentation and cannot find a solution there.
I have checked the documentation of the related framework and cannot find useful information.
I have searched the issues and there is not a similar one.

Information about environment

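Task overview: determine the smallest time unit in a query, to support time reasoning for period-over-period and year-over-year comparisons.
Example:
query: I want to query the daily period-over-period PR value of the power smart communication gateway for this month (original: 我要查询本月的每天电力智能通信网关的环比pr值)
Analysis: the smallest time unit in the query is the day, so unit=day; the sentence asks about multiple points in time, so "is_multi": "multi"
response: {"unit": "day", "is_multi": "multi"}
Experiment: using the same 1,000-example dataset and the hyperparameters learning_rate: 0.00005, num_train_epochs: 24, Qwen1.5-14B-Chat reaches 96% accuracy, but Qwen2.5-14B-Instruct reaches only 76%. Why does it degrade this much, and what should I watch out for when fine-tuning?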
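To see where the 20-point gap comes from, it may help to break the accuracy down by field and to count outputs that are not valid JSON at all. The following is a minimal sketch, not the evaluation actually used in this experiment: the function names `parse_response` and `score` are hypothetical, and it assumes predictions are raw model output strings and references are dicts with the `unit` and `is_multi` keys shown in the example above.

```python
import json

FIELDS = ("unit", "is_multi")  # keys taken from the example response above


def parse_response(text):
    """Return the parsed JSON object, or None if the output is not a JSON object."""
    try:
        obj = json.loads(text.strip())
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def score(predictions, references):
    """Compute exact-match accuracy, per-field accuracy, and the invalid-output count.

    predictions: list of raw model output strings (assumed format)
    references:  list of dicts with the keys in FIELDS (assumed format)
    """
    total = len(references)
    if total == 0:
        raise ValueError("references must not be empty")
    exact = 0
    invalid = 0
    field_hits = {f: 0 for f in FIELDS}
    for pred_text, ref in zip(predictions, references):
        pred = parse_response(pred_text)
        if pred is None:
            invalid += 1  # counted as wrong for every field
            continue
        matches = [pred.get(f) == ref.get(f) for f in FIELDS]
        for f, ok in zip(FIELDS, matches):
            field_hits[f] += int(ok)
        exact += int(all(matches))
    return {
        "exact_match": exact / total,
        "invalid": invalid,
        "per_field": {f: field_hits[f] / total for f in FIELDS},
    }
```

Running `score` on the outputs of both fine-tuned models over the same test set would show whether the Qwen2.5 errors are concentrated in one field or in malformed outputs, which points to different causes (for example, a chat-template mismatch versus a genuine classification error).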


LaoLiulaoliu · 2024-11-08T01:15:55Z

Plot the loss curve for each of the two models separately.
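A rough sketch of one way to do this, under the assumption that both runs were trained with a Hugging Face `Trainer`-based framework, which writes a `trainer_state.json` containing a `log_history` list. The training framework is not stated in this issue, so the file layout, the helper names, and the output path below are assumptions for illustration.

```python
import json


def load_log_history(path):
    """Read the log_history list from a Trainer-style trainer_state.json (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["log_history"]


def loss_points(log_history):
    """Extract (step, loss) pairs, skipping eval and summary entries that have no 'loss' key."""
    return [(e["step"], e["loss"]) for e in log_history if "loss" in e and "step" in e]


def plot_losses(runs, out_path="loss_curves.png"):
    """Plot training loss against step for several runs on one figure.

    runs: dict mapping a label (e.g. the model name) to its log_history list.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, log_history in runs.items():
        points = loss_points(log_history)
        if points:
            steps, losses = zip(*points)
            ax.plot(steps, losses, label=label)
    ax.set_xlabel("step")
    ax.set_ylabel("training loss")
    ax.legend()
    fig.savefig(out_path)
```

With 24 epochs over 1,000 examples, the curves would show whether one model's training loss collapses toward zero early (a sign of overfitting) or whether the two runs start from very different initial losses.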
