Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Badcase]: Different model parameter amounts have different effects on the batch pad. #1165

Open
4 tasks done
791136190 opened this issue Jan 16, 2025 · 1 comment
Open
4 tasks done

Comments

@791136190
Copy link

Model Series

Qwen2.5

What are the models used?

Qwen 2.5 0.5B and 7B Instruct

What is the scenario where the problem happened?

HF Demo

Is this badcase known and can it be solved using avaiable techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

Ubuntu+torch+hf

Description

当我使用batch推理和非batch推理时,如果有pad发生。不同模型参数量的时候,pad的影响不一致。

0.5B:
1 batch:Q ["你好"], A["你好!有什么我能帮助你的吗?"]
2 batch:Q["你好",“你是谁”],A["你好!很高兴为您服务。有什么我能帮助您的吗?","我是Qwen,一个由阿里云开发的超大规模语言模型。我被设计来帮助用户生成、分析和解释各种类型的文本内容,包括但不限于自然语言处理、机器翻译、问答系统等。我的目标是通过提供准确、有用和有意义的回答,为用户提供有价值的信息和建议。如果您有任何问题或需要帮助,请随时提问!"]

7B:
1 batch:Q ["你好"], A["你好!有什么我能帮助你的吗?"]
2 batch:Q["你好",“你是谁”],A["你好!有什么我可以帮助你的吗?","我是Qwen,一个由阿里云开发的超大规模语言模型。我被设计来帮助用户生成、分析和解释各种类型的文本内容,包括但不限于自然语言处理、机器翻译、问答系统等。我的目标是通过提供准确、有用和有意义的回答,为用户提供有价值的信息和建议。如果您有任何问题或需要帮助,请随时提问!"]

pad为left,pad mask为-inf
在7B模型时,组batch带pad与否不影响结果。但是0.5B组batch待pad的时候,两种情况结果不一致。

@jklj077
Copy link
Collaborator

jklj077 commented Jan 20, 2025

provide steps to reproduce

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants