support turbomind head_dim 64 #2715

Merged
merged 6 commits into from
Nov 6, 2024

Collaborator

@irexyc irexyc commented Nov 5, 2024

Motivation

Support models with head_dim = 64, such as InternVL/InternVL2-1B and Qwen/Qwen1.5-0.5B-Chat.

@lvhan028 lvhan028 added the enhancement New feature or request label Nov 5, 2024
@@ -241,10 +241,10 @@ void invokeProcessKV_v2(char** blocks,
     int  block = WARPS * WARP_SIZE;
     dim3 grid((max_q_len + CTA_S - 1) / CTA_S, head_num, batch_size);

-    auto invoke = [&](auto tkv) {
+    auto invoke = [&](auto tkv, const auto dim) {
Collaborator

@lzhangzz what does tkv mean?

Collaborator Author

I think it means target kv datatype
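
For readers unfamiliar with the pattern in the hunk above, here is a minimal host-side sketch of how a generic lambda can receive both a type tag (`tkv`) and a compile-time head dimension (`dim`). It is an illustration under assumptions, not the code from this PR: `launch`, `dispatch`, the `kv_bits` parameter, the `std::uint16_t`/`std::uint8_t` stand-ins for KV element types, and the supported set {64, 128} are all hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a templated kernel launcher (not lmdeploy's API).
// It only reports which instantiation was selected.
template<class Tkv, int HeadDim>
std::string launch()
{
    return "kv_bytes=" + std::to_string(sizeof(Tkv)) + ",head_dim=" + std::to_string(HeadDim);
}

// Maps runtime values onto compile-time template arguments.
std::string dispatch(int kv_bits, int head_dim)
{
    // `tkv` is a value whose type is the KV element type;
    // `dim` is a std::integral_constant carrying head_dim at compile time.
    auto invoke = [](auto tkv, const auto dim) {
        using Tkv = decltype(tkv);
        constexpr int kHeadDim = decltype(dim)::value;
        return launch<Tkv, kHeadDim>();
    };

    // Second level: choose the head dimension for an already chosen type.
    auto dispatch_dim = [&](auto tkv) -> std::string {
        switch (head_dim) {
            case 64:
                return invoke(tkv, std::integral_constant<int, 64>{});
            case 128:
                return invoke(tkv, std::integral_constant<int, 128>{});
            default:
                throw std::invalid_argument("unsupported head_dim");
        }
    };

    // First level: choose the KV element type (assumed mapping, for illustration).
    switch (kv_bits) {
        case 16:
            return dispatch_dim(std::uint16_t{});
        case 8:
            return dispatch_dim(std::uint8_t{});
        default:
            throw std::invalid_argument("unsupported kv_bits");
    }
}
```

In this sketch, `dispatch(16, 64)` selects `launch<std::uint16_t, 64>`, and an unsupported head dimension throws rather than silently falling through to another instantiation.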

@lvhan028 lvhan028 requested a review from lzhangzz November 6, 2024 03:05
Collaborator

lvhan028 commented Nov 6, 2024

@zhulinJulia24 could you add the following models to the TurboMind (tm) test set?

  • meta-llama/Llama-3.2-1B-Instruct
  • Qwen/Qwen2.5-0.5B-Instruct
  • InternVL/InternVL2-1B

@zhulinJulia24
Collaborator

done

@lvhan028 lvhan028 merged commit e7886b4 into InternLM:main Nov 6, 2024
9 checks passed
AllentDan pushed a commit to AllentDan/lmdeploy that referenced this pull request Nov 13, 2024
* support head_dim 64

* fix unit-test

* fix wrong dispatch

* fix comments

* fix comments
4 participants