support turbomind head_dim 64 #2715

irexyc · 2024-11-05T07:17:31Z

Motivation

Support models with head_dim = 64, such as InternVL/InternVL2-1B/ and Qwen/Qwen1.5-0.5B-Chat/

lmdeploy/turbomind/deploy/source_model/internvl.py

lmdeploy/turbomind/supported_models.py

src/turbomind/kernels/attention/decoding.cu

lvhan028 · 2024-11-06T02:30:01Z

src/turbomind/kernels/attention/kv_cache_utils_v2.cu

@@ -241,10 +241,10 @@ void invokeProcessKV_v2(char**       blocks,
    int  block = WARPS * WARP_SIZE;
    dim3 grid((max_q_len + CTA_S - 1) / CTA_S, head_num, batch_size);

-    auto invoke = [&](auto tkv) {
+    auto invoke = [&](auto tkv, const auto dim) {


@lzhangzz what does tkv mean?

I think it means the target KV data type.
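For readers unfamiliar with this pattern, here is a minimal host-side sketch of how a generic lambda like `invoke(tkv, dim)` can turn runtime values into compile-time template parameters. It is not the TurboMind implementation. `launch_process_kv` and `dispatch_process_kv` are hypothetical names, passing `dim` as a `std::integral_constant` is an assumption, and the KV element types are stand-in integer types chosen only for their sizes.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a kernel launch: it only reports which template
// instantiation was selected, so the sketch runs without a GPU.
template<class Tkv, int HeadDim>
std::string launch_process_kv()
{
    return "kv_bytes=" + std::to_string(sizeof(Tkv)) + ",head_dim=" + std::to_string(HeadDim);
}

// Hypothetical dispatcher: maps runtime kv_bits / head_dim to template arguments.
std::string dispatch_process_kv(int kv_bits, int head_dim)
{
    assert(kv_bits > 0 && head_dim > 0);

    // `tkv` carries the KV element type through its own type; `dim` is assumed
    // to be a std::integral_constant, so its value is known at compile time.
    auto invoke = [&](auto tkv, const auto dim) {
        using Tkv              = decltype(tkv);
        constexpr int kHeadDim = decltype(dim)::value;
        return launch_process_kv<Tkv, kHeadDim>();
    };

    // Select the head dimension; 64 is handled alongside 128.
    auto dispatch_dim = [&](auto tkv) -> std::string {
        if (head_dim == 64) {
            return invoke(tkv, std::integral_constant<int, 64>{});
        }
        if (head_dim == 128) {
            return invoke(tkv, std::integral_constant<int, 128>{});
        }
        throw std::invalid_argument("unsupported head_dim");
    };

    // Select the KV element type (stand-in types, assumption for illustration).
    if (kv_bits == 16) {
        return dispatch_dim(std::uint16_t{});
    }
    if (kv_bits == 8) {
        return dispatch_dim(std::uint8_t{});
    }
    throw std::invalid_argument("unsupported kv_bits");
}
```

In this sketch every supported (type, head_dim) pair gets its own instantiation, and an unsupported combination fails loudly at the dispatch site rather than silently picking the wrong kernel.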

lmdeploy/turbomind/deploy/source_model/internvl.py

lvhan028 · 2024-11-06T03:36:55Z

@zhulinJulia24 please consider adding the following models to the TurboMind (tm) test set:

meta-llama/Llama-3.2-1B-Instruct
Qwen/Qwen2.5-0.5B-Instruct
InternVL/InternVL2-1B

zhulinJulia24 · 2024-11-06T03:55:50Z

done


* support head_dim 64
* fix unit-test
* fix wrong dispatch
* fix comments
* fix comments

irexyc added 2 commits November 5, 2024 07:04

support head_dim 64

c901f5b

fix unit-test

47b0774

lvhan028 added the enhancement (New feature or request) label Nov 5, 2024

irexyc added 2 commits November 5, 2024 14:58

Merge remote-tracking branch 'origin/main' into head64

80e1775

fix wrong dispatch

b7f394a