Support Qwen2-MoE models #2723
Conversation

lzhangzz commented on Nov 7, 2024
- Qwen1.5-MoE-A2.7B-Chat
- Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
- Qwen2-57B-A14B-Instruct
- Qwen2-57B-A14B-Instruct-GPTQ-Int4
@zhulinJulia24 may update the TCs (test cases).
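For context, a minimal usage sketch with lmdeploy's high-level `pipeline` API; the model ID is one of the checkpoints listed above, and the prompt and printout are illustrative only:

```python
# Minimal sketch: run one of the newly supported Qwen MoE checkpoints
# through the TurboMind engine via lmdeploy's pipeline API.
from lmdeploy import pipeline

# Hugging Face model ID taken from the support list above.
pipe = pipeline('Qwen/Qwen1.5-MoE-A2.7B-Chat')

responses = pipe(['Briefly introduce yourself.'])
print(responses[0].text)
```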
@@ -50,6 +50,8 @@ class ModelConfig:
     expert_num: int = 0
     expert_inter_size: int = 0
     experts_per_token: int = 0
+    moe_shared_gate: int = False
`int` -> `bool`
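A sketch of the suggested fix, assuming `ModelConfig` is the dataclass shown in the hunk above:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    expert_num: int = 0
    expert_inter_size: int = 0
    experts_per_token: int = 0
    # The default is a bool, so the annotation should be `bool`, not `int`.
    moe_shared_gate: bool = False
```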
@@ -301,6 +301,8 @@ LlamaTritonModel<T>::LlamaTritonModel(size_t tensor_para_size,
     moe_param_.expert_num        = model_reader["expert_num"].as<int>(0);
     moe_param_.experts_per_token = model_reader["experts_per_token"].as<int>(0);
     moe_param_.inter_size        = model_reader["expert_inter_size"].as<int>(0);
+    moe_param_.shared_gate       = model_reader["moe_shared_gate"].as<int>(0);
`as<bool>(false)`: read the flag as a bool with a `false` default.
     info['experts_per_token'] = cfg['num_experts_per_tok']
     info['inter_size'] = cfg['shared_expert_intermediate_size']
+    info['moe_shared_gate'] = True
+    info['moe_norm_topk_prob'] = cfg['norm_topk_prob']
`moe_norm_topk_prob` is not defined in class `ModelConfig`.
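One way to resolve this, sketched on top of the dataclass above: declare the missing field so the converter's `moe_norm_topk_prob` key has a matching config attribute. The default value here is an assumption, not taken from the PR:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    expert_num: int = 0
    expert_inter_size: int = 0
    experts_per_token: int = 0
    moe_shared_gate: bool = False
    # Hypothetical addition: mirrors Qwen MoE's `norm_topk_prob` config key,
    # which controls whether the top-k routing weights are renormalized.
    moe_norm_topk_prob: bool = False
```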
         CUDA_R_32F,
         weight.output_dims,
         CUDA_R_32F,
         CUBLAS_GEMM_DEFAULT_TENSOR_OP);
 }

 template<class T>
-void MoeFfnLayer<T>::forward(T* inout, int tokens, int layer_id, const MoeFfnWeight<T>& moe)
+void MoeFfnLayer<T>::forward(T* output, const T* input, int tokens, int layer_id, const MoeFfnWeight<T>& moe)
The added `output` parameter seems unused.