[Kernel] port sgl moe_align_block_size kernels #12574
Conversation
Force-pushed from 407a559 to ee02e4c.
csrc/moe/moe_align_sum_kernels.cu (Outdated)
@@ -197,6 +197,72 @@ __global__ void moe_align_block_size_global_mem_kernel(
  }
}

// temporarily adapted from
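For readers following along, here is a pure-Python sketch of what a `moe_align_block_size`-style kernel computes: group token indices by the expert they were routed to, then pad each expert's group to a multiple of `block_size` so the fused MoE GEMM can process fixed-size blocks. This is an illustration only, not the CUDA code from this PR; the `pad_id` parameter is a hypothetical placeholder for whatever sentinel the real kernel writes into padded slots.

```python
from collections import defaultdict

def moe_align_block_size(topk_ids, num_experts, block_size, pad_id):
    """Group token indices by expert and pad each expert's group so its
    length is a multiple of block_size.

    Returns (sorted_token_ids, expert_ids, num_tokens_post_pad), mirroring
    the kernel's output names.
    """
    buckets = defaultdict(list)
    for token_idx, expert in enumerate(topk_ids):
        buckets[expert].append(token_idx)

    sorted_token_ids = []
    expert_ids = []  # one expert id per block of block_size token slots
    for expert in range(num_experts):
        group = buckets.get(expert, [])
        if not group:
            continue
        # Round the group length up to the next multiple of block_size.
        padded_len = -(-len(group) // block_size) * block_size
        group = group + [pad_id] * (padded_len - len(group))
        sorted_token_ids.extend(group)
        expert_ids.extend([expert] * (padded_len // block_size))

    return sorted_token_ids, expert_ids, len(sorted_token_ids)
```

For example, with `topk_ids=[0, 1, 0, 1, 1]` and `block_size=4`, expert 0's two tokens and expert 1's three tokens are each padded to a block of four, giving 8 post-pad token slots.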
nit: what was changed? Just the function name? Everything else looks the same to me. If that's the case, we should just say taken from instead of temporarily adapted from.
Fixed. Thanks.
Profile before (using moe_align_block_size_kernel):
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
_fwd_grouped_kernel_stage1 0.00% 0.000us 0.00% 0.000us 0.000us 3.362s 56.46% 3.362s 551.093us 6100
void cutlass::device_kernel<vllm::cutlass_3x_gemm_fp... 0.00% 0.000us 0.00% 0.000us 0.000us 740.156ms 12.43% 740.156ms 23.980us 30866
fused_moe_kernel 0.00% 0.000us 0.00% 0.000us 0.000us 582.178ms 9.78% 582.178ms 49.691us 11716
void vllm::moe::moe_align_block_size_kernel<int, uns... 0.00% 0.000us 0.00% 0.000us 0.000us 317.825ms 5.34% 317.825ms 54.255us 5858
_w8a8_block_fp8_matmul 0.00% 0.000us 0.00% 0.000us 0.000us 167.413ms 2.81% 167.413ms 27.173us 6161
vllm::inplace_fused_experts 0.21% 13.429ms 0.36% 23.469ms 404.633us 126.093ms 2.12% 144.540ms 2.492ms 58
Profile after (using sgl_moe_align_block_size_kernel):
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
_fwd_grouped_kernel_stage1 0.00% 0.000us 0.00% 0.000us 0.000us 3.350s 58.79% 3.350s 549.214us 6100
void cutlass::device_kernel<vllm::cutlass_3x_gemm_fp... 0.00% 0.000us 0.00% 0.000us 0.000us 734.721ms 12.89% 734.721ms 23.804us 30866
fused_moe_kernel 0.00% 0.000us 0.00% 0.000us 0.000us 577.712ms 10.14% 577.712ms 49.310us 11716
_w8a8_block_fp8_matmul 0.00% 0.000us 0.00% 0.000us 0.000us 166.231ms 2.92% 166.231ms 26.981us 6161
vllm::inplace_fused_experts 0.22% 13.736ms 0.38% 23.483ms 404.874us 124.716ms 2.19% 139.846ms 2.411ms 58
void vllm::cross_device_reduce_1stage<__nv_bfloat16,... 0.00% 0.000us 0.00% 0.000us 0.000us 106.168ms 1.86% 106.168ms 8.632us 12300
void vllm::moe::sgl_moe_align_block_size_kernel<int>... 0.00% 0.000us 0.00% 0.000us 0.000us 65.403ms 1.15% 65.403ms 11.165us 5858
void at::native::sbtopk::gatherTopK<c10::BFloat16, u... 0.00% 0.000us 0.00% 0.000us 0.000us 62.487ms 1.10% 62.487ms 5.387us 11600

LM Eval:

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.9507|± |0.0060|
| | |strict-match | 5|exact_match|↑ |0.9484|± |0.0061|
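The profiler tables above are in the format produced by PyTorch's profiler. A minimal sketch of generating a similar table follows; the workload here is a stand-in matmul rather than a vLLM decode loop, and it is CPU-only so it runs anywhere (the tables above were presumably collected with `ProfilerActivity.CUDA` and sorted by `cuda_time_total`).

```python
import torch
from torch.profiler import profile, ProfilerActivity

def run_step():
    # Stand-in workload; the PR's tables came from a real vLLM serving run.
    a = torch.randn(256, 256)
    b = torch.randn(256, 256)
    return a @ b

with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(10):
        run_step()

# sort_by selects the ordering column, matching the tables above.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=6))
```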
@@ -18,6 +18,9 @@
logger = init_logger(__name__)

enable_moe_align_block_size_triton = bool(
    int(os.getenv("ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON", "0")))
nit: we should move ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON into envs.py
Done. Thanks.
LGTM, left a couple nits
LGTM. Thanks for the work. Can you address Lucas's comments in a follow-up?
sgl_moe_align_block_size is based on: sgl-project/sglang@ded9fcd moe_align_block_size is based on: sgl-project/sglang@ba5112f Signed-off-by: Yang Chen <[email protected]>
Head branch was pushed to by a user without write access
Thanks for the review! It seems CI failed with a timeout error.
BTW, I addressed Lucas's comments in the PR. My new change disabled the auto-merge label. Could you help set it again when you get a chance? Thanks!
sgl_moe_align_block_size is based on: sgl-project/sglang@ded9fcd moe_align_block_size is based on: sgl-project/sglang@ba5112f Signed-off-by: Yang Chen <[email protected]> Signed-off-by: Felix Marty <[email protected]>