With chunked prefill, for large prompts, the sampler can encounter a zero-sized tensor, on which skinny gemm fails
ROCm · Sep 23, 2024 · ff4e478
1 parent 57ea101
commit ff4e478
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/vllm/model_executor/layers/tuned_gemm.py b/vllm/model_executor/layers/tuned_gemm.py
@@ -62,7 +62,7 @@ def apply_skinny(self, m, n, k, inp_view, weights):
             return None
         if inp_view.dtype != torch.float16 or k % 8 != 0:
             return None
-        if m > 8 and n <= 4:
+        if m > 8 and 0 < n <= 4:
             out = torch.empty(inp_view.shape[0],
                               weights.shape[0],
                               dtype=inp_view.dtype,
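
To illustrate the one-line change, here is a minimal pure-Python sketch of the dispatch condition before and after this commit. The helper names `old_takes_skinny_path` and `new_takes_skinny_path` are hypothetical and not part of `tuned_gemm.py`. The sketch assumes `n` is the number of input rows (the dimension that becomes zero in the chunked-prefill case), and it ignores any guards that precede the ones visible in the hunk.

```python
def old_takes_skinny_path(m: int, n: int, k: int, is_fp16: bool = True) -> bool:
    """Hypothetical helper: dispatch condition before the commit."""
    # Guard visible in the hunk: only float16 inputs with k divisible by 8.
    if not is_fp16 or k % 8 != 0:
        return False
    # n == 0 satisfies `n <= 4`, so an empty input reached the skinny kernel.
    return m > 8 and n <= 4


def new_takes_skinny_path(m: int, n: int, k: int, is_fp16: bool = True) -> bool:
    """Hypothetical helper: dispatch condition after the commit."""
    if not is_fp16 or k % 8 != 0:
        return False
    # The added lower bound excludes the zero-sized case.
    return m > 8 and 0 < n <= 4


if __name__ == "__main__":
    # Illustrative sizes only; m and k are arbitrary assumptions.
    for n in (0, 1, 4, 5):
        print(n, old_takes_skinny_path(4096, n, 4096),
              new_takes_skinny_path(4096, n, 4096))
```

Under these assumptions the two conditions differ only at `n == 0`: the old check selects the skinny path for an empty input, while the new check declines it, and the behaviour for `1 <= n <= 4` is unchanged.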