This bug is still present in the master branch as of October 17th; it happens with integer-packed weights.
One way to fix it is by replacing the for loop with:
Problem Statement
I am trying to dequantize a quantized tensor (packed into int32) and multiply it with another tensor in fp16. However, I observed a weird error, LLVM ERROR: mma16816 data type not supported, when invoking tl.dot. When I additionally multiply the dequantized tensor by 1.0, i.e. x * scales + zeros * 1.0, and downcast the result back to tl.float16, the program executes properly.

It seems like this phenomenon only happens in triton==3.0.0. I have tried downgrading Triton to 2.3.0, and it works well. Does anyone know possible reasons behind this phenomenon, or a potential bug in my implementation?

Dependency
Error message
Code to reproduce
For the kernel implementation, please refer to the _ab_qx_fwd function.
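The reproduction code itself is not included above. For illustration only, the following is a minimal sketch of the dequantize-then-tl.dot pattern described in the problem statement, including the * 1.0 multiply and explicit tl.float16 downcast workaround. It is not the issue's _ab_qx_fwd kernel: the kernel name, argument names, and the packing layout (eight 4-bit values per int32 along K, one fp16 scale and zero point per output column) are assumptions made for the example.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def dequant_dot_kernel(  # hypothetical name, not from the issue
    a_ptr, qw_ptr, scales_ptr, zeros_ptr, out_ptr,
    M, N, K,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    # One (BLOCK_M x BLOCK_N) output tile per program; assumes M, N, K are
    # multiples of the block sizes, so no load/store masks are needed.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)

    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k0 in range(0, K, BLOCK_K):
        k = k0 + offs_k
        # fp16 activation tile from a row-major (M, K) tensor.
        a = tl.load(a_ptr + offs_m[:, None] * K + k[None, :])

        # Assumed weight layout: eight 4-bit values per int32 along K, shape (K // 8, N).
        packed = tl.load(qw_ptr + (k[:, None] // 8) * N + offs_n[None, :])
        shift = (k[:, None] % 8) * 4
        q = ((packed >> shift) & 0xF).to(tl.float16)     # unpack one nibble per (k, n)

        scales = tl.load(scales_ptr + offs_n)[None, :]   # per-column fp16 scale
        zeros = tl.load(zeros_ptr + offs_n)[None, :]     # per-column fp16 zero point

        # Workaround from the issue: keep the "* 1.0" and the explicit downcast
        # to tl.float16 so tl.dot receives fp16 operands on triton==3.0.0.
        w = (q * scales + zeros * 1.0).to(tl.float16)

        acc += tl.dot(a, w)

    tl.store(out_ptr + offs_m[:, None] * N + offs_n[None, :], acc.to(tl.float16))
```

The sketch expects an fp16 activation of shape (M, K), int32 packed weights of shape (K // 8, N), fp16 scales and zeros of shape (N,), and an fp16 output of shape (M, N), launched on a (M // BLOCK_M, N // BLOCK_N) grid. As described above, the failing case on triton==3.0.0 is the same computation without the * 1.0 multiply and explicit downcast, while triton==2.3.0 runs the original kernel without the workaround.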