TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered

I  want to use INT8 matmul , and the code/output is as follows:
### Code
```
import bitblas
import torch
bitblas.set_log_level("Debug")
matmul_config = bitblas.MatmulConfig(
    M=16,  # M dimension
    N=2048,  # N dimension
    K=1024,  # K dimension
    A_dtype="int8",  # activation A dtype
    W_dtype="int8",  # weight W dtype
    accum_dtype="float16",  # accumulation dtype
    out_dtype="float32",  # output dtype
    layout="nt",  # matrix layout, "nt" indicates the layout of A is non-transpose and the layout of W is transpose
    with_bias=False,  # bias
    # configs for weight only quantization
    group_size=None,  # setting for grouped quantization
    with_scaling=False,  # setting for scaling factor
    with_zeros=False,  # setting for zeros
    zeros_mode=None,  # setting for how to calculating zeros
    fast_decoding=False,  # setting for fast decoding
)

matmul = bitblas.Matmul(config=matmul_config)


input_tensor = torch.rand((16, 1024), dtype=torch.float16).cuda()
weight_tensor = torch.randint(0, 7, (2048, 1024), dtype=torch.int8).cuda()
print(weight_tensor.shape)
weight_tensor_int4 = matmul.transform_weight(weight_tensor)

output_tensor = matmul(input_tensor, weight_tensor_int4)
```


### And the error:
`2024-08-29 10:43:56 [BitBLAS:INFO]: Auto detected target: nvidia/nvidia-a100
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 1, 16], 'thread': [4, 1, 16], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 1, 32], 'thread': [2, 1, 32], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 1, 32], 'thread': [4, 1, 32], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 1, 16], 'thread': [2, 1, 16], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 2, 16], 'thread': [2, 2, 16], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 1, 16], 'thread': [1, 1, 16], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [8, 1, 8], 'thread': [8, 1, 8], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [8, 1, 16], 'thread': [8, 1, 16], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 2, 16], 'thread': [4, 2, 16], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 2, 8], 'thread': [4, 2, 8], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 1, 8], 'thread': [4, 1, 8], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 2, 32], 'thread': [1, 2, 32], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 2, 32], 'thread': [2, 2, 32], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 1, 32], 'thread': [1, 1, 32], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 1, 8], 'thread': [2, 1, 8], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 2, 32], 'thread': [4, 2, 16], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [8, 1, 32], 'thread': [8, 1, 16], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 1, 4], 'thread': [4, 1, 4], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 1, 8], 'thread': [1, 1, 8], 'rstep': []}
2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 2, 16], 'thread': [1, 2, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 1, 16], 'thread': [4, 1, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.006 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 1, 32], 'thread': [2, 1, 32], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 1, 32], 'thread': [4, 1, 32], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 1, 16], 'thread': [2, 1, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 2, 16], 'thread': [2, 2, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.006 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 1, 16], 'thread': [1, 1, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [8, 1, 8], 'thread': [8, 1, 8], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [8, 1, 16], 'thread': [8, 1, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 2, 16], 'thread': [4, 2, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 2, 8], 'thread': [4, 2, 8], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 1, 8], 'thread': [4, 1, 8], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 2, 32], 'thread': [1, 2, 32], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 2, 32], 'thread': [2, 2, 32], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 1, 32], 'thread': [1, 1, 32], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.006 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 1, 8], 'thread': [2, 1, 8], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 2, 32], 'thread': [4, 2, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [8, 1, 32], 'thread': [8, 1, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 1, 4], 'thread': [4, 1, 4], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.007 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 1, 8], 'thread': [1, 1, 8], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.006 ms
2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 2, 16], 'thread': [1, 2, 16], 'rstep': []}
2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms
2024-08-29 10:44:28 [BitBLAS:INFO]: Tile Dict: [16, 256] Shared memory exceeds the static capacity, use dynamic shared memory.
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 16], 'warp': [16, 16], 'rstep': [512], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2}
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 32], 'warp': [16, 16], 'rstep': [512], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2}
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 64], 'warp': [16, 16], 'rstep': [256], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2}
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 128], 'warp': [16, 32], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2}
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last):
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool)
  File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151
ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 256], 'warp': [16, 64], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2}
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last):
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool)
  File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151
ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last):
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool)
  File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151
ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last):
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool)
  File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151
ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered
2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last):
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool)
  0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool)
  File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151
ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered
2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 16], 'warp': [16, 16], 'rstep': [512], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None
2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 32], 'warp': [16, 16], 'rstep': [512], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None
2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 64], 'warp': [16, 16], 'rstep': [256], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None
2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 128], 'warp': [16, 32], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None
2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 256], 'warp': [16, 64], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None
Traceback (most recent call last):
  File "/workspace/chuhuanpeng/quant_cogview/pytorch_dit_int8/pytorch_dit_single/temp_bitblas.py", line 74, in <module>
    matmul = bitblas.Matmul(config=matmul_config)
  File "/usr/local/lib/python3.10/dist-packages/bitblas/ops/general_matmul/__init__.py", line 292, in __init__
    self.dispatch_tir(target, from_database, source_format, enable_tuning)
  File "/usr/local/lib/python3.10/dist-packages/bitblas/ops/general_matmul/__init__.py", line 348, in dispatch_tir
    self.hardware_aware_finetune()
  File "/usr/local/lib/python3.10/dist-packages/bitblas/ops/operator.py", line 206, in hardware_aware_finetune
    self.optimized_func = self.apply_fast_tuning(
  File "/usr/local/lib/python3.10/dist-packages/bitblas/ops/operator.py", line 178, in apply_fast_tuning
    self.pass_context = best.config.pass_context
AttributeError: 'NoneType' object has no attribute 'config'`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered #160

Code

And the error:

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered #160

Description

Code

And the error:

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions