Skip to content

TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered #160

Closed
@huanpengchu

Description

@huanpengchu

I want to use INT8 matmul , and the code/output is as follows:

Code

import bitblas
import torch
bitblas.set_log_level("Debug")
matmul_config = bitblas.MatmulConfig(
    M=16,  # M dimension
    N=2048,  # N dimension
    K=1024,  # K dimension
    A_dtype="int8",  # activation A dtype
    W_dtype="int8",  # weight W dtype
    accum_dtype="float16",  # accumulation dtype
    out_dtype="float32",  # output dtype
    layout="nt",  # matrix layout, "nt" indicates the layout of A is non-transpose and the layout of W is transpose
    with_bias=False,  # bias
    # configs for weight only quantization
    group_size=None,  # setting for grouped quantization
    with_scaling=False,  # setting for scaling factor
    with_zeros=False,  # setting for zeros
    zeros_mode=None,  # setting for how to calculating zeros
    fast_decoding=False,  # setting for fast decoding
)

matmul = bitblas.Matmul(config=matmul_config)


input_tensor = torch.rand((16, 1024), dtype=torch.float16).cuda()
weight_tensor = torch.randint(0, 7, (2048, 1024), dtype=torch.int8).cuda()
print(weight_tensor.shape)
weight_tensor_int4 = matmul.transform_weight(weight_tensor)

output_tensor = matmul(input_tensor, weight_tensor_int4)

And the error:

2024-08-29 10:43:56 [BitBLAS:INFO]: Auto detected target: nvidia/nvidia-a100 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 1, 16], 'thread': [4, 1, 16], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 1, 32], 'thread': [2, 1, 32], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 1, 32], 'thread': [4, 1, 32], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 1, 16], 'thread': [2, 1, 16], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 2, 16], 'thread': [2, 2, 16], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 1, 16], 'thread': [1, 1, 16], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [8, 1, 8], 'thread': [8, 1, 8], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [8, 1, 16], 'thread': [8, 1, 16], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 2, 16], 'thread': [4, 2, 16], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 2, 8], 'thread': [4, 2, 8], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 1, 8], 'thread': [4, 1, 8], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 2, 32], 'thread': [1, 2, 32], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 2, 32], 'thread': [2, 2, 32], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 1, 32], 'thread': [1, 1, 32], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [2, 1, 8], 'thread': [2, 1, 8], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 2, 32], 'thread': [4, 2, 16], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [8, 1, 32], 'thread': [8, 1, 16], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [4, 1, 4], 'thread': [4, 1, 4], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 1, 8], 'thread': [1, 1, 8], 'rstep': []} 2024-08-29 10:44:20 [BitBLAS:DEBUG]: Apply config {'block': [1, 2, 16], 'thread': [1, 2, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 1, 16], 'thread': [4, 1, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.006 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 1, 32], 'thread': [2, 1, 32], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 1, 32], 'thread': [4, 1, 32], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 1, 16], 'thread': [2, 1, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 2, 16], 'thread': [2, 2, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.006 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 1, 16], 'thread': [1, 1, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [8, 1, 8], 'thread': [8, 1, 8], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [8, 1, 16], 'thread': [8, 1, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 2, 16], 'thread': [4, 2, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 2, 8], 'thread': [4, 2, 8], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 1, 8], 'thread': [4, 1, 8], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 2, 32], 'thread': [1, 2, 32], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 2, 32], 'thread': [2, 2, 32], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 1, 32], 'thread': [1, 1, 32], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.006 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [2, 1, 8], 'thread': [2, 1, 8], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 2, 32], 'thread': [4, 2, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [8, 1, 32], 'thread': [8, 1, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [4, 1, 4], 'thread': [4, 1, 4], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.007 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 1, 8], 'thread': [1, 1, 8], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.006 ms 2024-08-29 10:44:25 [BitBLAS:INFO]: Evaluation with config {'block': [1, 2, 16], 'thread': [1, 2, 16], 'rstep': []} 2024-08-29 10:44:25 [BitBLAS:INFO]: Time cost of this config: 0.005 ms 2024-08-29 10:44:28 [BitBLAS:INFO]: Tile Dict: [16, 256] Shared memory exceeds the static capacity, use dynamic shared memory. 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 16], 'warp': [16, 16], 'rstep': [512], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 32], 'warp': [16, 16], 'rstep': [512], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 64], 'warp': [16, 16], 'rstep': [256], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 128], 'warp': [16, 32], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last): 3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) 2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool) File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151 ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply config {'block': [16, 256], 'warp': [16, 64], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last): 3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) 2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool) File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151 ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last): 3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) 2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool) File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151 ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last): 3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) 2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool) File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151 ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered 2024-08-29 10:44:28 [BitBLAS:DEBUG]: Apply schedule failed: Traceback (most recent call last): 3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)>::AssignTypedLambda<tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}>(tvm::tir::{lambda(tvm::tir::Schedule, tvm::runtime::ObjectRef, tvm::runtime::String, bool)#14}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) 2: tvm::tir::TracedScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 1: tvm::tir::ConcreteScheduleNode::Tensorize(tvm::tir::LoopRV const&, tvm::runtime::String const&, bool) 0: tvm::tir::TensorIntrin::Get(tvm::runtime::String, bool) File "/root/BitBLAS/3rdparty/tvm/src/tir/ir/function.cc", line 151 ValueError: TensorIntrin 'mma_i8i8f16_smooth_a_trans_b_smooth_b' is not registered 2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 16], 'warp': [16, 16], 'rstep': [512], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None 2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 32], 'warp': [16, 16], 'rstep': [512], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None 2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 64], 'warp': [16, 16], 'rstep': [256], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None 2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 128], 'warp': [16, 32], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None 2024-08-29 10:44:51 [BitBLAS:DEBUG]: Apply config {'block': [16, 256], 'warp': [16, 64], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex_reindex': 16, 'B_reindex_reindex': 16}, 'pipeline_stage': 2} failed, artifact path is None Traceback (most recent call last): File "/workspace/chuhuanpeng/quant_cogview/pytorch_dit_int8/pytorch_dit_single/temp_bitblas.py", line 74, in <module> matmul = bitblas.Matmul(config=matmul_config) File "/usr/local/lib/python3.10/dist-packages/bitblas/ops/general_matmul/__init__.py", line 292, in __init__ self.dispatch_tir(target, from_database, source_format, enable_tuning) File "/usr/local/lib/python3.10/dist-packages/bitblas/ops/general_matmul/__init__.py", line 348, in dispatch_tir self.hardware_aware_finetune() File "/usr/local/lib/python3.10/dist-packages/bitblas/ops/operator.py", line 206, in hardware_aware_finetune self.optimized_func = self.apply_fast_tuning( File "/usr/local/lib/python3.10/dist-packages/bitblas/ops/operator.py", line 178, in apply_fast_tuning self.pass_context = best.config.pass_context AttributeError: 'NoneType' object has no attribute 'config'

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions