[Feature] Enhancing MatmulOps with Splitk Support #48

LeiWang1999 · 2024-06-05T07:21:49Z

This pull request updates several modules in the python/bitblas package of the BitBLAS library. It updates the Rasterization and TensorCoreExtraConfig classes, reformats the fast_decode_impl method, and adds the MatmulWithSplitK class.

Updates to Rasterization and TensorCoreExtraConfig classes:

python/bitblas/base/roller/__init__.py: Imported new Rasterization classes.
python/bitblas/base/roller/hint.py: Added a new method tensorcore_legalization to the TensorCoreExtraConfig class.

Modifications to fast_decode_impl method:

python/bitblas/gpu/intrin/lop3.py: Reformatted the arguments in the get_fast_decode_intrin calls within the fast_decode_impl method for better readability.

Addition of MatmulWithSplitK class:

python/bitblas/ops/general_matmul_splitk.py: Added a new file implementing the MatmulWithSplitK class, which extends the functionality of the Matmul class with the ability to split the K dimension.
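
For readers unfamiliar with the technique, the sketch below illustrates the general idea of splitting the K dimension: the reduction axis is cut into equal slices, each slice yields a partial product, and the partial products are summed. It is a plain-Python illustration and not the code in general_matmul_splitk.py; the function names and the `splitk_factor` parameter are hypothetical and chosen for this example only.

```python
from typing import List

Matrix = List[List[float]]


def matmul_slice(a: Matrix, b: Matrix, k_start: int, k_end: int) -> Matrix:
    """Partial product using only columns k_start..k_end-1 of A and the matching rows of B."""
    m, n = len(a), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for k in range(k_start, k_end):
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out


def matmul_splitk(a: Matrix, b: Matrix, splitk_factor: int) -> Matrix:
    """Compute A @ B by splitting K into `splitk_factor` equal slices (hypothetical name)."""
    k_total = len(a[0])
    if len(b) != k_total:
        raise ValueError("inner dimensions of A and B do not match")
    if splitk_factor < 1 or k_total % splitk_factor != 0:
        raise ValueError("K must be divisible by splitk_factor")

    chunk = k_total // splitk_factor
    m, n = len(a), len(b[0])
    result = [[0.0] * n for _ in range(m)]

    # Each slice is independent of the others, so the slices could be
    # computed in parallel; here they are computed one after another.
    for s in range(splitk_factor):
        partial = matmul_slice(a, b, s * chunk, (s + 1) * chunk)
        # Reduction step: accumulate the partial product into the result.
        for i in range(m):
            for j in range(n):
                result[i][j] += partial[i][j]
    return result


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(matmul_splitk(a, b, splitk_factor=2))
```

Splitting K is generally useful when M and N are small relative to K, because the extra slices expose more independent work at the cost of a final reduction over the partial results.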

Other important changes:

3rdparty/tvm: Updated the subproject commit.
python/bitblas/base/roller/policy/tensorcore.py: Added a call to tensorcore_legalization in the _score method.
python/bitblas/module/__init__.py: Changed the default value of fast_decoding from True to None in the __init__ method.
python/bitblas/ops/general_matmul.py: Removed the OPExecutorCPU class and added a condition to the __initialize_fast_decoding method that checks whether fast decoding is supported.
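
One common reason to change a boolean default from True to None is to let the library decide when the caller expresses no preference. The sketch below shows that three-state pattern under stated assumptions: `resolve_fast_decoding` and its parameters are hypothetical names for illustration, and the fallback behaviour is a guess, not the logic in BitBLAS.

```python
from typing import Optional


def resolve_fast_decoding(requested: Optional[bool], supported: bool) -> bool:
    """Resolve a three-state option (True / False / None) to a final boolean.

    `requested` is what the caller passed; None means "no preference".
    `supported` is whether the current configuration can use fast decoding.
    Both names are assumptions made for this example.
    """
    if requested is None:
        # No preference given: enable the feature exactly when it is supported.
        return supported
    # An explicit request is honoured only if the feature is supported.
    return requested and supported


print(resolve_fast_decoding(None, supported=True))
print(resolve_fast_decoding(True, supported=False))
```

An alternative design would raise an error when the caller explicitly requests an unsupported feature instead of silently disabling it.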

LeiWang1999 added 13 commits May 21, 2024 11:51

improve e4m3 decoding.

75d2f3d

Merge branch 'main' of https://github.com/microsoft/BitBLAS into main

dd744d0

append fp16xint1

00bfa31

Update submodule commit reference

8cd8b10

chore: Update shared memory scope for float32 output dtype

9122ff7

BUGFIX: UINT8/INT8 Decoding

b508acc

feat: Add rasterization options for roller module

58d55b7

Refactor tensorcore_legalization method to optimize tensor core usage

e7547ce

feat: Add function to collect variables from expression, improve for splitk

678a2e1

chore: Update typing import in __init__.py

3088b35

chore: Refactor CPU execution of operators

5d206b3

Refactor matmul implementation for splitk layout

e06ce10

Refactor matmul implementation for splitk layout

d67cc6d

LeiWang1999 merged commit 99a744e into microsoft:main Jun 5, 2024
3 checks passed
