[Feature] Enhancing MatmulOps with Splitk Support #48
This pull request introduces a number of changes across the `python/bitblas` package in order to improve the functionality of the BitBlas library. The changes include updates to the `Rasterization` and `TensorCoreExtraConfig` classes, modifications to the `fast_decode_impl` method, and the addition of the `MatmulWithSplitK` class.

Updates to `Rasterization` and `TensorCoreExtraConfig` classes:

- `python/bitblas/base/roller/__init__.py`: Imported new `Rasterization` classes.
- `python/bitblas/base/roller/hint.py`: Added a new method `tensorcore_legalization` to the `TensorCoreExtraConfig` class.

Modifications to `fast_decode_impl` method:

- `python/bitblas/gpu/intrin/lop3.py`: Reformatted the arguments in the `get_fast_decode_intrin` calls within the `fast_decode_impl` method for better readability. [1] [2]

Addition of `MatmulWithSplitK` class:

- `python/bitblas/ops/general_matmul_splitk.py`: Added a new file implementing the `MatmulWithSplitK` class, which extends the functionality of the `Matmul` class with the ability to split the K dimension.

Other important changes:

- `3rdparty/tvm`: Updated the subproject commit.
- `python/bitblas/base/roller/policy/tensorcore.py`: Added a call to `tensorcore_legalization` in the `_score` method.
- `python/bitblas/module/__init__.py`: Changed the default value of `fast_decoding` from `True` to `None` in the `__init__` method.
- `python/bitblas/ops/general_matmul.py`: Removed the `OPExecutorCPU` class and added a condition to check if fast decoding is supported in the `__initialize_fast_decoding` method. [1] [2]
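
For readers unfamiliar with the term, the general idea of splitting the K dimension can be sketched in plain Python as follows. This is only an illustration of the split-K technique, not the code in `general_matmul_splitk.py`: the function names `matmul` and `matmul_splitk` and the `k_split` parameter are hypothetical, and the sketch assumes K is evenly divisible by the number of splits.

```python
def matmul(a, b):
    """Reference matmul on nested lists: (M x K) @ (K x N)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def matmul_splitk(a, b, k_split):
    """Split-K matmul: cut K into k_split chunks, multiply each chunk, sum the partials.

    Hypothetical helper for illustration; assumes K % k_split == 0.
    """
    m, k, n = len(a), len(b), len(b[0])
    if k % k_split != 0:
        raise ValueError("this sketch assumes K is divisible by k_split")
    chunk = k // k_split
    out = [[0] * n for _ in range(m)]
    for s in range(k_split):
        lo, hi = s * chunk, (s + 1) * chunk
        # Each chunk is an independent (M x chunk) @ (chunk x N) product.
        # On a GPU these partial products could be computed in parallel.
        partial = matmul([row[lo:hi] for row in a], b[lo:hi])
        # Reduction step: accumulate the partial result into the output.
        for i in range(m):
            for j in range(n):
                out[i][j] += partial[i][j]
    return out


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 1], [2, 0]]
result = matmul_splitk(a, b, k_split=2)
```

Because each chunk of K is processed independently, splitting K exposes extra parallelism when M and N are small relative to K, at the cost of a final reduction over the partial results.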