Our project can be considered a dynamic runtime kernel library that generates different executables for specific shapes and devices on the fly. BitBLAS enables Ladder to propagate layouts based on the compute expression and the target hardware instructions, avoiding bank conflicts and keeping global memory loads as coalesced as possible. However, our policy and schedule cannot achieve ideal performance when the shape is small, because parallelism is limited. This leads to an awkward situation: GEMV and GEMM use different instructions (for example, GEMV uses SIMT while MMA is applied to GEMM), so the layout propagated for GEMM may not be optimal for GEMV.
Currently, to preserve the optimal performance of GEMV, we disable weight propagation when the input M falls within a dynamic range. However, attention to the performance of contiguous decoding is growing, and in some projects, such as Flute, BitBLAS performs poorly when benchmarked with a preset dynamic input range.
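To make the current policy concrete, here is a minimal sketch of the decision described above. The function name, the threshold value, and the representation of the dynamic range as a list of candidate M values are all illustrative assumptions, not the actual BitBLAS internals:

```python
# Illustrative sketch of the current policy: weight propagation is
# disabled whenever the dynamic M range includes small, GEMV-like
# shapes, since the layout propagated for MMA-based GEMM can hurt
# the SIMT GEMV path.

GEMV_M_THRESHOLD = 16  # hypothetical cutoff below which SIMT GEMV wins


def should_propagate_weights(dynamic_m_range):
    """Return True only if every M in the range is large enough for GEMM."""
    return all(m > GEMV_M_THRESHOLD for m in dynamic_m_range)
```

Under this sketch, a range that includes decode-time shapes like M=1 keeps propagation off, which is exactly the behavior the hotfix would change.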
So it is time for us to decide: should we make a hotfix that enables weight propagation by default, to improve the performance of batched dequantize GEMV?
TODO Items:
Implement a benchmarking CI (see [Feature Request] CI/CD Request for our git pipeline #66) so that we can understand how the hotfix impacts a set of operators.
Survey the efficient implementations of Flute and Marlin, and check whether we can reproduce them with TVM.
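For the benchmarking item above, one possible shape of the harness is sketched below. `make_kernel` stands in for whatever entry point builds a compiled kernel with propagation enabled or disabled; it is a placeholder, not a real BitBLAS API:

```python
# Hypothetical benchmark harness sketch: time a compiled kernel with
# weight propagation on vs. off across the dynamic M values the CI cares
# about. `make_kernel(m, propagate)` is an assumed factory returning a
# zero-argument callable that launches the kernel.
import time


def bench(run_kernel, warmup=3, iters=20):
    """Return the mean latency of a kernel callable in milliseconds."""
    for _ in range(warmup):
        run_kernel()
    start = time.perf_counter()
    for _ in range(iters):
        run_kernel()
    return (time.perf_counter() - start) / iters * 1e3


def compare(make_kernel, m_values):
    """Collect (M, latency_propagate_on, latency_propagate_off) rows."""
    rows = []
    for m in m_values:
        t_on = bench(make_kernel(m, propagate=True))
        t_off = bench(make_kernel(m, propagate=False))
        rows.append((m, t_on, t_off))
    return rows
```

Running this over a decode-heavy range (e.g. M in {1, 16, 32, 64, 128}) would show directly where the propagated layout starts to pay off, which is the data the hotfix decision needs.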