[Discussion] Should we enable the Ladder weight propagation when the shape is dynamic? #91

Closed
@LeiWang1999

Our project can be considered a dynamic runtime kernel library that generates different executables for specific shapes and devices on the fly. BitBLAS enables Ladder to propagate layouts based on the compute expression and the target hardware instructions, to avoid bank conflicts and keep global memory loads as coalesced as possible. However, our policy and schedule cannot achieve ideal performance when the shape is small, because parallelism is limited. This leads to an awkward situation where GEMV and GEMM use different instructions (for example, GEMV uses SIMT while GEMM uses MMA), so the layout propagated for GEMM may not be optimal for GEMV.
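To make the mismatch concrete, here is a minimal sketch of the kind of shape-based instruction choice described above. The function names and the `GEMV_MAX_M` threshold are hypothetical and exist only for illustration; they are not BitBLAS APIs, and the real policy may use different criteria.

```python
# Hypothetical threshold: at or below this M the problem is treated as GEMV-like.
GEMV_MAX_M = 1


def select_instruction(m: int) -> str:
    """Pick the instruction family for a given M (illustrative only)."""
    # Small M offers too little parallelism for tensor-core tiles,
    # so fall back to SIMT; larger M uses MMA.
    return "simt" if m <= GEMV_MAX_M else "mma"


def layout_matches(m: int, propagated_for: str) -> bool:
    """Check whether a weight layout propagated for one instruction
    family matches the family that would be chosen for this M."""
    return select_instruction(m) == propagated_for


if __name__ == "__main__":
    # A weight layout propagated for MMA, then used across several M values.
    for m in (1, 16, 64):
        print(m, select_instruction(m), layout_matches(m, "mma"))
```

With a single pre-transformed weight, any M that selects a different instruction family than the one the layout was propagated for ends up on a layout that was not tuned for it.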

Currently, to preserve optimal GEMV performance, we disable weight propagation when the input M falls within a dynamic range. However, the performance of contiguous decoding is attracting growing attention, and in some projects, such as FLUTE, BitBLAS performs poorly when benchmarked with a preset dynamic input range.

So it's time for us to decide: should we ship a hotfix that enables weight propagation by default, to improve the performance of batched dequantize GEMV?
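The current policy and the proposed change can be sketched as a single decision function. This is an assumption-laden illustration: `should_propagate_weight` and the `enable_for_dynamic` flag are hypothetical names, not part of the BitBLAS API, and M is modeled as either a single integer (static) or a list of candidate values (dynamic range).

```python
from typing import Sequence, Union


def should_propagate_weight(
    m: Union[int, Sequence[int]],
    enable_for_dynamic: bool = False,
) -> bool:
    """Decide whether to apply weight layout propagation (illustrative only).

    m: a single int for a static shape, or a sequence of ints for a
       dynamic range of M values.
    enable_for_dynamic: hypothetical switch modeling the proposed hotfix.
    """
    is_dynamic = not isinstance(m, int)
    if not is_dynamic:
        # Static M: the layout can be chosen for exactly this shape.
        return True
    # Dynamic M: current behavior keeps propagation off to protect GEMV;
    # the proposal would turn it on by default.
    return enable_for_dynamic


if __name__ == "__main__":
    print(should_propagate_weight(16))                    # static shape
    print(should_propagate_weight([1, 16, 32]))           # current behavior
    print(should_propagate_weight([1, 16, 32], True))     # proposed default
```

Flipping the default of the hypothetical flag corresponds to the hotfix under discussion: batched shapes inside the dynamic range would get the propagated layout, at the possible cost of M = 1 GEMV performance.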

TODO Items:
