
[feature request] builder to expose {GQA, MHA} selection as argument #880

Open · BowenBao opened this issue Sep 6, 2024 · 2 comments
@BowenBao (Contributor) commented on Sep 6, 2024:

Currently the attention op (GQA vs. MHA) is inferred from a combination of other configuration options, such as device and dtype. It would be more flexible for downstream users if it could be selected explicitly.

@baijumeswani (Contributor) commented:
What is the advantage of doing it this way? The current approach takes advantage of the fact that the model builder knows which attention operator to use for a specific device and dtype.

Is this for experimentation purposes? If so, maybe we can expose an extra_options flag to override the default attention operator.
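
If that route is taken, here is a minimal sketch of what such an override could look like. This is hypothetical: `attention_op` is not an existing `extra_options` key, and the device/dtype mapping below is illustrative only, not the builder's actual selection logic.

```python
# Hypothetical sketch: an extra_options override layered on top of a
# device/dtype-based default. Names and the default mapping are illustrative.

def select_attention_op(device: str, dtype: str, extra_options: dict) -> str:
    # Explicit user override, if provided via extra_options.
    override = extra_options.get("attention_op")
    if override is not None:
        if override not in ("GroupQueryAttention", "MultiHeadAttention"):
            raise ValueError(f"Unsupported attention op: {override}")
        return override

    # Default path: infer the op from device and dtype (illustrative rule only).
    if device == "cuda" and dtype == "fp16":
        return "GroupQueryAttention"
    return "MultiHeadAttention"


if __name__ == "__main__":
    # Without an override, the choice follows device/dtype ...
    print(select_attention_op("cpu", "fp32", {}))
    # ... while a custom EP that supports GQA for this dtype could opt in explicitly.
    print(select_attention_op("cpu", "fp32", {"attention_op": "GroupQueryAttention"}))
```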

@BowenBao (Contributor, Author) commented:
Hi @baijumeswani, the idea is to decouple the attention op in the built model from the device/dtype combination. Consider a custom EP that implements the attention op for a dtype not supported by the ORT CPU/CUDA EPs.
