
[feature request] builder to expose {GQA, MHA} selection as argument #880

Open · BowenBao opened this issue Sep 6, 2024 · 2 comments
@BowenBao (Contributor) commented on Sep 6, 2024:

Currently the attention op (GQA vs. MHA) is inferred from a combination of other configuration options, such as device and dtype. It would be more flexible for downstream users if it could be selected explicitly.

@baijumeswani (Contributor) commented:
What is the advantage of doing it this way? The current approach takes advantage of the fact that the model builder knows which attention operator to use for a specific device and dtype.

Is this for experimentation purposes? If so, maybe we can expose an extra_options flag to override the default attention operator.
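
If that route is taken, here is a minimal sketch of what such an override could look like. This is hypothetical: `attention_op` is not an existing `extra_options` key, and the device/dtype mapping below is illustrative only, not the builder's actual selection logic.

```python
# Hypothetical sketch: an extra_options override layered on top of a
# device/dtype-based default. Names and the default mapping are illustrative.

def select_attention_op(device: str, dtype: str, extra_options: dict) -> str:
    # Explicit user override, if provided via extra_options.
    override = extra_options.get("attention_op")
    if override is not None:
        if override not in ("GroupQueryAttention", "MultiHeadAttention"):
            raise ValueError(f"Unsupported attention op: {override}")
        return override

    # Default path: infer the op from device and dtype (illustrative rule only).
    if device == "cuda" and dtype == "fp16":
        return "GroupQueryAttention"
    return "MultiHeadAttention"


if __name__ == "__main__":
    # Without an override, the choice follows device/dtype ...
    print(select_attention_op("cpu", "fp32", {}))
    # ... while a custom EP that supports GQA for this dtype could opt in explicitly.
    print(select_attention_op("cpu", "fp32", {"attention_op": "GroupQueryAttention"}))
```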

@BowenBao (Contributor, Author) commented:
Hi @baijumeswani, the idea is to decouple the attention op in the built model from the device/dtype combination. Consider a custom EP that implements the attention op for a dtype not supported by the ORT CPU/CUDA EPs.
