Add QDQ MatMul option to model builder #900
The operators GroupQueryAttention, MultiHeadAttention, RotaryEmbedding, etc. are contrib ops. The model builder currently cannot build a model without contrib ops.
@yufenglee Yes, this would be more of a work in progress, starting with this QDQ change and possibly adding other contrib ops to the list later on. We could fork this in a feature branch until all the work is ready (or replace the

The reason for this PR is that the PyTorch exporter route is currently slow, not flexible enough, and generates models with patterns that are sometimes too verbose. It's also harder to automatically generate some patterns (e.g. GQA or stacked GQA) with the PyTorch exporter. Even though the exporter should be the most obvious path for models without contrib ops, I feel that with just a few changes to the model builder that decompose the contrib ops, we can use the same ort-genai pipeline for both contrib-ops and non-contrib-ops models.

I'd be happy to be proven wrong, but with the QDQ path I already hit limitations with the PyTorch exporter, where it wouldn't generate the QDQ pattern that we need (e.g. Dequantize + Transpose + MatMul is more natural than Dequantize + MatMul due to how blockwise quantization works). I'd be happy to explore another path though; this PR is more of a proof of concept of the idea I had in mind.
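For readers following the Dequantize + Transpose + MatMul remark, here is a minimal pure-Python sketch of one plausible reading of it. It assumes the quantized weight is stored as [N, K] with quantization blocks running along K, so that dequantization yields an [N, K] matrix that has to be transposed to [K, N] before the MatMul. The function names, layout, and values are hypothetical illustrations and are not taken from the model builder or the PR.

```python
# Hypothetical illustration only: the layout ([N, K], blocks along K) and all
# names below are assumptions, not the model builder's actual code.

def dequantize_blockwise(q, scales, zero_points, block_size):
    """Dequantize a [N, K] integer matrix whose rows are blocked along K.

    scales and zero_points have shape [N, K // block_size]:
    one (scale, zero_point) pair per block.
    """
    out = []
    for row, row_scales, row_zps in zip(q, scales, zero_points):
        out.append([
            (value - row_zps[k // block_size]) * row_scales[k // block_size]
            for k, value in enumerate(row)
        ])
    return out


def transpose(m):
    """Swap the two axes of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain [M, K] x [K, N] matrix product."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Toy weight: N=2 output features, K=4 input features, block_size=2.
q_weight = [[1, 3, 5, 7], [2, 4, 6, 8]]
scales = [[0.5, 0.25], [1.0, 2.0]]
zero_points = [[1, 5], [2, 6]]

w_nk = dequantize_blockwise(q_weight, scales, zero_points, block_size=2)  # [N, K]
w_kn = transpose(w_nk)                                                    # [K, N]
x = [[1.0, 1.0, 1.0, 1.0]]                                                # [M=1, K=4]
y = matmul(x, w_kn)                                                       # [M=1, N=2]
```

Under this assumed layout, each block's scale and zero point sit next to the K-contiguous values they apply to, which is why the dequantized tensor comes out as [N, K] and the extra Transpose appears before the MatMul.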
@PatriceVignola, yes, we can do this incrementally. If you want to merge into the main branch, I think use_qdq is a better option.
…into user/pavignol/add-no-contrib-ops-option