Add QDQ MatMul option to model builder #900

Merged: PatriceVignola merged 9 commits into main from user/pavignol/add-no-contrib-ops-option on Sep 28, 2024

PatriceVignola
Contributor

No description provided.

@yufenglee
Member

The operators GroupQueryAttention, MultiHeadAttention, RotaryEmbedding, etc. are contrib ops. The model builder cannot currently build a model without contrib ops.

@PatriceVignola
Contributor Author

@yufenglee Yes, this is more of a work in progress: it starts with this QDQ change, and other contrib ops could be added to the list later on. We could fork this into a feature branch until all the work is ready (or replace the --no_contrib_ops switch with something like --use_qdq).

The reason for this PR is that the PyTorch exporter route is currently slow, not flexible enough, and generates models with patterns that are sometimes too verbose. It's also harder to automatically generate some patterns (e.g. GQA or stacked GQA) with the PyTorch exporter. Even though the exporter should be the most obvious path for models without contrib ops, I feel that with just a few changes to the model builder that decompose the contrib ops, we can use the same ort-genai pipeline for both contrib-ops and non-contrib-ops models.

I'd be happy to be proven wrong, but already with the QDQ path I hit limitations with the PyTorch exporter, which wouldn't generate the QDQ pattern that we need (e.g. Dequantize + Transpose + MatMul is more natural than Dequantize + MatMul due to how blockwise quantization works).
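To make the Dequantize + Transpose + MatMul point concrete, here is a minimal sketch in plain Python. It is not the model builder's code. It assumes a symmetric signed 4-bit scheme and assumes the quantized weight is stored as [N, K] with one scale per block along K, so that dequantization naturally yields [N, K] and a transpose is needed before multiplying with an [M, K] input. All function names are hypothetical.

```python
def quantize_blockwise(w_nk, block_size):
    """Quantize an [N, K] matrix to the signed 4-bit range, one scale per block along K."""
    q, scales = [], []
    for row in w_nk:
        q_row, s_row = [], []
        for start in range(0, len(row), block_size):
            block = row[start:start + block_size]
            max_abs = max(abs(v) for v in block)
            # Assumed symmetric scheme: map the largest magnitude in the block to 7.
            scale = max_abs / 7.0 if max_abs > 0 else 1.0
            s_row.append(scale)
            q_row.extend(max(-8, min(7, round(v / scale))) for v in block)
        q.append(q_row)
        scales.append(s_row)
    return q, scales


def dequantize_blockwise(q, scales, block_size):
    """Dequantize step: returns a float matrix in the same [N, K] layout as the stored weight."""
    return [
        [qv * s_row[i // block_size] for i, qv in enumerate(q_row)]
        for q_row, s_row in zip(q, scales)
    ]


def transpose(m):
    """Transpose step: [N, K] -> [K, N]."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """MatMul step: [M, K] x [K, N] -> [M, N]."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def qdq_matmul(x_mk, q, scales, block_size):
    """Dequantize + Transpose + MatMul, mirroring the pattern named in the comment above."""
    w_nk = dequantize_blockwise(q, scales, block_size)
    return matmul(x_mk, transpose(w_nk))


# Tiny illustration: N=2 output channels, K=4 input features, blocks of 2 along K.
w_nk = [[7.0, -3.0, 3.5, 0.5], [-7.0, 2.0, 1.75, -0.25]]
q, scales = quantize_blockwise(w_nk, block_size=2)
y = qdq_matmul([[1.0, 2.0, 3.0, 4.0]], q, scales, block_size=2)
```

With this layout each row of scales lines up with a row of the quantized weight, so dequantization stays a simple per-row operation and the transpose is an explicit, separate step.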

I'd be happy to explore another path, though; this PR is more of a proof of concept of the idea I had in mind.

@yufenglee
Member

@PatriceVignola, yes, we can do this incrementally. If you want to merge into the main branch, I think use_qdq is a better option.

Member

@yufenglee yufenglee left a comment

:shipit:

@PatriceVignola PatriceVignola merged commit 2f84f3d into main Sep 28, 2024
12 of 13 checks passed
@PatriceVignola PatriceVignola deleted the user/pavignol/add-no-contrib-ops-option branch September 28, 2024 02:01