
[RELAX][PASS] Annotate Custom Scope layout pass for Adreno GPU #17599

Open · srkreddy1238 wants to merge 7 commits into main from annotate_texture_scope
Conversation

srkreddy1238 (Contributor) commented on Jan 21, 2025

Texture scope annotation is handled by:

  • Layout conversion from 4D to 5D with the convert_layout pass
  • Legalization of ops with Adreno-specific legalization and fallback legalization
  • FuseOps & FuseTIR
  • Annotation of texture scopes over the fused TIR via hint_on_device
  • RealizeVDevice, which takes care of injecting to_device as needed
  • A newly introduced SpecializeTIRParams pass that updates the fused TIR PrimFunc's buffer var map with the new scope information

Changes in FuseOps and FuseTIR forward op attr and op pattern info, which is used for texture-specific scoping decisions. A rough sketch of how these passes compose follows.
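For orientation, here is a minimal sketch (not the PR's literal code) of how the passes named above might compose. ConvertLayout, LegalizeOps, FuseOps, FuseTIR, and RealizeVDevice already exist in tvm.relax.transform; the layout map and the AnnotateCustomMemoryScope/SpecializeTIRParams spellings are assumptions based on this description and may differ from the patch.

```python
# Hedged sketch of the pipeline described above.
# The commented-out passes are the ones this PR introduces; their exact
# names/signatures are assumed from the PR description.
import tvm
from tvm import relax

def adreno_texture_pipeline() -> tvm.transform.Pass:
    return tvm.transform.Sequential(
        [
            # 4D -> 5D layout conversion (the layout map here is illustrative)
            relax.transform.ConvertLayout({"relax.nn.conv2d": ["NCHW4c", "OIHW4o"]}),
            # Adreno-specific legalization plus fallback legalization
            relax.transform.LegalizeOps(),
            relax.transform.FuseOps(),
            relax.transform.FuseTIR(),
            # relax.transform.AnnotateCustomMemoryScope(),  # this PR: texture scopes via hint_on_device
            relax.transform.RealizeVDevice(),  # injects to_device as needed
            # relax.transform.SpecializeTIRParams(),  # this PR: refresh PrimFunc buffer var maps
        ]
    )
```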

srkreddy1238 force-pushed the annotate_texture_scope branch from fe15d5b to 1549733 on January 21, 2025.
srkreddy1238 (Contributor, Author) commented

@tvm-bot rerun

tqchen (Member) commented on Jan 25, 2025

@Hzfengsy do you mind taking a look, given it touches FuseOps/FuseTIR?

Review thread on src/script/printer/relax/utils.h (outdated, resolved)
@@ -1092,6 +1095,10 @@ def LegalizeOps(
legalization function is not registered. By default we don't print
warnings.

add_attributes : bool
Reviewer comment (Member):

Is it possible to compose instead, e.g., run an attribute-attach pass afterwards?

srkreddy1238 (Author) replied:

After the legalization pass we don't have any trace of operator-specific attributes.
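For context, a minimal usage sketch of the flag in the diff above, assuming the keyword spelling matches the docstring (add_attributes):

```python
# Hedged sketch: run LegalizeOps with the new flag from this diff so that
# operator attributes are carried onto the produced call_tir nodes.
import tvm
from tvm import relax

def legalize_keeping_attrs(mod: tvm.IRModule) -> tvm.IRModule:
    return relax.transform.LegalizeOps(add_attributes=True)(mod)
```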

Review thread on include/tvm/relax/transform.h (resolved)
tqchen (Member) commented on Jan 25, 2025

Also cc @yongwww for the memory-scope-related changes.

Hzfengsy (Member) left a comment:

some initial comments

Review thread on python/tvm/relax/transform/optimize_batchnorm.py (outdated, resolved)
Review thread on src/relax/op/tensor/binary.cc (resolved)
srkreddy1238 (Contributor, Author) commented

@tvm-bot rerun

Hzfengsy (Member) left a comment:

LGTM

tqchen (Member) commented on Feb 3, 2025

Thanks @srkreddy1238 for the updates. I took a closer look and now understand the motivation behind add_attributes. This is mainly to handle the case of conv2d operators where texture can be supported.

However, attaching op attributes to call_tir introduces a less desirable impact: the specification of call_tir originally did not have to deal with these attributes, and having them results in "leak-through". This would increase the surface area for developers working with call_tir.

I also now understand that the demand is to enable the finally fused call_tir function to decide whether texture memory is feasible.

I think it is cleaner to try a different approach. Instead of relying on the legalize pass, let us introduce an Adreno-specific conv_dispatch that can be used before legalize to offload these conv operators. We specifically attach the attribute tir.opencl_texture_2d_supported = true to the call node.
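The proposal above attaches the marker to the call node; as a rough illustration of the marking step only, one plausible mechanism (an assumption, not something in this PR) is to stamp the attribute on the dispatched PrimFunc with the standard with_attr API:

```python
# Illustrative only: mark a dispatched conv2d PrimFunc as texture-capable
# using the attribute key proposed in this thread. Where the proposed
# relax.adreno.conv_dispatch pass would actually record the marker is open.
import tvm

def mark_texture_supported(func: tvm.tir.PrimFunc) -> tvm.tir.PrimFunc:
    return func.with_attr("tir.opencl_texture_2d_supported", True)
```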

Now the remaining question is where the schedule can appear

  • The cleanest way is to actually have relax.adreno.conv_dispatch call the dlight schedule and construct such a call_tir, marking it as already scheduled. The only issue is that in that case the follow-up FuseOps/FuseTIR should treat it as opaque, and we do not yet have the capability to run more fusions. But we should be fine getting the right conv2d op scheduled.

To further enable fusion, one can try adopting the following customized legalize sequence (see the sketch after this list):
- S0: relax.adreno.conv_dispatch: run conv dispatch and mark it as opaque with tir.opencl_texture_2d_supported = true
- S1: Run legalize and analysis
- S2: Do a pattern match to manually fuse the ewise ops onto the conv2d, creating a sub-function that calls into conv2d and then ewise, which can then be consumed by FuseTIR
- Run FuseOps (this will try to fuse the other ops)
- Run FuseTIR
- Run dlight
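A hedged sketch of that sequence as a pass pipeline. relax.adreno.conv_dispatch and the S2 pattern-match fusion are hypothetical (they are what this comment proposes); LegalizeOps, FuseOps, FuseTIR, and dlight's ApplyDefaultSchedule exist today.

```python
# Sketch of the proposed customized legalize sequence. The commented-out
# steps (S0 and S2) do not exist yet and are named per the proposal above.
import tvm
from tvm import relax
from tvm import dlight as dl

def proposed_adreno_sequence() -> tvm.transform.Pass:
    return tvm.transform.Sequential(
        [
            # S0: relax.adreno.conv_dispatch -- dispatch conv2d and mark it
            #     opaque with tir.opencl_texture_2d_supported = True
            relax.transform.LegalizeOps(),  # S1: legalize and analysis
            # S2: pattern-match pass fusing ewise ops onto conv2d (hypothetical)
            relax.transform.FuseOps(),      # fuse the remaining ops
            relax.transform.FuseTIR(),
            dl.ApplyDefaultSchedule(dl.gpu.Fallback()),  # run dlight
        ]
    )
```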

Review thread on include/tvm/ir/global_info.h (outdated, resolved)
Review thread on src/script/printer/relax/utils.h (outdated, resolved)
Review thread on src/relax/transform/utils.h (outdated, resolved)
srkreddy1238 (Contributor, Author) commented

I belatedly realized that I could have drafted an RFC to describe the approach. I have done so now: https://discuss.tvm.apache.org/t/rfc-annotate-custom-scope-layout-relax-pass-for-adreno-gpu/18052

@tqchen thanks for the thoughts. A few concerns I have with this approach:

  • tir.opencl_texture_2d_supported = true: I assume this flag will be used to realize the VDevice in struct_info after FuseTIR. A flag alone may not be sufficient here; we might need scope information for each input, and this information must stay consistent as we pass through the fusion passes.
  • Another moderate challenge is in S2, where we need to define and maintain a BYOC-like pattern table to ensure maximum fusion possibilities.

Please advise.
