-
Notifications
You must be signed in to change notification settings - Fork 760
Refactor udt intrinsic arg copy to before SROA, flatten RayDesc #7440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Intrinsics that take UDT arguments need copy-in/copy-out. Other aggregate args are flattened for intrinsic calls. Previously, these operations were intermingled, driven by SROA on alloca/GV values. There were RayDesc arguments that weren't treated consistently, leading to problems. They should be flattened into the intrinsic arguments, but TraceRay calls didn't do this. Doing this for TraceRay would cause an issue where the subsequent Payload UDT arg could be in a different position depending on whether it was processed before or after flattening the RayDesc. This change is in preparation for flattening RayDesc args. It separates the UDT copy-in/copy-out into a separate operation before SROA. When doing the copy-in/copy-out separately, we don't expand the memcpy operations until they are handled separately during SROA. This causes a change in some IR shape, which had to be adjusted for in some tests. There are a couple remaining PIX pass tests that still need adjustment. These will likely need further changes when flattening RayDesc, so maybe they should be revisited after that. TODO: - Fix the remaining tests - Flatten the RayDesc parameters - Adjust tests for accordingly
✅ With the latest revision this PR passed the C/C++ code formatter. |
This renames: - copyIntrinsicUDTArgs -> copyIntrinsicAggArgs - RewriteCallArg -> memcpyAggCallArg. Updated to copy the RayDesc aggregate arguments as well. This will fix the issue with a RayDesc argument provided directly from a cbuffer, since the incoming argument pointer will no longer be skipped by SROA. Expected IR for tests will need to be updated after flattening RayDesc, so holding off on updating tests until then.
- Enable RayDesc flattening in SROA - Update lowering code accordingly - Separated MakeNop and MakeMiss lowering into two functions - Updated tests TODO: Update PIX pass tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sensible. Sorry if I was too detailed for a draft in some parts.
The PIX test or portion of test that relies on the old compiler behavior of alloca for RayDesc. Since these tests were made specifically to verfify this old pattern, it was ok to remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have anything against this change, just some confirmations and very mild suggestions. Since you need a second review anyway, perhaps getting Jeff to comment on the pix changes would be expedient. I don't expect him to review the other parts though, so you might choose to get someone else to give those a glance as well.
tools/clang/test/DXC/Passes/ScalarReplHLSL/hitobject_traceinvoke_scalarrepl.ll
Outdated
Show resolved
Hide resolved
tools/clang/test/DXC/Passes/ScalarReplHLSL/hitobject_traceinvoke_scalarrepl.ll
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving again again again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Intrinsics that take UDT arguments need copy-in/copy-out. Other aggregate args are flattened for intrinsic calls. Previously, these operations were intermingled, driven by SROA on alloca/GV values.
There were RayDesc arguments that weren't treated consistently, and weren't copied in when necessary, leading to problems. They should be flattened into the intrinsic arguments, but TraceRay calls didn't do this.
This change:
Fixes #7434.