Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIEX] Delay metalizing of multi-slot until iterative scheduling is converged #182

Open
wants to merge 1 commit into
base: aie-public
Choose a base branch
from

Conversation

krishnamtibrewala
Copy link
Collaborator

@krishnamtibrewala krishnamtibrewala commented Sep 6, 2024

This PR allows Multi-Slot Instr. to be used during iterative scheduling of loop.
Before this PR we were materializing Multi-Slot Instr. to selected OpCode/Slot after first iteration of iterative scheduling.
Now we wait until PostRA scheduling is converged to an acceptable schedule and we have decided to commit the schedule and move to next MBB.

  • When is the materialization triggered now? : When we leave a MBB.
  • Does this change depending on the region type? : The changes are agnostic to region type.
  • Could you think of the case where the materialization is not triggered before moving on to another MBB? : None that I can think of

Note : Given the information of what Alternate opcode/desc was selected is stored in Hazard Recognizer for a region.
And by the time we come to the end of MBB ( i.e leaveMBB() ) we do not have access to the instance of those Hazard Recognizer, therefore we need to make the Alternate opcode/desc part of the BlockState

@krishnamtibrewala
Copy link
Collaborator Author

@martien-de-jong , @andcarminati.
Kindly review and provide an early feedback toward the approach

@andcarminati
Copy link
Collaborator

Hi @krishnamtibrewala, nice work! Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads ups).

Copy link
Collaborator

@martien-de-jong martien-de-jong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a test example where we see improvement?

llvm/lib/Target/AIE/AIEInterBlockScheduling.h Outdated Show resolved Hide resolved
llvm/lib/Target/AIE/AIEInterBlockScheduling.h Outdated Show resolved Hide resolved
@krishnamtibrewala
Copy link
Collaborator Author

Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads ups).

@andcarminati I tried but I did not see any change, still investigating why.

Could we have a test example where we see improvement?

@martien-de-jong still figuring out.

Based on discussion with @gbossu we were expecting some impact but with current implementation QoR have no change.

$wh10 = VMOV_mv_w $wl0
JNZ $r3, %bb.1
DelayedSchedBarrier
bb.2:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attached is a diff, I am still trying to figure out how things are interacting.
Also will try to come up with a smaller test case.

image

@andcarminati
Copy link
Collaborator

As a general advice, I think we should have a class to manage the description handling. We can encapsulate and use it in HazardRecognizer and InterBlockScheduling.

@@ -263,14 +266,22 @@ class PipelineExtractor : public PipelineScheduleVisitor {
// Prologue and epilogue obtain copies.
MachineInstr *ToBeEmitted =
InLoop ? MI : Loop.TheBlock->getParent()->CloneMachineInstr(MI);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should prefer not using CloneMachineInstr, since this is not added to MBB and other structure like BlockState

Copy link
Collaborator

@martien-de-jong martien-de-jong Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, using CloneMachineInstr is essentially what the post pipeliner needs. When we call this extraction, everything in the schedule has been frozen. We can materialize the pseudos just before calling the extractor.
The original instructions reside in the loop block's region. We materialize them first, then cloning them will see the correct descriptor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you see a better way to emit an exact copy of an instruction into another basic block, please let me know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had discussed that offline with Krishnam. The post-pipeliner should only deal with "finalized" instructions when materializing the schedule, i.e. no multi-slot opcodes anywhere. So cloning is fine at this stage, because state like AlternateOpcodes is outdated (and using it should be forbidden with asserts if possible).

@krishnamtibrewala krishnamtibrewala changed the title [AIEX] Re-assign multi-slot instructions during iterative scheduling [AIEX] Delay metalizing of multi-slot until iterative scheduling is converged Oct 14, 2024
@krishnamtibrewala krishnamtibrewala force-pushed the aie-loop-multiOpcode branch 2 times, most recently from 02d0b50 to 8bf356f Compare October 14, 2024 11:24
@krishnamtibrewala krishnamtibrewala force-pushed the aie-loop-multiOpcode branch 2 times, most recently from 7a241b2 to fdab2af Compare October 22, 2024 18:32
@@ -616,34 +632,15 @@ void AIEPostRASchedStrategy::leaveRegion(const SUnit &ExitSU) {
assert(BS.getCurrentRegion().Bundles.empty());
BS.addBundles(TopBundles);
BS.addBundles(BotBundles);
if (!ReAssignMultiSlotInstr)
materializeMultiOpcodeInstrs();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to "move" the lines down here? What if we do not enter if (IsBottomRegion) { ... }?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following code block is outside the if (IsBottomRegion) { ... }

  if (!ReAssignMultiSlotInstr)
    materializeMultiOpcodeInstrs();

@gbossu
Copy link
Collaborator

gbossu commented Nov 4, 2024

Could you summarize what this PR does? Maybe in the PR description. I'm particularly interested in:

  • When is the materialization triggered now?
  • Does this change depending on the region type?
  • Could you think of the case where the materialization is not triggered before moving on to another MBB?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants