Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Register re-allocation tweaks #340

Merged
merged 7 commits into from
Feb 13, 2025
Merged

Register re-allocation tweaks #340

merged 7 commits into from
Feb 13, 2025

Conversation

gbossu
Copy link
Collaborator

@gbossu gbossu commented Feb 6, 2025

This is a follow-up of the work from @F-Stuckmann and @martien-de-jong. It uses the least-recently-used approach for re-allocating registers. This limits WAW dependencies and gives more freedom to the post-pipeliner.

On top of this, there are two extra commits to allow:

  • Re-allocation of GPRs
  • Aggressive re-allocation: i.e. by de-allocation live-through registers.

QoR:

  • AIE2: 0.3% cycle count improvements on aie-public. There are a couple of regressions, but we get them back if forcing the post-pipeliner. This PR is also necessary to get the optimal schedule for GEMM_bf16 and Conv2D_int8
  • AIE2p: 4.2% cycle count improvement, because the register renamer was completely disabled.

@gbossu gbossu force-pushed the gaetan.register.realloc branch from c4d21bd to 47b6897 Compare February 7, 2025 10:59
};

// Least-Recently-Used list. Append new uses at the end, search from the start
std::list<MCPhysReg> LRURegisters;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RoundRobin?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the declaration can be closer to its first use.

@@ -380,8 +396,7 @@ AIEWawRegRewriter::getReplacementPhysReg(const Register VReg,
if (IK == LiveRegMatrix::IK_Free) {
// Move it to the end of the list. We return, so don't have to
// care about invalidation
LRURegisters.erase(It);
LRURegisters.emplace_back(PhysReg);
moveRegAndAliasesBack(It, LRURegisters, TRI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aliases are probably in another regclass. I'm not sure anymore why we filtered on regclass, but if we want to move all registers in the alias set of PhysReg back, it would be cheaper to do an 'isInAliasSet(PhysReg)' than to find all aliases separately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LRURegisters can contain registers of multiple register classes, so we need to filter out the phys regs that are not compatible.

I'm not sure I understand the comment about isInAliasSet(PhysReg). Here, if we choose $cm0, I want to move bmh0 and bml0 to the back of the list as well to make sure they are not picked when we need to re-allocate a BM register.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that we do a find over the full list for each alias. I'm assuming that sweeping it once to move all aliases back would be cheaper. No big deal.

@@ -248,7 +253,12 @@ bool AIEWawRegRewriter::renameMBBPhysRegs(const MachineBasicBlock *MBB) {
continue;

if (isWorthRenaming(Reg, VRegWithCopies)) {
Candidates.emplace_back(&MO, VRM->getPhys(Reg));
MCRegister AssignedPhysReg = VRM->getPhys(Reg);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const?

getKilledRegAtSingledDefPoint(VReg, *MRI);
KilledReg &&
any_of(TRI->regunits(PhysReg),
[&UsedUnits](MCRegUnit RU) { return UsedUnits.test(RU); })) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is far beyond my limits to reasonably nest statements in an if condition.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be happy if we can extract at least one lambda/predicate like WasUsedBefore(PhysReg) ;-)

#
# (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates
# unit test for the WAW register renaming pass and check edge cases
# RUN: llc -mtriple=aie2 -verify-machineinstrs --start-before=greedy --stop-after=virtregrewriter \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we have a motivating example for aie2p?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know of (so far)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be very happy if it could improve SigmoidTemplated_bf16_0

LLVM_DEBUG(dbgs() << "Candidate " << printReg(Reg, TRI, 0, MRI) << ":"
<< TRI->getRegClassName(MRI->getRegClass(Reg)) << " ("
<< TRI->getName(AssignedPhysReg) << ")\n");
assert(AssignedPhysReg != VirtRegMap::NO_PHYS_REG);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not check AssignedPhysReg.isVirtual()?

Do we want to rename StackSlots?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I get this, AssignedPhysReg is a physreg, not a virtual one. Here I just want to make sure that every candidate for renaming does have phys reg assigned.

Copy link
Collaborator

@martien-de-jong martien-de-jong Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still looks funny that this value is defined in the VirtRegMap scope. How does it know it conflicts with the MCRegister encoding? Would VRM->hasPhys(Reg) be more suitable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO_PHYS_REG is the sentinel value used by VRM to say "nothing is assigned", so that scope makes sense to me. I'll use VRM->hasPhys(Reg) nonetheless :)

// The order in Registers represent the LeastRecentlyUsed, which will
// distribute register use evenly
for (auto It = LRURegisters.begin(); It != LRURegisters.end(); ++It) {
MCPhysReg PhysReg = *It;
Copy link
Collaborator

@F-Stuckmann F-Stuckmann Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using Physical Register as a whole for the round robin scheme, we could use RegUnits.
this way, we would differentiate between different registers, that partially populate the same RegUnits, i.e. wl/h, X and y registers.

Unsure if this is necessary at this point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are suggesting to use a list of RegUnits, instead of a list of MCRegister? I have to say I do prefer the latter, because it makes it really straightforward to retrieve the least-recently used register.

addPass(createAIEWawRegRewriter());
addPass(createGreedyRegisterAllocator());
Copy link
Collaborator

@F-Stuckmann F-Stuckmann Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should greedy also run after createAIEWawRegRewriter in AIE2p?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, in general, the AIEWawRegRewriter pass does not even run for AIE2p. I assumed that the pass pipeline is shared, but that part is not. Good catch!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new commit for AIE2p

};

// Least-Recently-Used list. Append new uses at the end, search from the start
std::list<MCPhysReg> LRURegisters;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is closer to Least-Recently-Defined. LRU. Could we have any additional gain by implementing a full LRU approach? I mean, updating the list considering the uses as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is meant as "least recently used for re-allocation". I could rename that to "least recently allocated" maybe?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for a future reference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's using the common acronym LRU from caching strategies. Indeed, it is referring to the allocation in a forward traversal, not as defining a register.

@@ -326,6 +328,11 @@ bool AIEWawRegRewriter::renameMBBPhysRegs(const MachineBasicBlock *MBB) {
// For each reg class, allocate the candidates in round-robin fashion.
// If we fail, we fall back to the original allocation
BitVector RegSet{TRI->getNumRegs()};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: more descriptive name?

@gbossu gbossu force-pushed the gaetan.register.realloc branch from 47b6897 to f708d1f Compare February 12, 2025 14:32
@gbossu
Copy link
Collaborator Author

gbossu commented Feb 12, 2025

I think I addressed all the comments. :) I also gave some QoR numbers in the PR description.

@gbossu gbossu force-pushed the gaetan.register.realloc branch 2 times, most recently from 0e974e6 to 2553b39 Compare February 13, 2025 11:25
; CHECK-NEXT: vadd.16 x2, x0, x2
; CHECK-NEXT: vadd.16 x2, x0, x2
; CHECK-NEXT: nop
; CHECK-NEXT: vmov x0, x2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no net effect in the loop, but ABI constraint ignored.

gbossu and others added 7 commits February 13, 2025 13:19
This limits re-usal of registers and therefore limits the WAW
dependencies as well which will hurt for SWP.

Co-authored-by: Martien de Jong <[email protected]>
Some instructions represent a single-use and single-def point. For
those, it is generally fine to allocate the same phys reg to the killed
input, and the def.
Needs to be enabled with care, because live-through registers will get
de-allocated, and re-allocated in a final run of greedy regalloc. This
can yield a sub-optimal allocation.
@gbossu gbossu enabled auto-merge (rebase) February 13, 2025 14:16
@gbossu gbossu merged commit 3e75a3d into aie-public Feb 13, 2025
8 checks passed
@gbossu gbossu deleted the gaetan.register.realloc branch February 13, 2025 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants