-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Register re-allocation tweaks #340
Conversation
c4d21bd
to
47b6897
Compare
}; | ||
|
||
// Least-Recently-Used list. Append new uses at the end, search from the start | ||
std::list<MCPhysReg> LRURegisters; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RoundRobin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the declaration can be closer to its first use.
@@ -380,8 +396,7 @@ AIEWawRegRewriter::getReplacementPhysReg(const Register VReg, | |||
if (IK == LiveRegMatrix::IK_Free) { | |||
// Move it to the end of the list. We return, so don't have to | |||
// care about invalidation | |||
LRURegisters.erase(It); | |||
LRURegisters.emplace_back(PhysReg); | |||
moveRegAndAliasesBack(It, LRURegisters, TRI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aliases are probably in another regclass. I'm not sure anymore why we filtered on regclass, but if we want to move all registers in the alias set of PhysReg back, it would be cheaper to do an 'isInAliasSet(PhysReg)' than to find all aliases separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LRURegisters
can contain registers of multiple register classes, so we need to filter out the phys regs that are not compatible.
I'm not sure I understand the comment about isInAliasSet(PhysReg)
. Here, if we choose $cm0
, I want to move bmh0
and bml0
to the back of the list as well to make sure they are not picked when we need to re-allocate a BM
register.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that we do a find over the full list for each alias. I'm assuming that sweeping it once to move all aliases back would be cheaper. No big deal.
@@ -248,7 +253,12 @@ bool AIEWawRegRewriter::renameMBBPhysRegs(const MachineBasicBlock *MBB) { | |||
continue; | |||
|
|||
if (isWorthRenaming(Reg, VRegWithCopies)) { | |||
Candidates.emplace_back(&MO, VRM->getPhys(Reg)); | |||
MCRegister AssignedPhysReg = VRM->getPhys(Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const?
getKilledRegAtSingledDefPoint(VReg, *MRI); | ||
KilledReg && | ||
any_of(TRI->regunits(PhysReg), | ||
[&UsedUnits](MCRegUnit RU) { return UsedUnits.test(RU); })) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is far beyond my limits to reasonably nest statements in an if condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will be happy if we can extract at least one lambda/predicate like WasUsedBefore(PhysReg)
;-)
# | ||
# (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates | ||
# unit test for the WAW register renaming pass and check edge cases | ||
# RUN: llc -mtriple=aie2 -verify-machineinstrs --start-before=greedy --stop-after=virtregrewriter \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we have a motivating example for aie2p?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I know of (so far)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be very happy if it could improve SigmoidTemplated_bf16_0
LLVM_DEBUG(dbgs() << "Candidate " << printReg(Reg, TRI, 0, MRI) << ":" | ||
<< TRI->getRegClassName(MRI->getRegClass(Reg)) << " (" | ||
<< TRI->getName(AssignedPhysReg) << ")\n"); | ||
assert(AssignedPhysReg != VirtRegMap::NO_PHYS_REG); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not check AssignedPhysReg.isVirtual()?
Do we want to rename StackSlots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I get this, AssignedPhysReg
is a physreg, not a virtual one. Here I just want to make sure that every candidate for renaming does have phys reg assigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still looks funny that this value is defined in the VirtRegMap scope. How does it know it conflicts with the MCRegister encoding? Would VRM->hasPhys(Reg)
be more suitable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO_PHYS_REG
is the sentinel value used by VRM to say "nothing is assigned", so that scope makes sense to me. I'll use VRM->hasPhys(Reg)
nonetheless :)
// The order in Registers represent the LeastRecentlyUsed, which will | ||
// distribute register use evenly | ||
for (auto It = LRURegisters.begin(); It != LRURegisters.end(); ++It) { | ||
MCPhysReg PhysReg = *It; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using Physical Register as a whole for the round robin scheme, we could use RegUnits.
this way, we would differentiate between different registers, that partially populate the same RegUnits, i.e. wl/h, X and y registers.
Unsure if this is necessary at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are suggesting to use a list of RegUnits, instead of a list of MCRegister? I have to say I do prefer the latter, because it makes it really straightforward to retrieve the least-recently used register.
addPass(createAIEWawRegRewriter()); | ||
addPass(createGreedyRegisterAllocator()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should greedy also run after createAIEWawRegRewriter
in AIE2p
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, in general, the AIEWawRegRewriter
pass does not even run for AIE2p. I assumed that the pass pipeline is shared, but that part is not. Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new commit for AIE2p
}; | ||
|
||
// Least-Recently-Used list. Append new uses at the end, search from the start | ||
std::list<MCPhysReg> LRURegisters; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is closer to Least-Recently-Defined. LRU. Could we have any additional gain by implementing a full LRU approach? I mean, updating the list considering the uses as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is meant as "least recently used for re-allocation". I could rename that to "least recently allocated" maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for a future reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's using the common acronym LRU from caching strategies. Indeed, it is referring to the allocation in a forward traversal, not as defining a register.
@@ -326,6 +328,11 @@ bool AIEWawRegRewriter::renameMBBPhysRegs(const MachineBasicBlock *MBB) { | |||
// For each reg class, allocate the candidates in round-robin fashion. | |||
// If we fail, we fall back to the original allocation | |||
BitVector RegSet{TRI->getNumRegs()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: more descriptive name?
47b6897
to
f708d1f
Compare
I think I addressed all the comments. :) I also gave some QoR numbers in the PR description. |
0e974e6
to
2553b39
Compare
; CHECK-NEXT: vadd.16 x2, x0, x2 | ||
; CHECK-NEXT: vadd.16 x2, x0, x2 | ||
; CHECK-NEXT: nop | ||
; CHECK-NEXT: vmov x0, x2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no net effect in the loop, but ABI constraint ignored.
This limits re-usal of registers and therefore limits the WAW dependencies as well which will hurt for SWP. Co-authored-by: Martien de Jong <[email protected]>
Some instructions represent a single-use and single-def point. For those, it is generally fine to allocate the same phys reg to the killed input, and the def.
Needs to be enabled with care, because live-through registers will get de-allocated, and re-allocated in a final run of greedy regalloc. This can yield a sub-optimal allocation.
2553b39
to
1dde07c
Compare
This is a follow-up of the work from @F-Stuckmann and @martien-de-jong. It uses the least-recently-used approach for re-allocating registers. This limits WAW dependencies and gives more freedom to the post-pipeliner.
On top of this, there are two extra commits to allow:
QoR: