Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename GPR Register in innermost loop body #245

Draft
wants to merge 2 commits into
base: aie-public
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
210 changes: 194 additions & 16 deletions llvm/lib/Target/AIE/AIEWawRegRewriter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,24 +22,27 @@
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/LiveRegMatrix.h"
#include "llvm/CodeGen/LiveStacks.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/SlotIndexes.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/MC/MCRegister.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include <llvm/CodeGen/MachineBasicBlock.h>

using namespace llvm;

#define DEBUG_TYPE "aie-waw-reg-rewrite"

namespace {
static cl::opt<bool>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we already have an anonymous namespace, we can move this cmd option to there (removing also static). The result will be the same but a bit more cleaner.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also wondering if we can reuse the first command option in some way, like:

--aie-wawreg-rewrite=vec
--aie-wawreg-rewrite=gpr
--aie-wawreg-rewrite=all <--- default

We can turn that option as a global and reuse here. My concern is that those existing options are tied.

RenameAllRegs("aie-waw-rename-all", cl::Hidden, cl::init(true),
cl::desc("Rename every Register, not only accumulator and "
"vector registers (default true)."));

namespace {
using PhysRegVec = SmallVector<MCPhysReg, 40>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this vector is just pretending to be small ;-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use directly a bit vector/bit representation here?

///
/// This pass rewrites physical register assignments in critical parts of the
/// code (like loops) to break WAW and WAR dependencies.
Expand Down Expand Up @@ -82,14 +85,23 @@ class AIEWawRegRewriter : public MachineFunctionPass {
LiveRegMatrix *LRM = nullptr;
LiveIntervals *LIS = nullptr;
const TargetInstrInfo *TII = nullptr;
/// Vector to keep track which additional registers are blocked
/// by
/// a) being unused GPR registers within the MachineFunction or \
/// b) being Callee save registers. \
/// Unused GPR registers can be used for spill destinations, therefore
/// avoiding costly spills to memory.
PhysRegVec AdditionalBlockedRegs;

bool renameMBBPhysRegs(const MachineBasicBlock *MBB);

/// Get all the defined physical registers that the MachineBasicBlock already
/// uses. These physical registers should not be used for replacement
/// candidates, since this would introduce new WAW dependencies, which this
/// pass tries to remove.
BitVector getDefinedPhysRegs(const MachineBasicBlock *MBB) const;
/// Return a Bitvector with all the defined physical registers of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: BitVector

/// \param MBB. These physical registers should not be
/// used for replacement candidates, since this would introduce new WAW
/// dependencies, which this pass tries to remove. Additionally, block all the
/// physical register in \param AdditionalBlockedRegs.
BitVector getBlockedPhysRegs(const MachineBasicBlock *MBB,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are passing parameter that is also a class member. Why not access it directly?

const PhysRegVec &AdditionalBlockedRegs) const;

/// returns true if the physical register of Reg was replaced
bool replaceReg(const Register Reg, BitVector &BlockedPhysRegs);
Expand Down Expand Up @@ -137,6 +149,27 @@ class AIEWawRegRewriter : public MachineFunctionPass {
/// removed for future replacement strategies, i.e. block wl4, wh4, y2 if X4
/// is used.
void addAliasRegs(BitVector &BlockedPhysRegs, const MCPhysReg PhysReg) const;

/// \return additional blocked registers of the \p MF
PhysRegVec getAdditionalBlockedRegs(const MachineFunction &MF) const;

/// Accumulate defined registers within the \param MBB in \param UsedRegs
void accumulateDefRegs(const MachineBasicBlock &MBB,
std::set<MCPhysReg> &UsedRegs) const;

/// Return all the GPR and alias register within \p MF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: alias registers

PhysRegVec getGPRAndAliasRegs(const MachineFunction &MF) const;

/// Add Register to \p UsedRegs, if all its alias register are already used
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: alias registers.

/// and exist in \p AdditionalUsedRegs , i.e. if r16 and r17 are already
/// used, also insert their combined alias register (l0) into \p UsedRegs
/// since it is consequently also used.
/// Fixme: Work on Regunits and remove this method
void setAliasUsage(std::set<MCPhysReg> &UsedRegs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems different from adding a plain GPR renaming capability. Can that be in a separate commit?

const PhysRegVec &AdditionalUsedRegs) const;

/// Remove callee save registers from \param UsedRegs
void removeCalleeSaveRegs(std::set<MCPhysReg> &UsedRegs) const;
};

MCPhysReg AIEWawRegRewriter::getAssignedPhysReg(const Register Reg) const {
Expand Down Expand Up @@ -168,6 +201,10 @@ bool AIEWawRegRewriter::runOnMachineFunction(MachineFunction &MF) {
LLVM_DEBUG(dbgs() << "*** WAW Loop Register Rewriting: " << MF.getName());
LLVM_DEBUG(dbgs() << " ***\n");

AdditionalBlockedRegs.clear();
if (RenameAllRegs)
AdditionalBlockedRegs = getAdditionalBlockedRegs(MF);

for (const MachineBasicBlock *MBB : LoopMBBs)
Modified |= renameMBBPhysRegs(MBB);

Expand Down Expand Up @@ -206,6 +243,138 @@ AIEWawRegRewriter::getVRegWithCopies(const MachineBasicBlock &MBB) const {
return VRegWithCopies;
}

PhysRegVec
AIEWawRegRewriter::getGPRAndAliasRegs(const MachineFunction &MF) const {
LLVM_DEBUG(dbgs() << "Collecting All GPR and Alias register\n");
PhysRegVec GPRRegs;

for (auto PhysReg : *TRI->getGPRRegClass(MF)) {
LLVM_DEBUG(dbgs() << "GPR check: " << printReg(PhysReg, TRI, 0, MRI)
<< "\n");

for (MCRegAliasIterator AI(MCRegister(PhysReg), TRI, true); AI.isValid();
++AI) {
auto AliasReg = *AI;
if (find(GPRRegs, AliasReg) != GPRRegs.end())
continue;

GPRRegs.push_back(AliasReg);
LLVM_DEBUG(dbgs() << " Added: " << printReg(PhysReg, TRI, 0, MRI)
<< "\n");
}
}

return GPRRegs;
}

void AIEWawRegRewriter::setAliasUsage(
std::set<MCPhysReg> &UsedRegs, const PhysRegVec &AdditionalUsedRegs) const {

for (auto PhysReg : AdditionalUsedRegs) {
if (UsedRegs.count(PhysReg))
continue;

bool OccupiedAllAliasRegs = true;
for (MCRegAliasIterator AI(MCRegister(PhysReg), TRI, true); AI.isValid();
++AI) {
auto AliasReg = *AI;
if (PhysReg == AliasReg)
continue;

// If even one sub-register of PhysReg is not used, PhysReg is also not
// used
if (TRI->getRegSizeInBits(PhysReg, *MRI) >=
TRI->getRegSizeInBits(AliasReg, *MRI) &&
UsedRegs.count(AliasReg) == 0) {
OccupiedAllAliasRegs = false;
break;
}
}

// If all the smaller register classes of PhysReg are used, PhysReg is also
// used, even if it is not directly used
if (OccupiedAllAliasRegs) {
LLVM_DEBUG(dbgs() << "All subregisters are used, so alias is also used: "
<< printReg(PhysReg, TRI, 0, MRI) << "\n");
UsedRegs.insert(PhysReg);
}
}
}

void AIEWawRegRewriter::accumulateDefRegs(const MachineBasicBlock &MBB,
std::set<MCPhysReg> &UsedRegs) const {
for (const MachineInstr &MI : MBB) {
for (const MachineOperand &OP : MI.defs()) {
MCPhysReg PhysReg = getAssignedPhysReg(OP.getReg());
if (!MCRegister::isPhysicalRegister(PhysReg))
continue;

UsedRegs.insert(PhysReg);
LLVM_DEBUG(dbgs() << "Encountered " << printReg(PhysReg, TRI, 0, MRI)
<< " \n");
}
}
}

/// Remove callee save registers from \param UsedRegs
void AIEWawRegRewriter::removeCalleeSaveRegs(
std::set<MCPhysReg> &UsedRegs) const {
LLVM_DEBUG(dbgs() << "Removing Calle Save Regs from UsedRegs:\n");
const auto *CSRegs = MRI->getCalleeSavedRegs();

for (unsigned I = 0; CSRegs[I]; ++I) {
for (MCRegAliasIterator AI(MCRegister(CSRegs[I]), TRI, true); AI.isValid();
++AI) {
auto AliasReg = *AI;
if (UsedRegs.count(AliasReg)) {
LLVM_DEBUG(dbgs() << "Already blocked: "
<< printReg(AliasReg, TRI, 0, MRI) << " \n");
continue;
}

UsedRegs.erase(AliasReg);
LLVM_DEBUG(dbgs() << "Erasing " << printReg(AliasReg, TRI, 0, MRI)
<< "\n");
}
}

LLVM_DEBUG(dbgs() << "\n");
}

PhysRegVec
AIEWawRegRewriter::getAdditionalBlockedRegs(const MachineFunction &MF) const {
// Only GPR registers have to be handled specially, since callee save
// registers should not be used for renaming (could cause spilling) and unused
// gpr registers can be used for register spilling to regs instead of memory.
// Spilling to regs is more performant, therefore block these unused registers
// for register renaming
LLVM_DEBUG(dbgs() << "Setting Additional Blocked Registers\n");

PhysRegVec GPRRegs = getGPRAndAliasRegs(MF);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a way to make this whole code more generic? There seems to be a lot of special code for GPRs here.

Generally, I would expect:

  • We start with a list of interesting registers to rename, I guess that would be vectors and GPRs. I would expect this is the only place where we mention GPRs or VEC registers. The rest of the code should register-class-agnostic
  • We remove registers that are callee-saved
  • We remove registers that are already defined in the loop

And then we iterate over the registers (or rather: RegUnits) that are defined multiple times. For those we try to re-assign them using the list of "free" registers gathered above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that handling blocked GPRs by function and Vector registers by block, tracked in different steps, complicates and duplicates the process a bit.


std::set<MCPhysReg> UsedRegs;
for (const MachineBasicBlock &MBB : MF)
accumulateDefRegs(MBB, UsedRegs);
Copy link
Collaborator

@andcarminati andcarminati Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about passing MF to this function, something like:

std::set<MCPhysReg> UsedRegs = getDefRegs(MF);

?


setAliasUsage(UsedRegs, GPRRegs);

removeCalleeSaveRegs(UsedRegs);

// Get unused GPR registers by removing the UsedRegs from
// the set of all available GPR Registers
GPRRegs.erase(std::remove_if(
GPRRegs.begin(), GPRRegs.end(),
[&UsedRegs](MCPhysReg Reg) { return UsedRegs.count(Reg); }),
GPRRegs.end());

LLVM_DEBUG(dbgs() << "Permanently blocking: "; for (auto Reg
: GPRRegs) {
dbgs() << printReg(Reg, TRI, 0, MRI) << " ";
} dbgs() << "\n");

return GPRRegs;
}

bool AIEWawRegRewriter::renameMBBPhysRegs(const MachineBasicBlock *MBB) {
LLVM_DEBUG(dbgs() << "WAW Reg Renaming BasicBlock "; MBB->dump();
dbgs() << "\n");
Expand All @@ -218,7 +387,7 @@ bool AIEWawRegRewriter::renameMBBPhysRegs(const MachineBasicBlock *MBB) {
// Get a list of registers, that are not allowed as a replacement register.
// This list gets updated with the newly replaced physical register, so that
// this pass does not introduce WAW dependencies.
BitVector BlockedPhysRegs = getDefinedPhysRegs(MBB);
BitVector BlockedPhysRegs = getBlockedPhysRegs(MBB, AdditionalBlockedRegs);

// Collect all the virtual registers that have at least a copy instruction
// that defines them. Subregisters may contain constants that may be shared
Expand Down Expand Up @@ -287,14 +456,15 @@ bool AIEWawRegRewriter::isWorthRenaming(const Register &Reg,
if (!UsedPhysRegs[VRM->getPhys(Reg)])
return false;

if (!TRI->isVecOrAccRegClass(*(MRI->getRegClass(Reg))))
if (!RenameAllRegs && !TRI->isVecOrAccRegClass(*(MRI->getRegClass(Reg))))
return false;

return !VRegWithCopies[Reg.virtRegIndex()];
}

BitVector
AIEWawRegRewriter::getDefinedPhysRegs(const MachineBasicBlock *MBB) const {
BitVector AIEWawRegRewriter::getBlockedPhysRegs(
const MachineBasicBlock *MBB,
const PhysRegVec &AdditionalBlockedRegs) const {
BitVector BlockedPhysRegs(TRI->getNumRegs());

for (const MachineInstr &MI : *MBB) {
Expand All @@ -305,23 +475,29 @@ AIEWawRegRewriter::getDefinedPhysRegs(const MachineBasicBlock *MBB) const {
}
}

// Do not allow additional blocked regs for register replacement
LLVM_DEBUG(dbgs() << "Additionally blocking: \n");
for (auto PhysReg : AdditionalBlockedRegs) {
addAliasRegs(BlockedPhysRegs, PhysReg);
}
LLVM_DEBUG(dbgs() << "\n");

return BlockedPhysRegs;
}

bool AIEWawRegRewriter::replaceReg(const Register Reg,
BitVector &BlockedPhysRegs) {
assert(Reg.isVirtual());
LLVM_DEBUG(dbgs() << " WAW RegRewriter: Register to replace "
<< TRI->getName(VRM->getPhys(Reg)) << "\n");
<< printReg(Reg, TRI, 0, MRI) << "\n");

MCPhysReg ReplacementPhysReg = getReplacementPhysReg(Reg, BlockedPhysRegs);

if (ReplacementPhysReg == MCRegister::NoRegister)
return false;

LLVM_DEBUG(dbgs() << " WAW Replacement: Virtual Register "
<< printReg(VRM->getPhys(Reg), TRI, 0, MRI)
<< " will replace "
<< printReg(Reg, TRI, 0, MRI) << " will replace "
<< printReg(VRM->getPhys(Reg), TRI, 0, MRI) << " with "
<< printReg(ReplacementPhysReg, TRI, 0, MRI) << '\n');

Expand All @@ -347,6 +523,8 @@ MCPhysReg AIEWawRegRewriter::getReplacementPhysReg(
if (IK == LiveRegMatrix::IK_Free)
return PhysReg;
}
LLVM_DEBUG(dbgs() << "No free register found for "
<< printReg(Reg, TRI, 0, MRI) << "\n");
return MCRegister::NoRegister;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -77,19 +77,19 @@ define void @test_loop_dyn_alloca(i32 noundef %n) {
; CHECK-NEXT: nopa ; nopx ; mov p6, sp
; CHECK-NEXT: mov p1, sp
; CHECK-NEXT: lshl r0, r17, r19
; CHECK-NEXT: add r0, r0, #31
; CHECK-NEXT: add r1, r0, #31
; CHECK-NEXT: jl #extern_call
; CHECK-NEXT: mov p0, p1 // Delay Slot 5
; CHECK-NEXT: and r0, r0, r20 // Delay Slot 4
; CHECK-NEXT: mov m0, r0 // Delay Slot 3
; CHECK-NEXT: and r2, r1, r20 // Delay Slot 4
; CHECK-NEXT: mov m0, r2 // Delay Slot 3
; CHECK-NEXT: paddb [p1], m0 // Delay Slot 2
; CHECK-NEXT: mov sp, p1 // Delay Slot 1
; CHECK-NEXT: nopa ; nopb ; add r17, r17, #1; nopm ; nops
; CHECK-NEXT: ltu r0, r17, r16
; CHECK-NEXT: add r21, r21, r0
; CHECK-NEXT: xor r0, r17, r18
; CHECK-NEXT: or r0, r0, r21
; CHECK-NEXT: jnz r0, #.LBB1_1
; CHECK-NEXT: ltu r3, r17, r16
; CHECK-NEXT: xor r4, r17, r18
; CHECK-NEXT: add r21, r21, r3
; CHECK-NEXT: or r5, r4, r21
; CHECK-NEXT: jnz r5, #.LBB1_1
; CHECK-NEXT: nop // Delay Slot 5
; CHECK-NEXT: nop // Delay Slot 4
; CHECK-NEXT: nop // Delay Slot 3
Expand Down
16 changes: 8 additions & 8 deletions llvm/test/CodeGen/AIE/aie2/dyn-stackalloc.ll
Original file line number Diff line number Diff line change
Expand Up @@ -77,19 +77,19 @@ define void @test_loop_dyn_alloca(i32 noundef %n) {
; CHECK-NEXT: nopa ; nopx ; mov p6, sp
; CHECK-NEXT: mov p1, sp
; CHECK-NEXT: lshl r0, r17, r19
; CHECK-NEXT: add r0, r0, #31
; CHECK-NEXT: add r1, r0, #31
; CHECK-NEXT: jl #extern_call
; CHECK-NEXT: mov p0, p1 // Delay Slot 5
; CHECK-NEXT: and r0, r0, r20 // Delay Slot 4
; CHECK-NEXT: mov m0, r0 // Delay Slot 3
; CHECK-NEXT: and r2, r1, r20 // Delay Slot 4
; CHECK-NEXT: mov m0, r2 // Delay Slot 3
; CHECK-NEXT: paddb [p1], m0 // Delay Slot 2
; CHECK-NEXT: mov sp, p1 // Delay Slot 1
; CHECK-NEXT: nopa ; nopb ; add r17, r17, #1; nopm ; nops
; CHECK-NEXT: ltu r0, r17, r16
; CHECK-NEXT: add r21, r21, r0
; CHECK-NEXT: xor r0, r17, r18
; CHECK-NEXT: or r0, r0, r21
; CHECK-NEXT: jnz r0, #.LBB1_1
; CHECK-NEXT: ltu r3, r17, r16
; CHECK-NEXT: xor r4, r17, r18
; CHECK-NEXT: add r21, r21, r3
; CHECK-NEXT: or r5, r4, r21
; CHECK-NEXT: jnz r5, #.LBB1_1
; CHECK-NEXT: nop // Delay Slot 5
; CHECK-NEXT: nop // Delay Slot 4
; CHECK-NEXT: nop // Delay Slot 3
Expand Down
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/AIE/aie2/hardware-loops/nested.ll
Original file line number Diff line number Diff line change
Expand Up @@ -30,14 +30,14 @@ define void @nested(ptr nocapture %out, ptr nocapture readonly %in, i32 noundef
; CHECK-NEXT: // => This Inner Loop Header: Depth=2
; CHECK-NEXT: nopb ; nopa ; nops ; lshl r7, r6, r4; nopm ; nopv
; CHECK-NEXT: mov dj0, r7
; CHECK-NEXT: lda r7, [p3, dj0]
; CHECK-NEXT: lda r8, [p3, dj0]
; CHECK-NEXT: nop
; CHECK-NEXT: nop
; CHECK-NEXT: jnzd r5, r5, p2
; CHECK-NEXT: nop // Delay Slot 5
; CHECK-NEXT: nop // Delay Slot 4
; CHECK-NEXT: add r6, r6, #1 // Delay Slot 3
; CHECK-NEXT: add r2, r2, r7 // Delay Slot 2
; CHECK-NEXT: add r2, r2, r8 // Delay Slot 2
; CHECK-NEXT: st r2, [p0, #0] // Delay Slot 1
; CHECK-NEXT: .p2align 4
; CHECK-NEXT: // %bb.2: // %for.cond3.for.cond.cleanup5_crit_edge
Expand Down
Loading