-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Martien.pre2post #213
Martien.pre2post #213
Conversation
f08b580
to
8028243
Compare
HR.emitInScoreboard(Scoreboard, MI->getDesc(), MemoryBanks, | ||
MI->operands(), MI->getMF()->getRegInfo(), Cycle); | ||
Cycle += II; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you think of maybe adding a small test that benefits from "pre-booking modulo resources"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see whether this tipped the conv2d balamce.
@@ -141,7 +141,7 @@ static cl::opt<int> SwpForceII("pipeliner-force-ii", | |||
static cl::opt<int> | |||
SwpMaxStages("pipeliner-max-stages", | |||
cl::desc("Maximum stages allowed in the generated scheduled."), | |||
cl::Hidden, cl::init(3)); | |||
cl::Hidden, cl::init(-1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might have trouble upstreaming that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end, this option seems strange. This decision should always delegated to the target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, we can push this option to -1 from the frontend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this predates shouldUseSchedule. I agree with Andreu that it should be decided there.
When we upstream, I'll repair machine-pipeliner users, e.g. by implementing it in the base class of PipelinerLoopInfo rather than in the pipeliner itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact I will do that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It's more complicated than that, because there's a MaxStages test in the main II loop. That means it could be moved to canAcceptII, but that's not upstreamed either. I propose to upstream both at the same time, moving the MaxStages check to the default implementation of canAcceptII
@@ -272,6 +272,9 @@ namespace llvm { | |||
MachineFunction *CurrFunc; | |||
MachineFunction::iterator NextMBB; | |||
|
|||
// Supply specific memory disambiguation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this flexibility!
8028243
to
c619de9
Compare
// PostPipeliner can do nothing without tripcount > 1 | ||
if (MinTripCount <= 1) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It's a ZOL, Prepipeliner can't do anything either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? We can add guards I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that is hard to add guards in post, it will change the CFG and will require a new register for the constant...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a loop counter GPR?
@@ -55,7 +55,7 @@ static cl::opt<bool> | |||
"iterative loop scheduling")); | |||
|
|||
static cl::opt<int> PostPipelinerMaxII( | |||
"aie-postpipeliner-maxii", cl::init(10), | |||
"aie-postpipeliner-maxii", cl::init(40), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably too high. Maybe revisit when we hit noticable compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay, if makes me feel better about "aie-loop-aware-expensive-iterations", cl::init(35)
😆
; CHECK-TC-2: PLI: Accepting II=5 | ||
|
||
; CHECK-TC-1: Not interesting MinTripCount (<=1)! | ||
; CHECK-TC-1-NEXT: Loop rejected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questionable auto-update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think the # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
header should be removed. Could you maybe manually update the test?
AddrSpace ? ASI.getMemoryBanksFromAddressSpace(AddrSpace) | ||
: AddressSpaceNoneIsSafe ? 0 | ||
: ASI.getDefaultMemoryBank(); | ||
MemoryBankUsed |= MemoryBank; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is more than an added option. We start at none, and collect the memory banks from all MMO's. We used to start at 'all banks' and intersect. The latter is incompatible with multi-memory ops like VLDB.4x
@@ -54,6 +54,9 @@ static cl::opt<int> | |||
static cl::opt<unsigned> | |||
MaxUnrollCost("aie-unroll-max-cost", cl::Hidden, cl::init(200), | |||
cl::desc("Maximum partial unroll cost for loops")); | |||
static cl::opt<unsigned> PreferSwpOverUnroll( | |||
"aie-prefer-swp-over-unroll", cl::Hidden, cl::init(9), | |||
cl::desc("Aim for pipelining if MinIterCount is at least this value.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is inspired by the fact that all ReduceABC's regress for 8 and lower.
if (BS.getRegions().size() == 1) { | ||
auto &PostSWP = BS.getPostSWP(); | ||
if (PostSWP.canAccept(*BS.TheBlock)) { | ||
BS.FixPoint.II = PostSWP.getResMII(*BS.TheBlock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This saves a lot of work. II divides the linear schedule length to determine the number of copies in the DAG, and DAG construction is expensive and worse than linear in the number of nodes. Something like Conv2D immediately hits the target II, so we would also save 7 iterations of the main loop.
c619de9
to
6a433ee
Compare
while (C < NInstr && (Scoreboard[C] & Slots)) { | ||
C++; | ||
} | ||
if (C >= NInstr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: C == NInstr
. with C > NInstr means that we need k+1 slots for k instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitter: This is the reverse of the first part of the loop condition, without having to know how C is updated.
C++; | ||
} | ||
if (C >= NInstr) { | ||
MII = NInstr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this happens, do we try to run the pipeliner anyway? Should we return PostPipelinerMaxII
(provided we share this option)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This computes ResMII, not how it it used. I don't want to tie too many strings to it.
As we compute it now, it will never exceed NInstr. but even if II == NInstr, the latency path through those NInstrs can allow a 4 stage pipeline to be useful. And if the DAG is a straight chain of dependences, pipelining can still be cheap.
@@ -270,8 +300,15 @@ bool PostPipeliner::scheduleFirstIteration() { | |||
LLVM_DEBUG(dbgs() << " Emit in " << -Depth + LocalCycle << "\n"); | |||
int Cycle = -Depth + LocalCycle; | |||
LLVM_DEBUG(dbgs() << " Emit in " << Cycle << "\n"); | |||
HR.emitInScoreboard(Scoreboard, MI->getDesc(), MemoryBanks, MI->operands(), | |||
MI->getMF()->getRegInfo(), Cycle); | |||
for (int N = 0; N < NCopies; N++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would use unsigned here ;-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I solemnly pledged only to use unsigned for bitmasks (and perhaps for very small types like char for lookup tables)
6a433ee
to
e8dfeb1
Compare
if (NewLatest < Info[P].Latest) { | ||
Info[P].Latest = NewLatest; | ||
Changed = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an overview of what this commit is doing?
e8dfeb1
to
3eaff9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉
…heduling the first
Also tweak II hand-off limit
Targets can cater for this in shouldUseSchedule()
3eaff9a
to
061a3ae
Compare
Some tweaking to get more leverage from the PostPipeliner: