Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopUtils] Cache VFs in addDiffRuntimeChecks (NFC) #130157

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

artagnon
Copy link
Contributor

@artagnon artagnon commented Mar 6, 2025

Caching the runtime-VF, which is actually a vscale expression is safe, when we previously thought that it was unsafe, partly due to a bad FIXME in one of the tests. Strip the FIXME, and demonstrate that GeneratedRTChecks::create does the right thing, by moving the logic for caching runtime-VF to LoopUtils. As a result, we improve the code in GeneratedRTChecks::create, to avoid a non-intuitive footgun. While the change seems functional, it is actually an NFC, because the types in the checks can never mismatch, although it's not good to rely on this property.

@llvmbot
Copy link
Member

llvmbot commented Mar 6, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

Caching the runtime-VF, which is actually a vscale expression is safe, when we previously thought that it was unsafe, partly due to a bad FIXME in one of the tests. Strip the FIXME, and demonstrate that GeneratedRTChecks::create does the right thing, by moving the logic for caching runtime-VF to LoopUtils. As a result, we improve the code in GeneratedRTChecks::create, to avoid a non-intuitive footgun.


Full diff: https://github.com/llvm/llvm-project/pull/130157.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+8-2)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+11-15)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (-1)
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index ec1692a484ce0..074fae5bb0564 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -2048,12 +2048,18 @@ Value *llvm::addDiffRuntimeChecks(
   // Map to keep track of created compares, The key is the pair of operands for
   // the compare, to allow detecting and re-using redundant compares.
   DenseMap<std::pair<Value *, Value *>, Value *> SeenCompares;
+  // Map to detect redundant values returned by GetVF.
+  DenseMap<Type *, Value *> SeenVFs;
   for (const auto &[SrcStart, SinkStart, AccessSize, NeedsFreeze] : Checks) {
     Type *Ty = SinkStart->getType();
+    Value *VF = SeenVFs.lookup(Ty);
+    if (!VF) {
+      VF = GetVF(ChkBuilder, Ty->getScalarSizeInBits());
+      SeenVFs.insert({Ty, VF});
+    }
     // Compute VF * IC * AccessSize.
     auto *VFTimesUFTimesSize =
-        ChkBuilder.CreateMul(GetVF(ChkBuilder, Ty->getScalarSizeInBits()),
-                             ConstantInt::get(Ty, IC * AccessSize));
+        ChkBuilder.CreateMul(VF, ConstantInt::get(Ty, IC * AccessSize));
     Value *Diff =
         Expander.expandCodeFor(SE.getMinusSCEV(SinkStart, SrcStart), Ty, Loc);
 
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cb860a472d8f7..5fe6551c3f8e2 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1924,21 +1924,17 @@ class GeneratedRTChecks {
                                  "vector.memcheck");
 
       auto DiffChecks = RtPtrChecking.getDiffChecks();
-      if (DiffChecks) {
-        Value *RuntimeVF = nullptr;
-        MemRuntimeCheckCond = addDiffRuntimeChecks(
-            MemCheckBlock->getTerminator(), *DiffChecks, MemCheckExp,
-            [VF, &RuntimeVF](IRBuilderBase &B, unsigned Bits) {
-              if (!RuntimeVF)
-                RuntimeVF = getRuntimeVF(B, B.getIntNTy(Bits), VF);
-              return RuntimeVF;
-            },
-            IC);
-      } else {
-        MemRuntimeCheckCond = addRuntimeChecks(
-            MemCheckBlock->getTerminator(), L, RtPtrChecking.getChecks(),
-            MemCheckExp, VectorizerParams::HoistRuntimeChecks);
-      }
+      MemRuntimeCheckCond =
+          DiffChecks
+              ? addDiffRuntimeChecks(
+                    MemCheckBlock->getTerminator(), *DiffChecks, MemCheckExp,
+                    [VF](IRBuilderBase &B, unsigned Bits) {
+                      return getRuntimeVF(B, B.getIntNTy(Bits), VF);
+                    },
+                    IC)
+              : addRuntimeChecks(MemCheckBlock->getTerminator(), L,
+                                 RtPtrChecking.getChecks(), MemCheckExp,
+                                 VectorizerParams::HoistRuntimeChecks);
       assert(MemRuntimeCheckCond &&
              "no RT checks generated although RtPtrChecking "
              "claimed checks are required");
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll
index feb27caf305a2..40116f161dd6b 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll
@@ -5,7 +5,6 @@ target triple = "aarch64-unknown-linux-gnu"
 
 ; Test case where the minimum profitable trip count due to runtime checks
 ; exceeds VF.getKnownMinValue() * UF.
-; FIXME: The code currently incorrectly is missing a umax(VF * UF, 28).
 define void @min_trip_count_due_to_runtime_checks_1(ptr %dst.1, ptr %dst.2, ptr %src.1, ptr %src.2, i64 %n) {
 ; CHECK-LABEL: @min_trip_count_due_to_runtime_checks_1(
 ; CHECK-NEXT:  entry:

@artagnon artagnon force-pushed the lv-runtimevf-cache-nfc branch from e0aed47 to 30ebfa4 Compare March 7, 2025 15:58

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Caching the runtime-VF, which is actually a vscale expression is safe,
when we previously thought that it was unsafe, partly due to a bad FIXME
in one of the tests. Strip the FIXME, and demonstrate that
GeneratedRTChecks::create does the right thing, by moving the logic for
caching runtime-VF to LoopUtils. As a result, we improve the code in
GeneratedRTChecks::create, to avoid a non-intuitive footgun.
@artagnon artagnon force-pushed the lv-runtimevf-cache-nfc branch from 30ebfa4 to fa655ef Compare March 7, 2025 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants