[LV][NFC] Pre-commit test for supporting strided accesses. #130563

Merged
4 commits merged into llvm:main from cm-stride-precommit on Mar 18, 2025

Conversation

Mel-Chen (Contributor)

Duplicate riscv-vector-reverse.ll as riscv-vector-reverse-output.ll to verify all generated IR, not just debug output.
Pre-commit for #128718.
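
For reference, the C++ loop that this test vectorizes is reproduced below from the test file's header comment; the enclosing function is a hypothetical wrapper added here for illustration and is not part of the patch.

  // Hypothetical wrapper around the loop from the test's header comment.
  // Only the pragma and the loop itself come from the test file; the checked
  // IR below performs the equivalent i32 add of 1 on each element.
  void vector_reverse(int *a, const int *b, int N) {
  #pragma clang loop vectorize_width(4, scalable)
    for (int i = N - 1; i >= 0; --i)
      a[i] = b[i] + 1.0;
  }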

llvmbot (Member) commented Mar 10, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Mel Chen (Mel-Chen)

Changes

Duplicate riscv-vector-reverse.ll as riscv-vector-reverse-output.ll to verify all generated IR, not just debug output.
Pre-commit for #128718.


Patch is 40.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130563.diff

1 File Affected:

  • (added) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll (+620)
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll
new file mode 100644
index 0000000000000..a15a8342154ff
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll
@@ -0,0 +1,620 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+
+;; This is the C++ loop being vectorized in this file with
+;; vector.reverse
+;;  #pragma clang loop vectorize_width(4, scalable)
+;;  for (int i = N-1; i >= 0; --i)
+;;    a[i] = b[i] + 1.0;
+
+; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v \
+; RUN: -riscv-v-vector-bits-min=128 -S < %s \
+; RUN: | FileCheck --check-prefix=RV64 %s
+
+; RUN: opt -passes=loop-vectorize -mtriple=riscv32 -mattr=+v \
+; RUN: -riscv-v-vector-bits-min=128 -S < %s \
+; RUN: | FileCheck --check-prefix=RV32 %s
+
+; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v \
+; RUN: -riscv-v-vector-bits-min=128 -force-vector-interleave=2 -S < %s \
+; RUN: | FileCheck --check-prefix=RV64-UF2 %s
+
+define void @vector_reverse_i64(ptr noalias %A, ptr noalias %B, i32 %n) {
+; RV64-LABEL: define void @vector_reverse_i64(
+; RV64-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; RV64-NEXT:  [[ENTRY:.*:]]
+; RV64-NEXT:    [[CMP7:%.*]] = icmp sgt i32 [[N]], 0
+; RV64-NEXT:    br i1 [[CMP7]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; RV64:       [[FOR_BODY_PREHEADER]]:
+; RV64-NEXT:    [[TMP0:%.*]] = zext i32 [[N]] to i64
+; RV64-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; RV64-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP1]], 4
+; RV64-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
+; RV64-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_SCEVCHECK:.*]]
+; RV64:       [[VECTOR_SCEVCHECK]]:
+; RV64-NEXT:    [[TMP3:%.*]] = add nsw i64 [[TMP0]], -1
+; RV64-NEXT:    [[TMP4:%.*]] = add i32 [[N]], -1
+; RV64-NEXT:    [[TMP5:%.*]] = trunc i64 [[TMP3]] to i32
+; RV64-NEXT:    [[MUL:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 1, i32 [[TMP5]])
+; RV64-NEXT:    [[MUL_RESULT:%.*]] = extractvalue { i32, i1 } [[MUL]], 0
+; RV64-NEXT:    [[MUL_OVERFLOW:%.*]] = extractvalue { i32, i1 } [[MUL]], 1
+; RV64-NEXT:    [[TMP6:%.*]] = sub i32 [[TMP4]], [[MUL_RESULT]]
+; RV64-NEXT:    [[TMP7:%.*]] = icmp ugt i32 [[TMP6]], [[TMP4]]
+; RV64-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[MUL_OVERFLOW]]
+; RV64-NEXT:    [[TMP9:%.*]] = icmp ugt i64 [[TMP3]], 4294967295
+; RV64-NEXT:    [[TMP10:%.*]] = or i1 [[TMP8]], [[TMP9]]
+; RV64-NEXT:    br i1 [[TMP10]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
+; RV64:       [[VECTOR_PH]]:
+; RV64-NEXT:    [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
+; RV64-NEXT:    [[TMP12:%.*]] = mul i64 [[TMP11]], 4
+; RV64-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], [[TMP12]]
+; RV64-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
+; RV64-NEXT:    [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
+; RV64-NEXT:    [[TMP14:%.*]] = mul i64 [[TMP13]], 4
+; RV64-NEXT:    [[TMP15:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
+; RV64-NEXT:    [[DOTCAST:%.*]] = trunc i64 [[N_VEC]] to i32
+; RV64-NEXT:    [[TMP16:%.*]] = sub i32 [[N]], [[DOTCAST]]
+; RV64-NEXT:    br label %[[VECTOR_BODY:.*]]
+; RV64:       [[VECTOR_BODY]]:
+; RV64-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; RV64-NEXT:    [[DOTCAST1:%.*]] = trunc i64 [[INDEX]] to i32
+; RV64-NEXT:    [[OFFSET_IDX:%.*]] = sub i32 [[N]], [[DOTCAST1]]
+; RV64-NEXT:    [[TMP17:%.*]] = add i32 [[OFFSET_IDX]], 0
+; RV64-NEXT:    [[TMP18:%.*]] = add nsw i32 [[TMP17]], -1
+; RV64-NEXT:    [[TMP19:%.*]] = zext i32 [[TMP18]] to i64
+; RV64-NEXT:    [[TMP20:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP19]]
+; RV64-NEXT:    [[TMP21:%.*]] = mul i64 0, [[TMP14]]
+; RV64-NEXT:    [[TMP22:%.*]] = sub i64 1, [[TMP14]]
+; RV64-NEXT:    [[TMP23:%.*]] = getelementptr inbounds i32, ptr [[TMP20]], i64 [[TMP21]]
+; RV64-NEXT:    [[TMP24:%.*]] = getelementptr inbounds i32, ptr [[TMP23]], i64 [[TMP22]]
+; RV64-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, ptr [[TMP24]], align 4
+; RV64-NEXT:    [[REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]])
+; RV64-NEXT:    [[TMP25:%.*]] = add <vscale x 4 x i32> [[REVERSE]], splat (i32 1)
+; RV64-NEXT:    [[TMP26:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP19]]
+; RV64-NEXT:    [[TMP27:%.*]] = mul i64 0, [[TMP14]]
+; RV64-NEXT:    [[TMP28:%.*]] = sub i64 1, [[TMP14]]
+; RV64-NEXT:    [[TMP29:%.*]] = getelementptr inbounds i32, ptr [[TMP26]], i64 [[TMP27]]
+; RV64-NEXT:    [[TMP30:%.*]] = getelementptr inbounds i32, ptr [[TMP29]], i64 [[TMP28]]
+; RV64-NEXT:    [[REVERSE2:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP25]])
+; RV64-NEXT:    store <vscale x 4 x i32> [[REVERSE2]], ptr [[TMP30]], align 4
+; RV64-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP14]]
+; RV64-NEXT:    [[TMP31:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; RV64-NEXT:    br i1 [[TMP31]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; RV64:       [[MIDDLE_BLOCK]]:
+; RV64-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
+; RV64-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; RV64:       [[SCALAR_PH]]:
+; RV64-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP15]], %[[MIDDLE_BLOCK]] ], [ [[TMP0]], %[[FOR_BODY_PREHEADER]] ], [ [[TMP0]], %[[VECTOR_SCEVCHECK]] ]
+; RV64-NEXT:    [[BC_RESUME_VAL3:%.*]] = phi i32 [ [[TMP16]], %[[MIDDLE_BLOCK]] ], [ [[N]], %[[FOR_BODY_PREHEADER]] ], [ [[N]], %[[VECTOR_SCEVCHECK]] ]
+; RV64-NEXT:    br label %[[FOR_BODY:.*]]
+; RV64:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; RV64-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; RV64:       [[FOR_COND_CLEANUP]]:
+; RV64-NEXT:    ret void
+; RV64:       [[FOR_BODY]]:
+; RV64-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; RV64-NEXT:    [[I_0_IN8:%.*]] = phi i32 [ [[BC_RESUME_VAL3]], %[[SCALAR_PH]] ], [ [[I_0:%.*]], %[[FOR_BODY]] ]
+; RV64-NEXT:    [[I_0]] = add nsw i32 [[I_0_IN8]], -1
+; RV64-NEXT:    [[IDXPROM:%.*]] = zext i32 [[I_0]] to i64
+; RV64-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[IDXPROM]]
+; RV64-NEXT:    [[TMP32:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; RV64-NEXT:    [[ADD9:%.*]] = add i32 [[TMP32]], 1
+; RV64-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[IDXPROM]]
+; RV64-NEXT:    store i32 [[ADD9]], ptr [[ARRAYIDX3]], align 4
+; RV64-NEXT:    [[CMP:%.*]] = icmp ugt i64 [[INDVARS_IV]], 1
+; RV64-NEXT:    [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], -1
+; RV64-NEXT:    br i1 [[CMP]], label %[[FOR_BODY]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], !llvm.loop [[LOOP4:![0-9]+]]
+;
+; RV32-LABEL: define void @vector_reverse_i64(
+; RV32-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; RV32-NEXT:  [[ENTRY:.*:]]
+; RV32-NEXT:    [[CMP7:%.*]] = icmp sgt i32 [[N]], 0
+; RV32-NEXT:    br i1 [[CMP7]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; RV32:       [[FOR_BODY_PREHEADER]]:
+; RV32-NEXT:    [[TMP0:%.*]] = zext i32 [[N]] to i64
+; RV32-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; RV32-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP1]], 4
+; RV32-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
+; RV32-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; RV32:       [[VECTOR_PH]]:
+; RV32-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; RV32-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 4
+; RV32-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], [[TMP4]]
+; RV32-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
+; RV32-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
+; RV32-NEXT:    [[TMP6:%.*]] = mul i64 [[TMP5]], 4
+; RV32-NEXT:    [[TMP7:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
+; RV32-NEXT:    [[DOTCAST:%.*]] = trunc i64 [[N_VEC]] to i32
+; RV32-NEXT:    [[TMP8:%.*]] = sub i32 [[N]], [[DOTCAST]]
+; RV32-NEXT:    br label %[[VECTOR_BODY:.*]]
+; RV32:       [[VECTOR_BODY]]:
+; RV32-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; RV32-NEXT:    [[DOTCAST1:%.*]] = trunc i64 [[INDEX]] to i32
+; RV32-NEXT:    [[OFFSET_IDX:%.*]] = sub i32 [[N]], [[DOTCAST1]]
+; RV32-NEXT:    [[TMP9:%.*]] = add i32 [[OFFSET_IDX]], 0
+; RV32-NEXT:    [[TMP10:%.*]] = add nsw i32 [[TMP9]], -1
+; RV32-NEXT:    [[TMP11:%.*]] = zext i32 [[TMP10]] to i64
+; RV32-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP11]]
+; RV32-NEXT:    [[TMP13:%.*]] = trunc i64 [[TMP6]] to i32
+; RV32-NEXT:    [[TMP14:%.*]] = mul i32 0, [[TMP13]]
+; RV32-NEXT:    [[TMP15:%.*]] = sub i32 1, [[TMP13]]
+; RV32-NEXT:    [[TMP16:%.*]] = getelementptr inbounds i32, ptr [[TMP12]], i32 [[TMP14]]
+; RV32-NEXT:    [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[TMP16]], i32 [[TMP15]]
+; RV32-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, ptr [[TMP17]], align 4
+; RV32-NEXT:    [[REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]])
+; RV32-NEXT:    [[TMP18:%.*]] = add <vscale x 4 x i32> [[REVERSE]], splat (i32 1)
+; RV32-NEXT:    [[TMP19:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP11]]
+; RV32-NEXT:    [[TMP20:%.*]] = trunc i64 [[TMP6]] to i32
+; RV32-NEXT:    [[TMP21:%.*]] = mul i32 0, [[TMP20]]
+; RV32-NEXT:    [[TMP22:%.*]] = sub i32 1, [[TMP20]]
+; RV32-NEXT:    [[TMP23:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i32 [[TMP21]]
+; RV32-NEXT:    [[TMP24:%.*]] = getelementptr inbounds i32, ptr [[TMP23]], i32 [[TMP22]]
+; RV32-NEXT:    [[REVERSE2:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP18]])
+; RV32-NEXT:    store <vscale x 4 x i32> [[REVERSE2]], ptr [[TMP24]], align 4
+; RV32-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP6]]
+; RV32-NEXT:    [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; RV32-NEXT:    br i1 [[TMP25]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; RV32:       [[MIDDLE_BLOCK]]:
+; RV32-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
+; RV32-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; RV32:       [[SCALAR_PH]]:
+; RV32-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP7]], %[[MIDDLE_BLOCK]] ], [ [[TMP0]], %[[FOR_BODY_PREHEADER]] ]
+; RV32-NEXT:    [[BC_RESUME_VAL3:%.*]] = phi i32 [ [[TMP8]], %[[MIDDLE_BLOCK]] ], [ [[N]], %[[FOR_BODY_PREHEADER]] ]
+; RV32-NEXT:    br label %[[FOR_BODY:.*]]
+; RV32:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; RV32-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; RV32:       [[FOR_COND_CLEANUP]]:
+; RV32-NEXT:    ret void
+; RV32:       [[FOR_BODY]]:
+; RV32-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; RV32-NEXT:    [[I_0_IN8:%.*]] = phi i32 [ [[BC_RESUME_VAL3]], %[[SCALAR_PH]] ], [ [[I_0:%.*]], %[[FOR_BODY]] ]
+; RV32-NEXT:    [[I_0]] = add nsw i32 [[I_0_IN8]], -1
+; RV32-NEXT:    [[IDXPROM:%.*]] = zext i32 [[I_0]] to i64
+; RV32-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[IDXPROM]]
+; RV32-NEXT:    [[TMP26:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; RV32-NEXT:    [[ADD9:%.*]] = add i32 [[TMP26]], 1
+; RV32-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[IDXPROM]]
+; RV32-NEXT:    store i32 [[ADD9]], ptr [[ARRAYIDX3]], align 4
+; RV32-NEXT:    [[CMP:%.*]] = icmp ugt i64 [[INDVARS_IV]], 1
+; RV32-NEXT:    [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], -1
+; RV32-NEXT:    br i1 [[CMP]], label %[[FOR_BODY]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], !llvm.loop [[LOOP4:![0-9]+]]
+;
+; RV64-UF2-LABEL: define void @vector_reverse_i64(
+; RV64-UF2-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]], i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; RV64-UF2-NEXT:  [[ENTRY:.*:]]
+; RV64-UF2-NEXT:    [[CMP7:%.*]] = icmp sgt i32 [[N]], 0
+; RV64-UF2-NEXT:    br i1 [[CMP7]], label %[[FOR_BODY_PREHEADER:.*]], label %[[FOR_COND_CLEANUP:.*]]
+; RV64-UF2:       [[FOR_BODY_PREHEADER]]:
+; RV64-UF2-NEXT:    [[TMP0:%.*]] = zext i32 [[N]] to i64
+; RV64-UF2-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; RV64-UF2-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP1]], 8
+; RV64-UF2-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
+; RV64-UF2-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_SCEVCHECK:.*]]
+; RV64-UF2:       [[VECTOR_SCEVCHECK]]:
+; RV64-UF2-NEXT:    [[TMP3:%.*]] = add nsw i64 [[TMP0]], -1
+; RV64-UF2-NEXT:    [[TMP4:%.*]] = add i32 [[N]], -1
+; RV64-UF2-NEXT:    [[TMP5:%.*]] = trunc i64 [[TMP3]] to i32
+; RV64-UF2-NEXT:    [[MUL:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 1, i32 [[TMP5]])
+; RV64-UF2-NEXT:    [[MUL_RESULT:%.*]] = extractvalue { i32, i1 } [[MUL]], 0
+; RV64-UF2-NEXT:    [[MUL_OVERFLOW:%.*]] = extractvalue { i32, i1 } [[MUL]], 1
+; RV64-UF2-NEXT:    [[TMP6:%.*]] = sub i32 [[TMP4]], [[MUL_RESULT]]
+; RV64-UF2-NEXT:    [[TMP7:%.*]] = icmp ugt i32 [[TMP6]], [[TMP4]]
+; RV64-UF2-NEXT:    [[TMP8:%.*]] = or i1 [[TMP7]], [[MUL_OVERFLOW]]
+; RV64-UF2-NEXT:    [[TMP9:%.*]] = icmp ugt i64 [[TMP3]], 4294967295
+; RV64-UF2-NEXT:    [[TMP10:%.*]] = or i1 [[TMP8]], [[TMP9]]
+; RV64-UF2-NEXT:    br i1 [[TMP10]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
+; RV64-UF2:       [[VECTOR_PH]]:
+; RV64-UF2-NEXT:    [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
+; RV64-UF2-NEXT:    [[TMP12:%.*]] = mul i64 [[TMP11]], 8
+; RV64-UF2-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], [[TMP12]]
+; RV64-UF2-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
+; RV64-UF2-NEXT:    [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
+; RV64-UF2-NEXT:    [[TMP14:%.*]] = mul i64 [[TMP13]], 4
+; RV64-UF2-NEXT:    [[TMP15:%.*]] = mul i64 [[TMP14]], 2
+; RV64-UF2-NEXT:    [[TMP16:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
+; RV64-UF2-NEXT:    [[DOTCAST:%.*]] = trunc i64 [[N_VEC]] to i32
+; RV64-UF2-NEXT:    [[TMP17:%.*]] = sub i32 [[N]], [[DOTCAST]]
+; RV64-UF2-NEXT:    br label %[[VECTOR_BODY:.*]]
+; RV64-UF2:       [[VECTOR_BODY]]:
+; RV64-UF2-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; RV64-UF2-NEXT:    [[DOTCAST1:%.*]] = trunc i64 [[INDEX]] to i32
+; RV64-UF2-NEXT:    [[OFFSET_IDX:%.*]] = sub i32 [[N]], [[DOTCAST1]]
+; RV64-UF2-NEXT:    [[TMP18:%.*]] = add i32 [[OFFSET_IDX]], 0
+; RV64-UF2-NEXT:    [[TMP19:%.*]] = add nsw i32 [[TMP18]], -1
+; RV64-UF2-NEXT:    [[TMP20:%.*]] = zext i32 [[TMP19]] to i64
+; RV64-UF2-NEXT:    [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP20]]
+; RV64-UF2-NEXT:    [[TMP22:%.*]] = mul i64 0, [[TMP14]]
+; RV64-UF2-NEXT:    [[TMP23:%.*]] = sub i64 1, [[TMP14]]
+; RV64-UF2-NEXT:    [[TMP24:%.*]] = getelementptr inbounds i32, ptr [[TMP21]], i64 [[TMP22]]
+; RV64-UF2-NEXT:    [[TMP25:%.*]] = getelementptr inbounds i32, ptr [[TMP24]], i64 [[TMP23]]
+; RV64-UF2-NEXT:    [[TMP26:%.*]] = mul i64 -1, [[TMP14]]
+; RV64-UF2-NEXT:    [[TMP27:%.*]] = sub i64 1, [[TMP14]]
+; RV64-UF2-NEXT:    [[TMP28:%.*]] = getelementptr inbounds i32, ptr [[TMP21]], i64 [[TMP26]]
+; RV64-UF2-NEXT:    [[TMP29:%.*]] = getelementptr inbounds i32, ptr [[TMP28]], i64 [[TMP27]]
+; RV64-UF2-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, ptr [[TMP25]], align 4
+; RV64-UF2-NEXT:    [[REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]])
+; RV64-UF2-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 4 x i32>, ptr [[TMP29]], align 4
+; RV64-UF2-NEXT:    [[REVERSE3:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD2]])
+; RV64-UF2-NEXT:    [[TMP30:%.*]] = add <vscale x 4 x i32> [[REVERSE]], splat (i32 1)
+; RV64-UF2-NEXT:    [[TMP31:%.*]] = add <vscale x 4 x i32> [[REVERSE3]], splat (i32 1)
+; RV64-UF2-NEXT:    [[TMP32:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP20]]
+; RV64-UF2-NEXT:    [[TMP33:%.*]] = mul i64 0, [[TMP14]]
+; RV64-UF2-NEXT:    [[TMP34:%.*]] = sub i64 1, [[TMP14]]
+; RV64-UF2-NEXT:    [[TMP35:%.*]] = getelementptr inbounds i32, ptr [[TMP32]], i64 [[TMP33]]
+; RV64-UF2-NEXT:    [[TMP36:%.*]] = getelementptr inbounds i32, ptr [[TMP35]], i64 [[TMP34]]
+; RV64-UF2-NEXT:    [[TMP37:%.*]] = mul i64 -1, [[TMP14]]
+; RV64-UF2-NEXT:    [[TMP38:%.*]] = sub i64 1, [[TMP14]]
+; RV64-UF2-NEXT:    [[TMP39:%.*]] = getelementptr inbounds i32, ptr [[TMP32]], i64 [[TMP37]]
+; RV64-UF2-NEXT:    [[TMP40:%.*]] = getelementptr inbounds i32, ptr [[TMP39]], i64 [[TMP38]]
+; RV64-UF2-NEXT:    [[REVERSE4:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP30]])
+; RV64-UF2-NEXT:    store <vscale x 4 x i32> [[REVERSE4]], ptr [[TMP36]], align 4
+; RV64-UF2-NEXT:    [[REVERSE5:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP31]])
+; RV64-UF2-NEXT:    store <vscale x 4 x i32> [[REVERSE5]], ptr [[TMP40]], align 4
+; RV64-UF2-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]
+; RV64-UF2-NEXT:    [[TMP41:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; RV64-UF2-NEXT:    br i1 [[TMP41]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; RV64-UF2:       [[MIDDLE_BLOCK]]:
+; RV64-UF2-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
+; RV64-UF2-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; RV64-UF2:       [[SCALAR_PH]]:
+; RV64-UF2-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP16]], %[[MIDDLE_BLOCK]] ], [ [[TMP0]], %[[FOR_BODY_PREHEADER]] ], [ [[TMP0]], %[[VECTOR_SCEVCHECK]] ]
+; RV64-UF2-NEXT:    [[BC_RESUME_VAL6:%.*]] = phi i32 [ [[TMP17]], %[[MIDDLE_BLOCK]] ], [ [[N]], %[[FOR_BODY_PREHEADER]] ], [ [[N]], %[[VECTOR_SCEVCHECK]] ]
+; RV64-UF2-NEXT:    br label %[[FOR_BODY:.*]]
+; RV64-UF2:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; RV64-UF2-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; RV64-UF2:       [[FOR_COND_CLEANUP]]:
+; RV64-UF2-NEXT:    ret void
+; RV64-UF2:       [[FOR_BODY]]:
+; RV64-UF2-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; RV64-UF2-NEXT:    [[I_0_IN8:%.*]] = phi i32 [ [[BC_RESUME_VAL6]], %[[SCALAR_PH]] ], [ [[I_0:%.*]], %[[FOR_BODY]] ]
+; RV64-UF2-NEXT:    [[I_0]] = add nsw i32 [[I_0_IN8]], -1
+; RV64-UF2-NEXT:    [[IDXPROM:%.*]] = zext i32 [[I_0]] to i64
+; RV64-UF2-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[IDXPROM]]
+; RV64-UF2-NEXT:    [[TMP42:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; RV64-UF2-NEXT:    [[ADD9:%.*]] = add i32 [[TMP42]], 1
+; RV64-UF2-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[IDXPROM]]
+; RV64-UF2-NEXT:    store i32 [[ADD9]], ptr [[ARRAYIDX3]], align 4
+; RV64-UF2-NEXT:    [[CMP:%.*]] = icmp ugt i64 [[INDVARS_IV]], 1
+; RV64-UF2-NEXT:    [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], -1
+; RV64-UF2-NEXT:    br i1 [[CMP]], label %[[FOR_BODY]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], !llvm.loop [[LOOP4:![0-9]+]]
+;
+entry:
+  %cmp7 = icmp sgt i32 %n, 0
+  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:                               ; preds = %entry
+  %0 = zext i32 %n to i64
+  br label %for.body
+
+for.cond.cleanup:                                 ; preds = %for.body, %entry
+  ret void
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %i.0.in8 = phi i32 [ %n, %for.body.preheader ], [ %i.0, %for.body ]
+  %i.0 = add nsw i32 %i.0.in8, -1
+  %idxprom = zext i32 %i.0 to i64
+  %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
+  %1 = load i32, ptr %arrayidx, align 4
+  %add9 = add i32 %1, 1
+  %arrayidx3 = getelementptr inbounds i32, ptr %A, i64 %idxprom
+  store i32 %add9, ptr %arrayidx3, align 4
+  %cmp = icmp ugt i64 %indvars.iv, 1
+  %indvars.iv.next = add nsw i64 %indvars.iv, -1
+  br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !0
+}
+
+define voi...
[truncated]

llvmbot (Member) commented Mar 10, 2025

@llvm/pr-subscribers-llvm-transforms


Mel-Chen requested a review from alexey-bataev on March 10, 2025 09:11
Mel-Chen force-pushed the cm-stride-precommit branch from 4decdce to b1cce79 on March 11, 2025 13:09
Mel-Chen requested a review from fhahn on March 11, 2025 13:12
; RV64-NEXT: [[REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> [[WIDE_LOAD]])
; RV64-NEXT: [[TMP14:%.*]] = add <vscale x 4 x i32> [[REVERSE]], splat (i32 1)
; RV64-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP8]]
; RV64-NEXT: [[TMP16:%.*]] = mul i64 0, [[TMP5]]

Contributor

@artagnon Looks like this is a multiply of zero generated by VPReverseVectorPointerRecipe, which reminded me of #127521. It's a bit different because it's happening at the IR level, not the VPlan level, but I wonder why the call to Builder.CreateMul isn't folding this away?

artagnon (Contributor) commented Mar 12, 2025

If it's happening purely at the IR level, the IRBuilder should constant-fold it; I suspect #125365 would fix this. EDIT: Sorry, not both operands are constants, so the constant folder can't fold this. It requires something like InstCombine.

Contributor

I thought ConstantExpr::getBinOpAbsorber could fold X * 0 -> 0? Maybe the operands need to be swapped around here if it only folds RHS constants?

 Value *NumElt = Builder.CreateMul(
      ConstantInt::get(IndexTy, -(int64_t)CurrentPart), RunTimeVF);

artagnon (Contributor) commented Apr 1, 2025

Sorry I left this pending for so long; I was away on vacation. Yes, getBinOpAbsorber can fold either an LHS or an RHS constant; however, is it used in IRBuilder::CreateMul? I think CreateMul only calls a variant of FoldBinOp:

  Value *FoldBinOp(Instruction::BinaryOps Opc, Value *LHS,
                   Value *RHS) const override {
    auto *LC = dyn_cast<Constant>(LHS);
    auto *RC = dyn_cast<Constant>(RHS);
    if (LC && RC) {
      if (ConstantExpr::isDesirableBinOp(Opc))
        return ConstantExpr::get(Opc, LC, RC);
      return ConstantFoldBinaryInstruction(Opc, LC, RC);
    }
    return nullptr;
  }

I think this is a real problem in LV, and I've seen variations of the problem in many other tests; I will try to attack it in follow-ups to my constant-folding patch.
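
To make that concrete, here is a minimal standalone sketch (none of this code is from the patch; the helper name and setup are assumed for illustration). With the default ConstantFolder shown above, CreateMul only folds when both operands are Constants, so multiplying a constant zero by the non-constant runtime VF leaves a mul instruction in the emitted IR, matching the "mul i64 0, [[TMP14]]" lines in the checks.

  #include "llvm/IR/IRBuilder.h"

  using namespace llvm;

  // Hypothetical helper mirroring the case above: RunTimeVF is not a Constant
  // (it is derived from llvm.vscale at runtime), so ConstantFolder::FoldBinOp
  // returns nullptr and CreateMul emits a mul instruction rather than folding
  // 0 * RunTimeVF to 0.
  static Value *emitZeroTimesVF(IRBuilder<> &Builder, Value *RunTimeVF) {
    Type *IndexTy = RunTimeVF->getType();
    Value *Zero = ConstantInt::get(IndexTy, 0);
    return Builder.CreateMul(Zero, RunTimeVF); // survives until InstCombine runs
  }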

Mel-Chen (Contributor, Author)

ping

fhahn (Contributor) left a comment

LGTM, thanks.

Please update to current main and check if the test is passing before merging :)

Mel-Chen force-pushed the cm-stride-precommit branch from b1cce79 to 9d825d9 on March 18, 2025 07:05
Mel-Chen merged commit 489d1e7 into llvm:main on Mar 18, 2025
9 of 10 checks passed
Mel-Chen deleted the cm-stride-precommit branch on March 18, 2025 08:08
llvm-ci (Collaborator) commented Mar 18, 2025

LLVM Buildbot has detected a new failure on builder clang-cmake-x86_64-avx512-win running on avx512-intel64-win while building llvm at step 4 "cmake stage 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/81/builds/5355

Here is the relevant piece of the build log, for reference:
Step 4 (cmake stage 1) failure: 'cmake -G ...' (failure)
'cmake' is not recognized as an internal or external command,
operable program or batch file.
