[pull] main from llvm:main #5556

Open
wants to merge 7,090 commits into base: main from llvm:main

Conversation

pull[bot]

@pull pull bot commented Mar 27, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Mar 27, 2025
llvmgnsyncbot and others added 29 commits May 27, 2025 10:59
Same purpose as #141407,
committing this directly to get the bot green sooner.

Co-authored-by: Ely Ronnen <[email protected]>
## Why
In
https://github.com/llvm/llvm-project/pull/113612/files#diff-ada12e18f3e902b41b6989b46455c4e32656276e59907026e2464cf57d10d583,
the parameter `qual_name` was introduced. However, the tests have not
been adapted accordingly and hence cannot be executed.

## What
Fix the execution of tests by providing the missing argument.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
It's already called in llvm_add_library.
This patch updates `CombineContractBroadcastMask` to inherit from
`MaskableOpRewritePattern`, enabling it to handle masked
`vector.contract` operations. The pattern rewrites:
```mlir
  %a_bc = vector.broadcast %a
  %res = vector.contract %a_bc, %b, ...
```

into:
```mlir
  // Move the broadcast into vector.contract (by updating the indexing
  // maps)
  %res = vector.contract %a, %b, ...
```

The main challenge is supporting cases where the pattern drops a leading
unit dimension. For example:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
    %arg0 : vector<8x4xi32>,
    %arg1 : vector<8x4xi32>,
    %arg2 : vector<8x8xi32>,
    %mask: vector<1x8x8x4xi1>) -> vector<8x8xi32> {

  %0 = vector.broadcast %arg0 : vector<8x4xi32> to vector<1x8x4xi32>
  %1 = vector.broadcast %arg1 : vector<8x4xi32> to vector<1x8x4xi32>
  %result = vector.mask %mask {
    vector.contract {
      indexing_maps = [#map0, #map1, #map2],
      iterator_types = ["reduction", "parallel", "parallel", "reduction"],
      kind = #vector.kind<add>
    } %0, %1, %arg2 : vector<1x8x4xi32>, vector<1x8x4xi32> into vector<8x8xi32>
  } : vector<1x8x8x4xi1> -> vector<8x8xi32>

  return %result : vector<8x8xi32>
}
```

Here, the leading unit dimension is dropped. To handle this, the mask is
cast to the correct shape using a `vector.shape_cast`:

```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
    %arg0: vector<8x4xi32>,
    %arg1: vector<8x4xi32>,
    %arg2: vector<8x8xi32>,
    %arg3: vector<1x8x8x4xi1>) -> vector<8x8xi32> {

  %mask_sc = vector.shape_cast %arg3 : vector<1x8x8x4xi1> to vector<8x8x4xi1>
  %res = vector.mask %mask_sc {
    vector.contract {
      indexing_maps = [#map, #map1, #map2],
      iterator_types = ["parallel", "parallel", "reduction"],
      kind = #vector.kind<add>
    } %arg0, %arg1, %arg2 : vector<8x4xi32>, vector<8x4xi32> into vector<8x8xi32>
  } : vector<8x8x4xi1> -> vector<8x8xi32>

  return %res : vector<8x8xi32>
}
```

While this isn't ideal - since it introduces a `vector.shape_cast` that
must be cleaned up later - it reflects the best we can do once the input
reaches `CombineContractBroadcastMask`. A more robust solution may
involve simplifying the input earlier. I am leaving that as a TODO for
myself to explore further. Posting this now to unblock downstream
work.

LIMITATIONS

Currently, this pattern assumes:
* Only leading dimensions are dropped in the mask.
* All dropped dimensions must be unit-sized.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
…138540)

When determining whether an escape source may alias with a noalias
argument, only take provenance captures into account. If only the
address of the argument was captured, an access through the escape
source is not legal.
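
A hedged, hypothetical C++ sketch of the address-only case (pointer
comparison is the classic example of capturing only the address, not the
provenance; the names and scenario are made up for illustration):

```cpp
// `p` is a noalias (restrict) argument. Merely comparing it, as below,
// captures only its address, not its provenance; storing it somewhere
// reachable from outside would capture the provenance as well. Per the
// change above, a pointer obtained from an escape source therefore still
// cannot legally be used to access the memory `p` points to.
bool is_registered(int *__restrict p, int *known) {
  return p == known; // address-only capture of `p`
}
```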
Fixes errors about duplicate PHI edges when the input had duplicates
with constexprs in them. The constexpr translation makes new basic
blocks, causing the verifier to complain about duplicate entries in PHI
nodes.
…test (#141503)

This is an NFC change.

Added "-mattr=-real-true16" to a few gfx12 tests. This is for the up
coming GFX12 true16 code change. Set these tests to use fake16 flow
since true16 mode are not fully functional for GISEL
- No need to prefix `PointerType` with `llvm::`.
- Avoid a namespace block to define `PrintPipelinePasses`.
#140132)

Update initial construction to connect the Plan's entry to the scalar
preheader. This moves a small part of the skeleton creation out of ILV
and will also enable replacing VPInstruction::ResumePhi with regular
VPPhi recipes.

Resume phis need 2 incoming values to start with, the second being the
bypass value from the scalar preheader (and used to replicate the incoming
value for other bypass blocks). Adding the extra edge ensures the
incoming values for resume phis match the incoming blocks.

PR: #140132
asin, acos, atan, and atan2 were being lowered to libm calls instead of
llvm intrinsics. Add the conversion patterns to handle these intrinsics
and update tests to expect this.
… (NFC) (#141595)

The `GCNScheduleDAGMILive`'s `RescheduleRegions` bitvector is only used
by the rematerialization stage (`PreRARematStage`). Its presence in the
scheduler's state forces us to maintain its value throughout scheduling
even though it is of no use to the iterative scheduling process itself,
which instead relies on each stage's `initGCNRegion` hook to determine
whether the current region should be rescheduled.

This moves the bitvector to the `PreRARematStage`, which uses it to
store the set of regions that must be rescheduled between stage
initialization and region initialization.

This NFC also swaps a call to `GCNRegPressure::getArchVGPRNum(false)`
for a call to `GCNRegPressure::getArchVGPRNum()`---which is equivalent
but simpler in the context---and makes
`GCNSchedStage::finalizeGCNRegion` use its own API to advance to the
next region.
…movl x), y) (#141579)

Move VZEXT_MOVL nodes up through shift nodes.

We should be trying harder to move VZEXT_MOVL towards any associated SCALAR_TO_VECTOR nodes to make use of MOVD/Q implicit zeroing of upper elements.

Fixes #141475
In `applyAdrpAddLdr()` we make a transformation that is identical to the
one in `applyAdrpAdd()`, so let's reuse that code. Also refactor
`forEachHint()` to use more `ArrayRef` and move around some lines for
consistency.
This addresses a TODO where previously scalarizeBinopOrCmp
conservatively bailed if one of the operands was a load.

getVectorInstrCost was updated to take in values in
https://reviews.llvm.org/D140498 so we can pass in the scalar value to
be inserted, which should return an accurate cost for a gather.

To prevent regressions on x86 this tries to constant fold NewVecC up
front so we can pass it into TTI and get a more accurate cost.

We want to remove this restriction on RISC-V since this is always
profitable whether or not the scalar is a load.
We were emitting a non-error diagnostic but claiming we emitted an
error, which caused some obvious follow-on problems.

Fixes #141549
)

Not explicitly defining the default case for ShallowCopy* functions does
not meet the requirements for gcc to actually instantiate the templates,
leading to build errors that show up with gcc but not with clang.

Signed-off-by: Kajetan Puchalski <[email protected]>
This patch moves scalable vectorization tests into an existing generic
vectorization test file:
  * vectorization-scalable.mlir --> merged into vectorization.mlir

Rationale:
  * Most tests in vectorization-scalable.mlir are variants of existing
    tests in vectorization.mlir. Keeping them together improves
    maintainability.
  * Consolidating tests makes it easier to spot gaps in coverage for
    regular vectorization.
  * In the Vector dialect, we don't separate tests for scalable vectors;
    this change aligns Linalg with that convention.

Notable changes beyond moving tests:
  * Updated one of the two matrix-vector multiplication tests to use
    `linalg.matvec` instead of `linalg.generic`. CHECK lines remain
    unchanged.
  * Simplified the lone `linalg.index` test by removing an unnecessary
    `tensor.extract`. Also removed canonicalization patterns from the
    TD sequence for consistency with other tests.

This patch contributes to the implementation of #141025 — please refer
to that ticket for full context.
This implements the design proposed by [Representing SpirvType in
Clang's Type System](llvm/wg-hlsl#181). It
creates `HLSLInlineSpirvType` as a new `Type` subclass, and
`__hlsl_spirv_type` as a new builtin type template to create such a
type.

This new type is lowered to the `spirv.Type` target extension type, as
described in [Target Extension Types for Inline SPIR-V and Decorated
Types](https://github.com/llvm/wg-hlsl/blob/main/proposals/0017-inline-spirv-and-decorated-types.md).
#139858)

New contributors can just indicate that they are working on the issue
without requesting assignment.

That should reduce the burden of assigned issues that are not actually
being worked on, and of new contributors waiting for a maintainer to
assign them the issue.

---------

Co-authored-by: Danny Mösch <[email protected]>
This patch fixes:

  clang/utils/TableGen/ClangBuiltinTemplatesEmitter.cpp:25:8: error:
  'llvm::StringSet' may not intend to support class template argument
  deduction [-Werror,-Wctad-maybe-unsupported]
klausler and others added 30 commits May 28, 2025 14:01
A dummy argument with an explicit INTEGER type of non-default kind can
be forward-referenced from a specification expression in many Fortran
compilers. Handle this by adding type declaration statements to the initial
pass over a specification part's declaration constructs. Emit an
optional warning under -pedantic.

Fixes #140941.
)

When processing free form source line continuation, the prescanner
treats empty keyword macros as if they were spaces or tabs. After
skipping over them, however, there's code that only works if the skipped
characters ended with an actual space or tab. If the last skipped item
was an empty keyword macro's name, the last character of that name would
end up being the first character of the continuation line. Fix.
…switch-case (#141779)

To make it more clear that it's a subset of -Wdeprecated-declarations.

Follow-up to #138562
This change uses the resource name during DXIL resource binding analysis to detect when two (or more) resources have identical overlapping bindings.

The DXIL resource analysis just detects that there is a problem with the binding and sets the `hasOverlappingBinding` flag. Full error reporting will happen later in DXILPostOptimizationValidation pass (#110723).
implemented wcschr and tests

---------

Co-authored-by: Sriya Pratipati <[email protected]>
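
For reference, a small standalone usage sketch of the standard `wcschr`
semantics being implemented (illustrative only; not the libc implementation
or its tests):

```cpp
#include <cstdio>
#include <cwchar>

// wcschr returns a pointer to the first occurrence of the wide character in
// the string, or nullptr if it is absent; the terminating L'\0' is searchable.
int main() {
  const wchar_t *s = L"hello";
  const wchar_t *p = std::wcschr(s, L'l');
  std::printf("offset = %td\n", p ? p - s : -1); // prints "offset = 2"
}
```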
Ensure everything is defined inside the namespace and reduce the number of
ifdefs.
…1814)

These tests will track progress on extending
#139809 from CFI to more UBSan
checks.
…41719)

Currently, to avoid generating too many BTF types, for a struct type
like
```
  struct foo {
    int val;
    struct bar *ptr;
  };
```
if the BTF generation reaches 'struct foo', it will not generate the actual
type for 'struct bar'; instead, a forward decl is generated. The
'struct bar' is not actually generated in BTF unless it is reached through a
non-struct pointer member.

Such a limitation forces bpf developers to hack and work around this
problem. See [1] and [2]. For example in [1], we have
```
    struct map_value {
	struct prog_test_ref_kfunc *not_kptr;
	struct prog_test_ref_kfunc __kptr *val;
	struct node_data __kptr *node;
    };
```
The BTF type for 'struct node_data' is not generated. Note that we have
a '__kptr' annotation. Similar problem for [2] with a '__uptr'
annotation. Note that the issue in [1] was later resolved, but the
hack in [2] is still needed.

This patch relaxes BTF generation for struct types (with struct pointer
members) when the struct pointer has a btf_type_tag annotation.

[1] https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]
Tests using ObjC do not readily run on Linux.
Use the "if statement with an initializer" pattern, which is very common
in LLVM, in SBTarget. Every time someone adds a new method to SBTarget, I
want to encourage using this pattern, but I don't because it would be
inconsistent with the rest of the file. This solves that problem by
switching over the whole file.
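
A self-contained sketch of the pattern (C++17); the names below are
illustrative, not the actual SBTarget code:

```cpp
#include <iostream>
#include <memory>
#include <string>

// Stand-in for the kind of accessor an SB API method typically starts with.
std::shared_ptr<std::string> GetSP() {
  return std::make_shared<std::string>("target");
}

int main() {
  // "if with initializer": the pointer is declared and tested in one place,
  // and its scope is limited to the if statement.
  if (auto sp = GetSP(); sp)
    std::cout << *sp << '\n';
}
```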
…perands (#141845)

As noted in
#141821 (comment),
whilst we currently constant fold intrinsics of fixed-length vectors via
their scalar counterpart, we don't do the same for scalable vectors.

This handles the scalable vector case when the operands are splats.

One weird snag in ConstantVector::getSplat was that it produced an undef
if passed in poison, so this also contains a fix by checking for
PoisonValue before UndefValue.
Fixes the final reduction steps which were taken from an implementation
of scan, not reduction, causing lanes earlier in the wave to have
incorrect results due to masking.

Now aligning more closely with the Triton implementation:
triton-lang/triton#5019

# Hypothetical example
To provide an explanation of the issue with the current implementation,
let's take the simple example of attempting to perform a sum over 64
lanes where the initial values are as follows (first lane has value 1,
and all other lanes have value 0):
```
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
When performing a sum reduction over these 64 lanes, in the current
implementation we perform 6 dpp instructions which in sequential order
do the following:
1) sum over clusters of 2 contiguous lanes
2) sum over clusters of 4 contiguous lanes
3) sum over clusters of 8 contiguous lanes
4) sum over an entire row
5) broadcast the result of the last lane in each row to the next row, and
each lane sums its current value with the incoming value.
6) broadcast the result of the 32nd lane to the last two rows, and each lane
sums its current value with the incoming value.

After step 4) the result for the example above looks like this:

```
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

After step 5) the result looks like this:
```
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

After step 6) the result looks like this:
```
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
Note that the correct value here is always 1, yet after the
`dpp.broadcast` ops some lanes have incorrect values. The reason is that
for these incorrect lanes, like lanes 0-15 in step 5, the
`dpp.broadcast` op doesn't provide them incoming values from other
lanes. Instead these lanes are provided either their own values, or 0
(depending on whether `bound_ctrl` is true or false) as values to sum
over; either way, these values are stale and these lanes shouldn't be
used in general.

So what this means:
- For a subgroup reduce over 32 lanes (like Step 5), the correct result
is stored in lanes 16 to 31
- For a subgroup reduce over 64 lanes (like Step 6), the correct result
is stored in lanes 32 to 63.

However, in the current implementation we do not specifically read the
value from one of the correct lanes when returning a final value. In
some workloads it seems that, without doing so, the stale value from
the first lane is returned instead.
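
To make the lane arithmetic above concrete, here is a small host-side C++
sketch (purely illustrative, not the actual GPU lowering) that mimics the
cluster sums and the two broadcast steps as described, assuming lanes that
receive no incoming value in a broadcast step simply re-add their own stale
value:

```cpp
#include <array>
#include <cstdio>
#include <initializer_list>

int main() {
  std::array<int, 64> lanes{};
  lanes[0] = 1; // lane 0 holds 1, all other lanes hold 0

  // Steps 1-4: each lane ends up holding the sum of its cluster of
  // 2, 4, 8 and then 16 contiguous lanes.
  for (int cluster : {2, 4, 8, 16}) {
    std::array<int, 64> next{};
    for (int base = 0; base < 64; base += cluster) {
      int sum = 0;
      for (int i = 0; i < cluster; ++i)
        sum += lanes[base + i];
      for (int i = 0; i < cluster; ++i)
        next[base + i] = sum;
    }
    lanes = next;
  }

  // Step 5: broadcast the last lane of each row to the next row. Lanes in
  // row 0 receive no incoming value and re-add their own stale value
  // (one of the two stale behaviours described above).
  std::array<int, 64> step5{};
  for (int l = 0; l < 64; ++l) {
    int incoming = (l >= 16) ? lanes[(l / 16) * 16 - 1] : lanes[l];
    step5[l] = lanes[l] + incoming;
  }

  // Step 6: broadcast lane 31 to the last two rows. Lanes 0-31 again
  // re-add their own stale values.
  std::array<int, 64> step6{};
  for (int l = 0; l < 64; ++l) {
    int incoming = (l >= 32) ? step5[31] : step5[l];
    step6[l] = step5[l] + incoming;
  }

  // Only lanes 32-63 hold the correct sum (1); lanes 0-15 hold 4 and
  // lanes 16-31 hold 2, matching the arrays shown above.
  std::printf("lane 0 = %d, lane 16 = %d, lane 63 = %d\n",
              step6[0], step6[16], step6[63]);
}
```

With the example input this prints `lane 0 = 4, lane 16 = 2, lane 63 = 1`,
which matches the arrays above: only lanes 32-63 hold the correct total.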

# Actual failing test
For a specific example of how the current implementation causes issues,
take a look at the IR below which represents an additive reduction over
a dynamic dimension.
```
!matA = tensor<1x?xf16>
!matB = tensor<1xf16>
#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1) -> (d0)>
func.func @only_producer_fusion_multiple_result(%arg0: !matA) -> !matB {
  %cst_1 = arith.constant 0.000000e+00 : f16
  %c2_i64 = arith.constant 2 : i64
  %0 = tensor.empty() : !matB
  %2 = linalg.fill ins(%cst_1 : f16) outs(%0 : !matB) -> !matB
  %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "reduction"]} ins(%arg0 : !matA) outs(%2 : !matB)  {
  ^bb0(%in: f16, %out: f16):
    %7 = arith.addf %in, %out : f16
    linalg.yield %7 : f16
  } -> !matB
  return %4 : !matB
}
```
When provided an input of type `tensor<1x2xf16>` and values `{0, 1}` to
perform the reduction over, the value returned is consistently 4. By the
same analysis done above, this shows that the returned value is coming
from one of these stale lanes and needs to be read instead from one of
the lanes storing the correct result.

Signed-off-by: Muzammiluddin Syed <[email protected]>
…ool (#140829)

This consolidates some of the error handling around regex arguments to
the tool, and sets up the APIs such that errors must be handled before
their usage.
…141859)

The context disambiguation code already emits remarks when hinting
allocations (by adding hotness attributes) during cloning. However,
we did not yet emit remarks when applying the hotness attributes during
building of the metadata (during matching and again after inlining).
Add remarks when we apply the hint attributes for these
non-context-sensitive allocations.
This change updates a few tests for global variable handling to also
check classic codegen output so we can easily verify consistency between
the two and be alerted if the classic codegen changes.

This was useful in developing forthcoming changes to global linkage
handling.
While combining and->cmp->sel into and->mul may result in worse code on
some targets, this combine should be uniformly beneficial.

Proof: https://alive2.llvm.org/ce/z/MibAcN
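
A hypothetical C++ sketch of the shape of the combine for the
single-bit-mask case (both functions compute the same value, since `x & 1`
is always 0 or 1):

```cpp
// Before: and -> cmp -> sel
int sel_form(int x, int y) { return (x & 1) != 0 ? y : 0; }

// After: and -> mul
int mul_form(int x, int y) { return (x & 1) * y; }
```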

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Yingwei Zheng <[email protected]>
This helps to disambiguate accesses in the caller and the callee
after LLVM inlining in some apps. I did not see any performance
changes, but this is one step towards enabling other optimizations
in the apps that I am looking at.

The definition of llvm.noalias says:
```
... indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function.
```

I believe this exactly matches Fortran rules for the dummy arguments
that are modified during their subprogram execution.

I also set llvm.noalias and llvm.nocapture on the !fir.box<> arguments,
because the corresponding descriptors cannot be captured and cannot
alias anything (not based on them) during the execution of the
subprogram.
llvm-diff shows no change to amdgcn--amdhsa.bc
When we're creating a Zilsd store from a split i64 value, check if both
inputs are 0 and use X0_Pair instead of a REG_SEQUENCE.
- Attach the mappable interface to ClassType
- Remove the TODO since a derived type in a descriptor can be handled like
other descriptors
The current unwinding implementation on Haiku is messy and broken.
1. It searches weird paths for private headers, which is breaking builds
in consuming projects, such as dotnet/runtime.
2. It does not even work, due to relying on incorrect private offsets.

This commit strips all references to private headers and ports a working
signal frame implementation. It has been tested against
`tests/signal_unwind.pass.cpp` and can unwind past the signal frame.
Summary:
When there are overloaded C++ operators in the global namespace, the AST
node for these is not a `BinaryExpr` but a `CXXOperatorCallExpr`. Modify
the uses to handle this case, basically just treating it as a binary
expression with two arguments.

Fixes #141085
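
A minimal illustration of the AST shape (not the code from the fix): with a
user-defined operator in the global namespace, `a == b` is modelled as a
`CXXOperatorCallExpr` with two arguments rather than a plain binary-operator
node:

```cpp
struct Fixed { int v; };

// Overloaded operator in the global namespace.
bool operator==(Fixed a, Fixed b) { return a.v == b.v; }

int main() {
  Fixed a{1}, b{1};
  // Clang models this comparison as a CXXOperatorCallExpr(==, a, b),
  // which can be treated like a binary expression with two arguments.
  return (a == b) ? 0 : 1;
}
```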
The XAndesPerf extension includes the unsigned bitfield extraction
instruction `NDS.BFOZ`, which extracts the bits from LSB to MSB,
places them starting at bit 0, and zero-extends the result.

The testcase includes the three patterns that can be selected as
unsigned bitfield extracts: `and`, `and+lshr` and `lshr+and`.
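
For illustration, hypothetical C++ functions whose IR exhibits the three
patterns (the field positions and widths are made up):

```cpp
// and: extract the low 8 bits (already at bit 0, zero-extended).
unsigned ext_and(unsigned x)      { return x & 0xFFu; }

// and+lshr: mask first, then shift the field down to bit 0.
unsigned ext_and_lshr(unsigned x) { return (x & 0xFF00u) >> 8; }

// lshr+and: shift first, then mask the field.
unsigned ext_lshr_and(unsigned x) { return (x >> 8) & 0xFFu; }
```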
…eduction and VPMulAccumulateReduction. (#113903)

This patch implements the VPlan-based cost model for VPReduction,
VPExtendedReduction and VPMulAccumulateReduction.

With this patch, we can calculate the reduction cost with the VPlan-based
cost model, so the reduction costs in `precomputeCost()` are removed.

Ref: Original instruction based implementation:
https://reviews.llvm.org/D93476