[pull] main from llvm:main #5556

Open: pull[bot] wants to merge 7,080 commits into base: main

@pull pull bot commented Mar 27, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

@pull pull bot added the ⤵️ pull label Mar 27, 2025
fhahn and others added 29 commits May 27, 2025 10:46
Make sure the new phis are inserted before any non-phi instructions.
This fixes a crash when dbg_value instructions are present in the
original exit block.
…4TruncSrlConstant

Let combinei64TruncSrlConstant decide when the fold is invalid instead of splitting so many of the conditions with combineTruncatedArithmetic

NOTE: We can probably relax the i32 truncation constraint to <= i32, perform the SRL as i32 and then truncate further.

Noticed while triaging #141496
Fixes various downstream bot failures occurring with different default targets
e.g., windows due to mangling assumptions baked into the tests.
…inters to arrays (#141092)

Currently we generate an incorrect suggestion for shared/unique pointers
to arrays; for instance ([Godbolt](https://godbolt.org/z/Tens1reGP)):
```c++
#include <memory>

void test_shared_ptr_to_array() {
  std::shared_ptr<int[]> i;
  auto s = sizeof(*i.get());
}
```
```
<source>:5:20: warning: redundant get() call on smart pointer [readability-redundant-smartptr-get]
    5 |   auto s = sizeof(*i.get());
      |                    ^~~~~~~
      |                    i
1 warning generated.
```
`sizeof(*i)` is incorrect, though, because the array specialization of
`std::shared/unique_ptr` does not have an `operator*()`. Therefore I
have disabled this check for smart pointers to arrays for now; future
work could, of course, improve on this by suggesting, say,
`sizeof(i[0])` in the above example.
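As a minimal sketch of the alternative mentioned above (this is not part of the check), the array specialization does provide `operator[]`, so the element size can be taken without calling `get()`. The helper name `element_size_via_subscript` is hypothetical and exists only for illustration.

```c++
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: takes the element size through operator[], which
// std::shared_ptr<T[]> provides (C++17), instead of through operator*.
inline std::size_t element_size_via_subscript() {
  std::shared_ptr<int[]> i;
  // sizeof does not evaluate its operand, so the empty pointer is never
  // actually indexed here.
  return sizeof(i[0]);
}
```

Writing `sizeof(*i)` in the same place would not compile for the array specialization, which is the reason the check's suggestion is wrong there.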
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
Adds support for operand promotion and splitting/widening the result
of the ISD::GET_ACTIVE_LANE_MASK node.
For AArch64, shouldExpandGetActiveLaneMask now returns false for more
types which we know can be legalised.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
…41546)

This refactor was motivated by two bugs identified in out-of-tree
builds:

1. Some implementations of the VisitMembersFunction type (often used to
implement special loading semantics, e.g. -all_load or -ObjC) were assuming
that buffers for archive members were null-terminated, which they are not in
general. This was triggering occasional assertions.

2. Archives may include multiple members with the same file name, e.g.
when constructed by appending files with the same name:
  % llvm-ar crs libfoo.a foo.o
  % llvm-ar q libfoo.a foo.o
  % llvm-ar t libfoo.a foo.o
  foo.o

   While confusing, these members may be safe to link (provided that they're
   individually valid and don't define duplicate symbols). In ORC however, the
   archive member name may be used to construct an ORC initializer symbol,
   which must also be unique. In that case the duplicate member names lead to a
   duplicate definition error even if the members define unrelated symbols.

In addition to these bugs, StaticLibraryDefinitionGenerator had grown a
collection of all member buffers (ObjectFilesMap), a BumpPtrAllocator
that was redundantly storing synthesized archive member names (these are
copied into the MemoryBuffers created for each Object, but were never
freed in the allocator), and a set of COFF-specific import files.

To fix the bugs above and simplify StaticLibraryDefinitionGenerator this
patch makes the following changes:

1. StaticLibraryDefinitionGenerator::VisitMembersFunction is generalized
   to take a reference to the containing archive, and the index of the
   member within the archive. It now returns an Expected<bool> indicating
   whether the member visited should be treated as loadable, not loadable,
   or as invalidating the entire archive.
2. A static StaticLibraryDefinitionGenerator::createMemberBuffer method
   is added which creates MemoryBuffers with unique names of the form
   `<archive-name>[<index>](<member-name>)`. This defers construction of
   member names until they're loaded, allowing the BumpPtrAllocator (with
   its redundant name storage) to be removed.
3. The ObjectFilesMap (symbol name -> memory-buffer-ref) is replaced
   with a SymbolToMemberIndexMap (symbol name -> index) which should be
   smaller and faster to construct.
4. The 'loadability' result from VisitMembersFunction is now taken into
   consideration when building the SymbolToMemberIndexMap so that members
   that have already been loaded / filtered out can be skipped, and do not
   take up any ongoing space.
5. The COFF ImportedDynamicLibraries member is moved out into the
   COFFImportFileScanner utility, which can be used as a
   VisitMembersFunction.

This fixes the bugs described above and should lower memory consumption
slightly, especially for archives with many files and/or symbols where
most files are eventually loaded.
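The naming scheme and the symbol-to-index map described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the ORC implementation: the `Member` struct and the functions `makeMemberBufferName` and `buildSymbolToMemberIndexMap` are hypothetical stand-ins, and loadability is modeled as a plain vector of flags.

```c++
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an archive member; not an LLVM type.
struct Member {
  std::string Name;
  std::vector<std::string> Symbols;
};

// Builds a name of the form <archive-name>[<index>](<member-name>).
// The index keeps names unique even when member names repeat.
inline std::string makeMemberBufferName(const std::string &ArchiveName,
                                        std::size_t Index,
                                        const std::string &MemberName) {
  return ArchiveName + "[" + std::to_string(Index) + "](" + MemberName + ")";
}

// Maps each symbol to the index of the member that defines it, skipping
// members that were reported as not loadable.
inline std::map<std::string, std::size_t>
buildSymbolToMemberIndexMap(const std::vector<Member> &Members,
                            const std::vector<bool> &Loadable) {
  assert(Members.size() == Loadable.size());
  std::map<std::string, std::size_t> Map;
  for (std::size_t I = 0; I < Members.size(); ++I) {
    if (!Loadable[I])
      continue; // filtered-out members take up no space in the map
    for (const std::string &Sym : Members[I].Symbols)
      Map.emplace(Sym, I); // first definition wins in this sketch
  }
  return Map;
}
```

With two members both named `foo.o` in `libfoo.a`, the sketch yields `libfoo.a[0](foo.o)` and `libfoo.a[1](foo.o)`, so a name derived from the buffer name no longer collides.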
Same purpose as #141407,
committing this directly to get the bot green sooner.

Co-authored-by: Ely Ronnen <[email protected]>
## Why
In
https://github.com/llvm/llvm-project/pull/113612/files#diff-ada12e18f3e902b41b6989b46455c4e32656276e59907026e2464cf57d10d583,
the parameter `qual_name` was introduced. However, the tests have not
been adapted accordingly and hence cannot be executed.

## What
Fix the execution of tests by providing the missing argument.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
It's already called in llvm_add_library.
This patch updates `CombineContractBroadcastMask` to inherit from
`MaskableOpRewritePattern`, enabling it to handle masked
`vector.contract` operations. The pattern rewrites:
```mlir
  %a_bc = vector.broadcast %a
  %res = vector.contract %a_bc, %b, ...
```

into:
```mlir
  // Move the broadcast into vector.contract (by updating the indexing
  // maps)
  %res = vector.contract %a, %b, ...
```

The main challenge is supporting cases where the pattern drops a leading
unit dimension. For example:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
    %arg0 : vector<8x4xi32>,
    %arg1 : vector<8x4xi32>,
    %arg2 : vector<8x8xi32>,
    %mask: vector<1x8x8x4xi1>) -> vector<8x8xi32> {

  %0 = vector.broadcast %arg0 : vector<8x4xi32> to vector<1x8x4xi32>
  %1 = vector.broadcast %arg1 : vector<8x4xi32> to vector<1x8x4xi32>
  %result = vector.mask %mask {
    vector.contract {
      indexing_maps = [#map0, #map1, #map2],
      iterator_types = ["reduction", "parallel", "parallel", "reduction"],
      kind = #vector.kind<add>
    } %0, %1, %arg2 : vector<1x8x4xi32>, vector<1x8x4xi32> into vector<8x8xi32>
  } : vector<1x8x8x4xi1> -> vector<8x8xi32>

  return %result : vector<8x8xi32>
}
```

Here, the leading unit dimension is dropped. To handle this, the mask is
cast to the correct shape using a `vector.shape_cast`:

```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
    %arg0: vector<8x4xi32>,
    %arg1: vector<8x4xi32>,
    %arg2: vector<8x8xi32>,
    %arg3: vector<1x8x8x4xi1>) -> vector<8x8xi32> {

  %mask_sc = vector.shape_cast %arg3 : vector<1x8x8x4xi1> to vector<8x8x4xi1>
  %res = vector.mask %mask_sc {
    vector.contract {
      indexing_maps = [#map, #map1, #map2],
      iterator_types = ["parallel", "parallel", "reduction"],
      kind = #vector.kind<add>
    } %arg0, %arg1, %arg2 : vector<8x4xi32>, vector<8x4xi32> into vector<8x8xi32>
  } : vector<8x8x4xi1> -> vector<8x8xi32>

  return %res : vector<8x8xi32>
}
```

While this isn't ideal - since it introduces a `vector.shape_cast` that
must be cleaned up later - it reflects the best we can do once the input
reaches `CombineContractBroadcastMask`. A more robust solution may
involve simplifying the input earlier. I am leaving that as a TODO for
myself to explore further. Posting this now to unblock downstream
work.

LIMITATIONS

Currently, this pattern assumes:
* Only leading dimensions are dropped in the mask.
* All dropped dimensions must be unit-sized.
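The two limitations above amount to a shape check on the mask before the `vector.shape_cast` is created. A minimal sketch of that check follows, assuming shapes are plain vectors of extents; the helper name `dropLeadingUnitDims` is hypothetical and is not taken from the MLIR sources.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the mask shape with NumDropped leading
// dimensions removed, or std::nullopt if any dropped dimension is not 1.
inline std::optional<std::vector<std::int64_t>>
dropLeadingUnitDims(const std::vector<std::int64_t> &Shape,
                    std::size_t NumDropped) {
  if (NumDropped > Shape.size())
    return std::nullopt;
  for (std::size_t I = 0; I < NumDropped; ++I)
    if (Shape[I] != 1)
      return std::nullopt; // only unit-sized dimensions may be dropped
  return std::vector<std::int64_t>(Shape.begin() + NumDropped, Shape.end());
}
```

For the example above, dropping one leading dimension from `1x8x8x4` gives `8x8x4`, which matches the mask type after the cast.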
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
…138540)

When determining whether an escape source may alias with a noalias
argument, only take provenance captures into account. If only the
address of the argument was captured, an access through the escape
source is not legal.
Fixes errors about duplicate PHI edges when the input had duplicates
with constexprs in them. The constexpr translation makes new basic
blocks, causing the verifier to complain about duplicate entries in PHI
nodes.
…test (#141503)

This is a NFC change.

Added "-mattr=-real-true16" to a few gfx12 tests. This is for the
upcoming GFX12 true16 code change. Set these tests to use the fake16 flow
since true16 mode is not fully functional for GISEL.
- No need to prefix `PointerType` with `llvm::`.
- Avoid namespace block to define `PrintPipelinePasses`.
#140132)

Connect the Plan's entry to the scalar preheader during initial
construction. This moves a small part of the skeleton creation out of
ILV and will also enable replacing VPInstruction::ResumePhi with
regular VPPhi recipes.

Resume phis need 2 incoming values to start with, the second being the
bypass value from the scalar ph (and used to replicate the incoming
value for other bypass blocks). Adding the extra edge ensures the
incoming values for resume phis match the incoming blocks.

PR: #140132
asin, acos, atan, and atan2 were being lowered to libm calls instead of
llvm intrinsics. Add the conversion patterns to handle these intrinsics
and update tests to expect this.
… (NFC) (#141595)

The `GCNScheduleDAGMILive`'s `RescheduleRegions` bitvector is only used
by the rematerialization stage (`PreRARematStage`). Its presence in the
scheduler's state forces us to maintain its value throughout scheduling
even though it is of no use to the iterative scheduling process itself,
which instead relies on each stage's `initGCNRegion` hook to determine
whether the current region should be rescheduled.

This moves the bitvector to the `PreRARematStage`, which uses it to
store the set of regions that must be rescheduled between stage
initialization and region initialization.

This NFC also swaps a call to `GCNRegPressure::getArchVGPRNum(false)`
for a call to `GCNRegPressure::getArchVGPRNum()`---which is equivalent
but simpler in the context---and makes
`GCNSchedStage::finalizeGCNRegion` use its own API to advance to the
next region.
ZequanWu and others added 30 commits May 28, 2025 16:04
…add` command. (#138209)

Currently, the type `T`'s summary formatter will be matched for `T`,
`T*`, `T**` and so on. This is unexpected in many data formatters. Such
unhandled cases could cause the data formatter to crash. An example
would be the lldb's built-in data formatter for `std::optional`:
```
$ cat main.cpp
#include <optional>

int main() {
  std::optional<int> o_null;
  auto po_null = &o_null;
  auto ppo_null = &po_null;
  auto pppo_null = &ppo_null;
  return 0;
}
$ clang++ -g main.cpp && lldb -o "b 8" -o "r" -o "v pppo_null"
[lldb crash]
```

This change adds an option `--pointer-match-depth` to the `type summary
add` command to allow users to specify how many layers of pointers can be
dereferenced at most when matching a summary formatter of type `T`, as
Jim suggested
[here](#124048).
By default, this option has the value 1, which means the summary formatter
for `T` could also be used for `T*` but not `T**` or beyond. This option is
a no-op when `--skip-pointers` is set as well.
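The matching rule described above can be sketched as a small string-based model. This is an assumption-laden illustration rather than lldb's matcher: the function `matchesSummary` is hypothetical, and type names are assumed to be written without whitespace (for example `T**`).

```c++
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical matcher: strips up to MaxDepth trailing '*' characters from
// TypeName and reports whether what remains equals FormatterType.
inline bool matchesSummary(std::string TypeName,
                           const std::string &FormatterType,
                           std::size_t MaxDepth) {
  std::size_t Depth = 0;
  while (!TypeName.empty() && TypeName.back() == '*' && Depth < MaxDepth) {
    TypeName.pop_back();
    ++Depth;
  }
  return TypeName == FormatterType;
}
```

With a depth of 1, a formatter for `T` matches `T` and `T*` but not `T**`; raising the depth to 2 makes `T**` match as well.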

I didn't add such an option for `type synthetic add`, `type format add`,
or `type filter add`, because it isn't useful for those commands. Instead,
they all have a pointer match depth of 1. When printing a type `T*`, lldb
never prints the children of `T` even if there is a synthetic formatter
registered for `T`.
This patch adds support for a basic MemProf summary section, which is
built along with the indexed MemProf profile (e.g. when reading the raw
or YAML profiles), and serialized through the indexed profile just after
the header.

Currently only 6 fields are written, specifically the number of contexts
(total, cold, hot), and the max context size (cold, warm, hot).

To support forwards and backwards compatibility for added fields in the
indexed profile, the number of fields is serialized first. The code is
written to support forwards compatibility (reading newer profiles with
additional summary fields), and comments indicate how to implement
backwards compatibility (reading older profiles with fewer summary
fields) as needed.
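The count-first layout described above can be sketched as follows. This is a simplified model and not the MemProf reader: the summary is represented as a flat vector of 64-bit values, and `NumKnownFields`, `writeSummary`, and `readSummary` are hypothetical names.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of summary fields this hypothetical reader understands.
constexpr std::size_t NumKnownFields = 6;

// Writer: emit the field count first, then the fields themselves.
inline std::vector<std::uint64_t>
writeSummary(const std::vector<std::uint64_t> &Fields) {
  std::vector<std::uint64_t> Out;
  Out.push_back(Fields.size());
  Out.insert(Out.end(), Fields.begin(), Fields.end());
  return Out;
}

// Reader: read the count, keep the fields it knows, and skip the rest.
// NextPos receives the position just past the whole summary.
inline std::vector<std::uint64_t>
readSummary(const std::vector<std::uint64_t> &In, std::size_t &NextPos) {
  assert(!In.empty() && "missing field count");
  const std::size_t Count = static_cast<std::size_t>(In[0]);
  assert(Count >= NumKnownFields && In.size() >= Count + 1);
  std::vector<std::uint64_t> Known(In.begin() + 1,
                                   In.begin() + 1 + NumKnownFields);
  NextPos = 1 + Count; // unknown trailing fields are skipped, not parsed
  return Known;
}
```

Because the reader advances by the serialized count rather than by the number of fields it knows, a profile written with extra fields still leaves the reader positioned correctly for whatever follows the summary.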

Support is added to print the summary as YAML comments when displaying
both the raw and indexed profiles via `llvm-profdata show`. Because they
are YAML comments, the YAML reader ignores these (the summary is always
recomputed when building the indexed profile as described above).

This necessitated moving some options and a couple of interfaces out of
Analysis/MemoryProfileInfo.cpp and into the new
ProfileData/MemProfSummary.cpp file, as we need to classify context
hotness earlier and also compute context ids to build the summary from
older indexed profiles.
Now that there is only a single AnyOf recurrence kind, simply pass the
start value instead of the full recurrence descriptor, to tighten the
interface.
…rning (#141790)

#140762 introduces some
compilation warnings in `lldb/unittests/Core/MangledTest.cpp`. This
patch adds explicit default initialization to `DemangledNameInfo` to
suppress those warnings.

We only had default initialization values for `PrefixRange` and
`SuffixRange` because they are the only _optional_ fields of the
structure.
This change adds code to defer emitting declarations and tentative
definitions until they are referenced or triggered by a call to
CompleteTentativeDefinition. This is needed to avoid premature handling
of declarations and definitions that might not be referenced in the
current translation unit. It also avoids incorrectly adding an
initializer to external declarations.

This change also updates the way the insertion location for globals is
chosen so that all globals will be emitted together at the top of the
module. This makes no functional difference, but it is very useful for
writing sensible tests.

Some tests are modified in this change to reorder global variables so
that they can be checked in the order in which they will be emitted.
#140624)

FORMAT("J=",I3) is accepted by a few other Fortran compilers as a valid
format for input as well as for output. The character string edit
descriptor "J=" is interpreted as if it had been 2X on input, causing
two characters to be skipped over. The skipped characters don't have to
match the characters in the literal string. An optional warning is
emitted under control of the -pedantic option.
When a TYPE(*) dummy argument is erroneously used as a component value
in a structure constructor, semantics crashes if the structure
constructor had been initially parsed as a potential function reference.
Clean out stale typed expressions when reanalyzing the reconstructed
parse subtree to ensure that errors are caught the next time around.
Now that there is only a single FindLastIV recurrence kind, simply pass
the sentinel value instead of the full recurrence descriptor to tighten
the interface.
The number of copies on the new dimension must be clamped via MAX(0,
ncopies) so that it is no less than zero.
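The clamp described above is a single `max` operation. A minimal sketch, with the hypothetical helper name `clampedCopies`:

```c++
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: extent of the new dimension for a requested copy
// count, clamped so that it is never negative.
inline std::int64_t clampedCopies(std::int64_t NCopies) {
  return std::max<std::int64_t>(0, NCopies);
}
```

A negative request therefore produces an extent of zero instead of a negative extent.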

Fixes #141119.
A dummy argument with an explicit INTEGER type of non-default kind can
be forward-referenced from a specification expression in many Fortran
compilers. Handle by adding type declaration statements to the initial
pass over a specification part's declaration constructs. Emit an
optional warning under -pedantic.

Fixes #140941.
)

When processing free form source line continuation, the prescanner
treats empty keyword macros as if they were spaces or tabs. After
skipping over them, however, there's code that only works if the skipped
characters ended with an actual space or tab. If the last skipped item
was an empty keyword macro's name, the last character of that name would
end up being the first character of the continuation line. Fix.
…switch-case (#141779)

To make it more clear that it's a subset of -Wdeprecated-declarations.

Follow-up to #138562
This change uses resource name during DXIL resource binding analysis to detect when two (or more) resources have identical overlapping binding.

The DXIL resource analysis just detects that there is a problem with the binding and sets the `hasOverlappingBinding` flag. Full error reporting will happen later in DXILPostOptimizationValidation pass (#110723).
implemented wcschr and tests

---------

Co-authored-by: Sriya Pratipati <[email protected]>
Ensure everything is defined inside the namespace, reduce number of
ifdefs.
…1814)

These tests will track progress on extending
#139809 from CFI to more UBSan
checks.
…41719)

Currently, to avoid generating too many BTF types, for a struct type
like
```
  struct foo {
    int val;
    struct bar *ptr;
  };
```
if the BTF generation reaches 'struct foo', it will not generate the actual
type for 'struct bar'; a forward decl is generated instead. The
'struct bar' type is not actually generated in BTF unless it is reached
through a non-struct pointer member.

Such a limitation forces BPF developers to hack around this
problem. See [1] and [2]. For example in [1], we have
```
    struct map_value {
	struct prog_test_ref_kfunc *not_kptr;
	struct prog_test_ref_kfunc __kptr *val;
	struct node_data __kptr *node;
    };
```
The BTF type for 'struct node_data' is not generated. Note that we have
a '__kptr' annotation. A similar problem exists in [2] with a '__uptr'
annotation. Note that the issue in [1] has been resolved later but the
hack in [2] is still needed.

This patch relaxes the struct type (with struct pointer member) BTF
generation if the struct pointer has a btf_type_tag annotation.

[1] https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]
Tests using ObjC do not readily run on Linux.
Use the if statement with an initializer pattern that's very common in
LLVM in SBTarget. Every time someone adds a new method to SBTarget, I
want to encourage using this pattern, but I don't because it would be
inconsistent with the rest of the file. This solves that problem by
switching over the whole file.
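A minimal sketch of the pattern, assuming it refers to declaring the shared pointer inside the `if` condition so that it is only in scope where it is non-null. The `Target` and `TargetHandle` types are hypothetical stand-ins and are not the lldb classes.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; not the real lldb classes.
struct Target {
  std::string Triple;
};

class TargetHandle {
public:
  explicit TargetHandle(std::shared_ptr<Target> InSP) : SP(std::move(InSP)) {}

  std::string getTriple() const {
    // The pointer is declared in the condition, so it is only visible in
    // the branch where it is known to be non-null.
    if (std::shared_ptr<Target> TargetSP = SP)
      return TargetSP->Triple;
    return "";
  }

private:
  std::shared_ptr<Target> SP;
};
```

The fallback path cannot accidentally use the pointer, because the name does not exist outside the `if` statement.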
…perands (#141845)

As noted in
#141821 (comment),
whilst we currently constant fold intrinsics of fixed-length vectors via
their scalar counterpart, we don't do the same for scalable vectors.

This handles the scalable vector case when the operands are splats.

One weird snag in ConstantVector::getSplat was that it produced an undef
if passed in poison, so this also contains a fix by checking for
PoisonValue before UndefValue.
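The order of the two checks matters because, in LLVM, `PoisonValue` derives from `UndefValue`, so a test for undef also succeeds on poison. A toy sketch of the effect follows; the classes below only mirror that inheritance relationship and are not the LLVM classes, and `classifySplat` is a hypothetical name.

```c++
#include <cassert>
#include <string>

// Toy hierarchy mirroring the relationship in LLVM, where PoisonValue
// derives from UndefValue. These are not the LLVM classes.
struct Value {
  virtual ~Value() = default;
};
struct UndefValue : Value {};
struct PoisonValue : UndefValue {};

// Checks the more derived type first; testing UndefValue first would also
// match poison and classify it as undef.
inline std::string classifySplat(const Value &V) {
  if (dynamic_cast<const PoisonValue *>(&V))
    return "poison";
  if (dynamic_cast<const UndefValue *>(&V))
    return "undef";
  return "other";
}
```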
Fixes the final reduction steps which were taken from an implementation
of scan, not reduction, causing lanes earlier in the wave to have
incorrect results due to masking.

Now aligning more closely with the Triton implementation:
triton-lang/triton#5019

# Hypothetical example
To provide an explanation of the issue with the current implementation,
let's take the simple example of attempting to perform a sum over 64
lanes where the initial values are as follows (first lane has value 1,
and all other lanes have value 0):
```
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
When performing a sum reduction over these 64 lanes, in the current
implementation we perform 6 dpp instructions which in sequential order
do the following:
1) sum over clusters of 2 contiguous lanes
2) sum over clusters of 4 contiguous lanes
3) sum over clusters of 8 contiguous lanes
4) sum over an entire row
5) broadcast the result of last lane in each row to the next row and
each lane sums current value with incoming value.
6) broadcast the result of the 32nd lane to last two rows and each lane
sums current value with incoming value.

After step 4) the result for the example above looks like this:

```
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

After step 5) the result looks like this:
```
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

After step 6) the result looks like this:
```
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
Note that the correct value here is always 1, yet after the
`dpp.broadcast` ops some lanes have incorrect values. The reason is that
for these incorrect lanes, like lanes 0-15 in step 5, the
`dpp.broadcast` op doesn't provide them incoming values from other
lanes. Instead these lanes are provided either their own values or 0
(depending on whether `bound_ctrl` is true or false) as values to sum
over. Either way these values are stale and these lanes shouldn't be
used in general.

So what this means:
- For a subgroup reduce over 32 lanes (like Step 5), the correct result
is stored in lanes 16 to 31
- For a subgroup reduce over 64 lanes (like Step 6), the correct result
is stored in lanes 32 to 63.

However in the current implementation we do not specifically read the
value from one of the correct lanes when returning a final value. In
some workloads it seems without this specification, the stale value from
the first lane is returned instead.
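The walkthrough above can be reproduced with a toy model. This sketch follows the description only and does not model the hardware: steps 1 to 4 are collapsed into a per-row sum, lanes that receive no incoming value are assumed to add their own value (the own-value case described above), and all function names are hypothetical.

```c++
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t NumLanes = 64;
constexpr std::size_t RowSize = 16;
using Wave = std::array<int, NumLanes>;

// Steps 1-4 collapsed: every lane ends up holding the sum of its row.
inline Wave reduceRows(const Wave &In) {
  Wave Out{};
  for (std::size_t Row = 0; Row < NumLanes / RowSize; ++Row) {
    int Sum = 0;
    for (std::size_t L = 0; L < RowSize; ++L)
      Sum += In[Row * RowSize + L];
    for (std::size_t L = 0; L < RowSize; ++L)
      Out[Row * RowSize + L] = Sum;
  }
  return Out;
}

// Step 5: each row receives the last lane of the previous row. Row 0 has
// no source and adds its own (stale) value in this model.
inline Wave broadcastFromPreviousRow(const Wave &In) {
  Wave Out{};
  for (std::size_t Lane = 0; Lane < NumLanes; ++Lane) {
    const std::size_t Row = Lane / RowSize;
    const int Incoming = Row == 0 ? In[Lane] : In[Row * RowSize - 1];
    Out[Lane] = In[Lane] + Incoming;
  }
  return Out;
}

// Step 6: the last two rows receive the 32nd lane (index 31). The first
// two rows add their own (stale) value in this model.
inline Wave broadcastLane31(const Wave &In) {
  Wave Out{};
  for (std::size_t Lane = 0; Lane < NumLanes; ++Lane) {
    const bool Receives = Lane / RowSize >= 2;
    Out[Lane] = In[Lane] + (Receives ? In[31] : In[Lane]);
  }
  return Out;
}

inline Wave simulateReduction(const Wave &In) {
  return broadcastLane31(broadcastFromPreviousRow(reduceRows(In)));
}
```

For the input with a 1 in the first lane only, this model ends with 4 in lanes 0 to 15, 2 in lanes 16 to 31, and 1 in lanes 32 to 63, matching the arrays shown above. Reading the result from the last lane rather than the first returns the expected sum of 1.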

# Actual failing test
For a specific example of how the current implementation causes issues,
take a look at the IR below which represents an additive reduction over
a dynamic dimension.
```
!matA = tensor<1x?xf16>
!matB = tensor<1xf16>
#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1) -> (d0)>
func.func @only_producer_fusion_multiple_result(%arg0: !matA) -> !matB {
  %cst_1 = arith.constant 0.000000e+00 : f16
  %c2_i64 = arith.constant 2 : i64
  %0 = tensor.empty() : !matB
  %2 = linalg.fill ins(%cst_1 : f16) outs(%0 : !matB) -> !matB
  %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "reduction"]} ins(%arg0 : !matA) outs(%2 : !matB)  {
  ^bb0(%in: f16, %out: f16):
    %7 = arith.addf %in, %out : f16
    linalg.yield %7 : f16
  } -> !matB
  return %4 : !matB
}
```
When provided an input of type `tensor<1x2xf16>` and values `{0, 1}` to
perform the reduction over, the value returned is consistently 4. By the
same analysis done above, this shows that the returned value is coming
from one of these stale lanes and needs to be read instead from one of
the lanes storing the correct result.

Signed-off-by: Muzammiluddin Syed <[email protected]>
…ool (#140829)

This consolidates some of the error handling around regex arguments to
the tool, and sets up the APIs such that errors must be handled before
their usage.
…141859)

The context disambiguation code already emits remarks when hinting
allocations (by adding hotness attributes) during cloning. However,
we did not yet emit hints when applying the hotness attributes during
building of the metadata (during matching and again after inlining).
Add remarks when we apply the hint attributes for these
non-context-sensitive allocations.
This change updates a few tests for global variable handling to also
check classic codegen output so we can easily verify consistency between
the two and will be alerted if the classic codegen changes.

This was useful in developing forthcoming changes to global linkage
handling.
While combining and->cmp->sel into and->mul may result in worse code on
some targets, this combine should be uniformly beneficial.

Proof: https://alive2.llvm.org/ce/z/MibAcN

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Yingwei Zheng <[email protected]>
This helps to disambiguate accesses in the caller and the callee
after LLVM inlining in some apps. I did not see any performance
changes, but this is one step towards enabling other optimizations
in the apps that I am looking at.

The definition of llvm.noalias says:
```
... indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function.
```

I believe this exactly matches Fortran rules for the dummy arguments
that are modified during their subprogram execution.

I also set llvm.noalias and llvm.nocapture on the !fir.box<> arguments,
because the corresponding descriptors cannot be captured and cannot
alias anything (not based on them) during the execution of the
subprogram.