Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with fixes of 95c2d798 (Oct 30) (6) #463

Merged
merged 290 commits into from
Feb 13, 2025

Conversation

jorickert
Copy link

No description provided.

sdkrystian and others added 30 commits October 30, 2024 14:50
…vm#114258)

This patch fixes a couple of regressions introduced in llvm#111852.

Consider:

```
template<typename T>
struct A
{
    template<bool U>
    static constexpr bool f() requires U
    {
        return true;
    }
};

template<>
template<bool U>
constexpr bool A<short>::f() requires U
{
    return A<long>::f<U>();
}

template<>
template<bool U>
constexpr bool A<long>::f() requires U
{
    return true;
}

static_assert(A<short>::f<true>()); // crash here
```

This crashes because when collecting template arguments from the _first_
declaration of `A<long>::f<true>` for constraint checking, we don't add
the template arguments from the enclosing class template specialization
because there exists another redeclaration that is a member
specialization.

This also fixes the following example, which happens for a similar
reason:
```
// input.cppm

export module input;

export template<int N>
constexpr int f();

template<int N>
struct A {
  template<int J>
  friend constexpr int f();
};

template struct A<0>;

template<int N>
constexpr int f() {
  return N;
}
```

```
// input.cpp

import input;

static_assert(f<1>() == 1); // error: static assertion failed
```
…#114274)

Add Uses = [FRM] to the underlying MC instructions.
    
Tweak a couple test cases so the MachineVerifier would have caught this.
Credits: llvm#111419

Fixes icmp-flags.mir

First attempt: llvm#113090

Revert: llvm#114256
… `isInTemplateInstantiation` matchers (llvm#110666)

Fix `isInstantiated` and `isInTemplateInstantiation` matchers, so they
return true for instantiations of variable templates, and any
declaration in statements contained in such instantiations.
This commit adds the ability to get a particular resource from an array
of resources using the handle_fromBinding intrinsic.

The main changes are:

1. Create an array when generating the type.
2. Add capabilities from

[SPV_EXT_descriptor_indexing](https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_descriptor_indexing.html).

We are still missing the ability to declare a runtime array. That will
be done in a follow up PR.
When running Bionic's testsuite over llvm-libc, tests broke because
e.g.,

```
const char *str = "abc";
char buf[7]{"111111"};
strlcpy(buf, str, 7);
ASSERT_EQ(buf, {'1', '1', '1', '\0', '\0', '\0', '\0'});
```

On my machine (Debian w/ glibc and clang-16), a `printf` loop over `buf`
gets unrolled into a series of const `printf` at compile-time:
```
printf("%d\n", '1');
printf("%d\n", '1');
printf("%d\n", '1');
printf("%d\n", 0);
printf("%d\n", '1');
printf("%d\n", '1');
printf("%d\n", 0);
```

Seems best to match existing precedent here.
This fixes the build after the removal of the clang-format status page.
…lvm#114216)

This PR implements instruction selection for G_BITCAST on an earlier
stage to avoid MachineVerifier complains on subtle semantics difference
between G_BITCAST and OpBitcast.

We do instruction selections for OpBitcast after IR Translation instead
of calling MIB.buildBitcast() generating the general op code G_BITCAST,
because when MachineVerifier validates G_BITCAST we see a check of a
kind: 'if Source Type is equal to Destination Type then report error
"bitcast must change the type"'. This doesn't take into account the
notion of a typed pointer that is important for SPIR-V where a user may
and should use bitcast between pointers with different pointee types
(https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast).

It's important for correct lowering in SPIR-V, because interpretation of
the data type is not left to instructions that utilize the pointer, but
encoded by the pointer declaration, and the SPIRV target can and must
handle the declaration and use of pointers that specify the type of data
they point to.

It's not feasible to improve validation of G_BITCAST using just
information provided by low level types of source and destination.
Therefore we don't produce G_BITCAST as the general op code with
semantics different from OpBitcast, but rather lower to OpBitcast
immediately.

See discussion in llvm#110270 for
even more context.
`v2*16` is a legal type in NVPTX. Thus, this is dead code.
…112948)

A helper (2 overloads) that consolidates corocloner creation and the
actual cloning. The helpers create a TimeTraceScope to make it easier to
see how long the cloning takes.

Extracted from llvm#109032 (commit 1)
…to (llvm#112976)

This patch is a part of step-by-step refactoring of CloneFunctionInto.
The goal is to extract reusable pieces out of it that will be later used
to optimize function cloning e.g. in coroutine processing.

Extracted from llvm#109032 (commit 2)
Passing a descriptor as a `const Descriptor &` or a `const Descriptor *`
generates a FIR signature where the box is passed by value.
This is an issue, as it requires a load of the box to be passed. But
since, ultimately, all boxes are passed by reference a temporary is
generated in LLVM and the reference to the temporary is passed.

The boxes addresses are registered with the CUDA runtime but the
temporaries are not, thus preventing the runtime to properly map a host
side address to its device side counterpart.

To address this issue, this PR changes the signatures to the transfer
functions to pass a descriptor as a `Descriptor *`, which will in turn
generate a FIR signature with that takes a box reference as an argument.
API test failed for remote platform in
[llvm#112657](llvm#112657)

Previously when putting files onto remote platform, I used `platform
file write -d <data>` which actually required a `platform file open
<path>` first in order to obtain a file descriptor.
eg. in file
[TestGDBRemotePlatformFile.py](https://github.com/llvm/llvm-project/blob/94e7d9c0bfe517507ea08b00fb00c32fb2837a82/lldb/test/API/functionalities/gdb_remote_client/TestGDBRemotePlatformFile.py#L24-L32)
To fix this, use the `platform put-file` method, which is used in the
`redirect_stdin` from this test already.
…late" (llvm#114304)

Clang importer doesn't seem to work well with this change, see
discussion in the original PR.

Reverts llvm#114258
UAVs and SRVs have already been converted to use LLVM target types and
we can disable generating of the !hlsl.uavs and !hlsl.srvs! annotations.
This will enable adding tests for structured buffers with user defined
types that this old resource annotations code does not handle (it
crashes).

Part 1 of llvm#114126
Currently we cost an interleaved memory op as if it were a load/store of
the widened vector type, but this was undercosting in all cases when
compared to the measured performance of todays hardware.

On the x280 at NF=2 and spacemit-x60 at NF=2,3 and 4, a segmented load
is carried out as a wide load and NF LMUL shuffle ops:
https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput

All other NFs go through a slow path. On the spacemit-x60 this is
proportional to VLMAX * NF, and on the x280 proportional to the number
of segments.

This patch increases the cost by implementing a wide load + NF LMUL
shuffle op cost for the lowest common denominator NF=2, and then a
slower cost proportional to VL for the other NFs.

In a follow up patch we can add a tuning flag to use the faster cost
model for NF=3 and 4 on the spacemit-x60.

Note that the FIXME about illegal vectors seems to have been fixed in
llvm#100436
These pseudos used to be handled by CustomInserter to insert the
rounding
mode change for vector ceil, floor, etc. At some point they were changed
to use the InsertReadWriteCSR pass instead of the custom inserter. I
believe
that makes them redundant with the pseudos used by the RVV intrinsics
with rounding mode operand.
The previous statvfs tests had several issues, this patch updates them
to meet current standards.
`strrchr("foo", '\0')` is defined to point to the end of `foo`, rather
than returning NULL. This wasn't caught by tests, since llvm-libc's
`ASSERT_STREQ(nullptr, "");` is not an assertion error.

While I'm here, refactor the test slightly to check for NULL more
specifically. I considered adding fancier `ASSERT`s (and changing the
semantics of `ASSERT_STREQ`), but opted for a more local fix by fair
dice roll.
The change improves the code in general and, as a side effect, avoids crashing
on an impossible address space casts guarded by `__isGlobal/__isShared`, which
partially fixes llvm#112760
It's still possible to trigger the issue by using explicit AS casts w/o
AS checks, but LLVM should no longer crash on valid code.
…lvm#114192)

The `ValueDecomposer` in `DecomposeCallGraphTypes` was a workaround
around missing 1:N support in the dialect conversion. Since llvm#113032, the
dialect conversion infrastructure supports 1:N type conversions and 1:N
target materializations. The `ValueDecomposer` class is no longer
needed. (However, target materializations must still be inserted
manually, until we fully merge the 1:1 and 1:N drivers.)

Note for LLVM integration: Register 1:N target materializations on the
type converter instead of "decompose value conversions" on the
`ValueDecomposer`.
…ct.py (llvm#111776)

In the code being parsed, the comma separates following traits from the
category args. If there's no category args, it is still present.
llvm#111778)

SPIR-V grammar was updated in upstream to have an "aliases" field
instead of duplicating symbols with same values. See
KhronosGroup/SPIRV-Headers#447 for details.
kovdan01 and others added 25 commits November 1, 2024 12:21
This re-applies llvm#96164 after revert in llvm#102434.

Support the following relocations and assembly operators:

- `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`)
- `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`)
- `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`)

`LOADgotAUTH` pseudo-instruction is introduced which is later expanded to
actual instruction sequence like the following.

```
adrp x16, :got_auth:sym
add x16, x16, :got_auth_lo12:sym
ldr x0, [x16]
autia x0, x16
```

If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT
load is lowered similarly to `LOADgotAUTH`.

```
@var = global i32 0
define ptr @resign_globalvar() {
  ret ptr ptrauth (ptr @var, i32 3, i64 43)
}
```

If FPAC bit is not set and auth instruction is emitted, a check+trap sequence
similar to one used for `AUT` pseudo is emitted to ensure auth success.

Both SelectionDAG and GlobalISel are suppported.
For FastISel, we fall back to SelectionDAG.

Tests starting with 'ptrauth-' have corresponding variants w/o this prefix.

See also specification
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got
…vm#114433)

When doing a call from CMSE secure state to non-secure state for
v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and
restore the FP registers around the call. These instructions both check
the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents
of the FP registers are not secret) they execute as no-ops.

This causes a problem when CONTROL_S.SFPA==0 before the call, which
happens if there are no floating-point instructions executed between
entry to secure state and the call. If this is the case, then the VLSTM
instruction will do nothing, leaving the save area in the stack
uninitialised. If the called function returns a value in floating-point
registers, the call sequence includes an instruction to copy the return
value from a floating-point register to a GPR, which must be before the
VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM
will fully execute, and load the uninitialised stack memory into the FP
registers.

This causes two problems:
* The FP register file is clobbered, including all of the callee-saved
  registers, which might contain live values.
* The stack region might contain secret values, which will be leaked to
  non-secure state through the floating-point registers if/when we
  return to non-secure state.

The fix is to insert a `vmov s0, s0` instruction before the VLSTM
instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and
VLSTM instruction.

CVE: https://www.cve.org/cverecord?id=CVE-2024-7883
Security bulletin:
https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability
This ensures the VPIRBasicBlocks are deleted when the VPlan is
destroyed.

Fixes a buildbot failure with ASAN, including
https://lab.llvm.org/buildbot/#/builders/52/builds/3368
For GFX12 hasTFE is always true because it does not have the buffer load
to LDS instructions.
…3380)

Align the validation pass valid element datatypes check more closely to
the specification by removing i64 as a supported datatype. The spec does
not currently support it.

Signed-off-by: Luke Hutton <[email protected]>
This work is in preparation for PRs llvm#112138 and llvm#88385 where
the middle block is not guaranteed to be the immediate successor
to the region block. I've simply add new getMiddleBlock()
interfaces to VPlan that for now just return

cast<VPBasicBlock>(VectorRegion->getSingleSuccessor())

Once PR llvm#112138 lands we'll need to do more work to discover
the middle block.
…vm#114488)

This fixes the current sanitizer CI
[failures](https://lab.llvm.org/buildbot/#/builders/169/builds/4839/steps/13/logs/stdio).
I manually confirmed the fix with a MemorySanitizer build.

Signed-off-by: Sarnie, Nick <[email protected]>
…ber initializer. (llvm#114213)

This patch extends the filtering heuristic to apply for the
Lifetimebound code path.

This will suppress a common false positive:

```
namespace std {
template<typename T>
struct unique_ptr {
  T &operator*();
  T *get() const [[clang::lifetimebound]];
};
} // namespace std

struct X {
  X(std::unique_ptr<int> up) :
    pointer(up.get()), owner(std::move(up)) {}

  int *pointer;
  std::unique_ptr<int> owner;
};
```

See llvm#114201.
…ould consider the number of elements of ScalarTy. (llvm#114526)
They are called in a few different forms that we don't support.
…rious files (llvm#114524)

This pull request corrects multiple occurrences of the typo "avaliable"
to "available" across the LLVM and Clang codebase. These changes improve
the clarity and accuracy of comments and documentation. Specific
modifications are in the following files:

1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp:
Updated comments in readability checks for cognitive complexity.
2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected
documentation for JITDylib responsibilities.
3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for
FusionPredicate variables.
4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree
availability.
5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage
descriptions for vector units.
6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant
description in shift-detect idiom logic.
7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register
availability notes.
8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward
reference descriptions for bytecode reader operations.

These changes have no impact on code functionality, focusing solely on
documentation clarity.

Co-authored-by: wangqiang <[email protected]>
Until now, these have been hardcoded as a downstream patches in lld. Add
them to the driver so that the private patches can be removed.

PS5 only. On PS4, the equivalent hardcoded configuration will remain in
the proprietary linker.

SIE tracker: TOOLCHAIN-16704
As a follow-on to 113686, this breaks the recursion between phi nodes
that have p1 = phi(x, p2) and p2 = phi(y, p1). The knownFPClass can be
calculated from the classes of p1 and p2.
This is helpful to debug CMake configuration issues such as the ones
that can happen when we build an external project (like
GoogleBenchmark).
…~X + Y) --> uadd.sat(~X, Y)` (llvm#114345)

Alive2: https://alive2.llvm.org/ce/z/mTGCo-
We cannot reuse `~X` if `m_AllOnes` matches a vector constant with some
poison elts. An alternative solution is to create a new not instead of
reusing `~X`. But it doesn't worth the effort because we need to add a
one-use check.

Fixes llvm#113869.
… fold to InstCombine (llvm#114280)

Previously we fold `div/rem X, C` into `poison` if any element of the
constant divisor `C` is zero or undef. However, it is incorrect when
threading udiv over an vector select:
https://alive2.llvm.org/ce/z/3Ninx5
```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```
In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32
-1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into
`zeroinitializer` and `poison`, respectively. One solution is to
introduce a new flag indicating that we are threading over a vector
select. But it requires to modify both `InstSimplify` and
`ConstantFold`.

However, this optimization doesn't provide benefits to real-world
programs:

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107

This patch moves the fold into InstCombine to avoid breaking numerous
existing tests.

Fixes llvm#114191 and llvm#113866 (only poison-safety issue).
With SEW=64, the vnsrl trick we primary rely on does not work.  This
is handled correctly today, but we have fairly minimal testing of the
resulting shuffles which makes it hard to demonstrate value of an
upcoming change.
)

`v2i8` is an unsupported type, so we hit the default legalization rules
which perform the bitcast in stack memory and is very inefficient on
GPU.

This adds a custom lowering where we pack `v2i8` into `i16` and from
there use another bitcast node to reach the final desired type. And also
the inverse unpacking `i16` into `v2i8`.
[AutoBump] Merge with b74e588 (Nov 01) (7)
Base automatically changed from bump_to_4ba623f2 to feature/fused-ops February 12, 2025 13:17
@jorickert jorickert merged commit 54b4bfb into feature/fused-ops Feb 13, 2025
11 checks passed
@jorickert jorickert deleted the bump_to_95c2d798 branch February 13, 2025 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment