[AutoBump] Merge with fixes of 95c2d798 (Oct 30) (6) #463

jorickert · 2025-02-03T08:14:55Z

No description provided.

…vm#114258) This patch fixes a couple of regressions introduced in llvm#111852. Consider:

```
template<typename T>
struct A {
  template<bool U>
  static constexpr bool f() requires U {
    return true;
  }
};

template<>
template<bool U>
constexpr bool A<short>::f() requires U {
  return A<long>::f<U>();
}

template<>
template<bool U>
constexpr bool A<long>::f() requires U {
  return true;
}

static_assert(A<short>::f<true>()); // crash here
```

This crashes because when collecting template arguments from the _first_ declaration of `A<long>::f<true>` for constraint checking, we don't add the template arguments from the enclosing class template specialization because there exists another redeclaration that is a member specialization.

This also fixes the following example, which happens for a similar reason:

```
// input.cppm
export module input;

export template<int N>
constexpr int f();

template<int N>
struct A {
  template<int J>
  friend constexpr int f();
};

template struct A<0>;

template<int N>
constexpr int f() {
  return N;
}
```

```
// input.cpp
import input;

static_assert(f<1>() == 1); // error: static assertion failed
```

…FP16 to FP32) instructions (llvm#113346) The new instructions are described in https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions

…#114274) Add Uses = [FRM] to the underlying MC instructions. Tweak a couple test cases so the MachineVerifier would have caught this.

Credits: llvm#111419 Fixes icmp-flags.mir First attempt: llvm#113090 Revert: llvm#114256

Link: llvm#93709

… `isInTemplateInstantiation` matchers (llvm#110666) Fix `isInstantiated` and `isInTemplateInstantiation` matchers, so they return true for instantiations of variable templates, and any declaration in statements contained in such instantiations.

This commit adds the ability to get a particular resource from an array of resources using the handle_fromBinding intrinsic. The main changes are:

1. Create an array when generating the type.
2. Add capabilities from [SPV_EXT_descriptor_indexing](https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_descriptor_indexing.html).

We are still missing the ability to declare a runtime array. That will be done in a follow-up PR.

When running Bionic's testsuite over llvm-libc, tests broke because e.g.,

```
const char *str = "abc";
char buf[7]{"111111"};
strlcpy(buf, str, 7);
ASSERT_EQ(buf, {'1', '1', '1', '\0', '\0', '\0', '\0'});
```

On my machine (Debian w/ glibc and clang-16), a `printf` loop over `buf` gets unrolled into a series of const `printf` at compile-time:

```
printf("%d\n", '1');
printf("%d\n", '1');
printf("%d\n", '1');
printf("%d\n", 0);
printf("%d\n", '1');
printf("%d\n", '1');
printf("%d\n", 0);
```

Seems best to match existing precedent here.

This fixes the build after the removal of the clang-format status page.

…es. (llvm#113476)

…lvm#114216) This PR implements instruction selection for G_BITCAST at an earlier stage to avoid MachineVerifier complaints about a subtle semantic difference between G_BITCAST and OpBitcast. We do instruction selection for OpBitcast after IR Translation instead of calling MIB.buildBitcast() to generate the general opcode G_BITCAST, because when MachineVerifier validates G_BITCAST it performs a check of the kind: 'if Source Type is equal to Destination Type then report error "bitcast must change the type"'. This doesn't take into account the notion of a typed pointer, which is important for SPIR-V, where a user may and should use bitcast between pointers with different pointee types (https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast). It's important for correct lowering in SPIR-V, because interpretation of the data type is not left to instructions that utilize the pointer, but encoded by the pointer declaration, and the SPIRV target can and must handle the declaration and use of pointers that specify the type of data they point to. It's not feasible to improve validation of G_BITCAST using just the information provided by the low-level types of source and destination. Therefore we don't produce G_BITCAST as the general opcode with semantics different from OpBitcast, but rather lower to OpBitcast immediately. See discussion in llvm#110270 for even more context.

Link: llvm#93709

`v2*16` is a legal type in NVPTX. Thus, this is dead code.

…amounts. NFC

…112948) A helper (2 overloads) that consolidates corocloner creation and the actual cloning. The helpers create a TimeTraceScope to make it easier to see how long the cloning takes. Extracted from llvm#109032 (commit 1)

…to (llvm#112976) This patch is a part of step-by-step refactoring of CloneFunctionInto. The goal is to extract reusable pieces out of it that will be later used to optimize function cloning e.g. in coroutine processing. Extracted from llvm#109032 (commit 2)

Passing a descriptor as a `const Descriptor &` or a `const Descriptor *` generates a FIR signature where the box is passed by value. This is an issue, as it requires a load of the box to be passed. But since, ultimately, all boxes are passed by reference, a temporary is generated in LLVM and the reference to the temporary is passed. The box addresses are registered with the CUDA runtime but the temporaries are not, which prevents the runtime from properly mapping a host-side address to its device-side counterpart. To address this issue, this PR changes the signatures of the transfer functions to pass a descriptor as a `Descriptor *`, which will in turn generate a FIR signature that takes a box reference as an argument.
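The address-identity problem described above can be sketched in plain C++. This is only an illustration under stated assumptions: `Descriptor` here is a one-field stand-in, `AddressRegistry` is a hypothetical substitute for the runtime's table of registered host addresses, and the by-value function models what the generated FIR signature does, not what a C++ `const Descriptor &` does.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for a descriptor (box); not the real runtime type.
struct Descriptor {
  void *base_addr = nullptr;
};

// Hypothetical stand-in for the table of host addresses known to the runtime.
class AddressRegistry {
public:
  void register_box(const Descriptor *d) { known_.insert(d); }
  bool is_registered(const Descriptor *d) const { return known_.count(d) != 0; }

private:
  std::unordered_set<const void *> known_;
};

// Models the by-value signature: the callee only sees a temporary copy,
// whose address was never registered.
bool lookup_by_value(const AddressRegistry &registry, Descriptor box) {
  return registry.is_registered(&box);
}

// Models the `Descriptor *` signature: the callee sees the original address.
bool lookup_by_pointer(const AddressRegistry &registry, Descriptor *box) {
  return registry.is_registered(box);
}
```

With a box registered once, `lookup_by_value` fails to find it because it inspects the address of the copy, while `lookup_by_pointer` succeeds.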

API test failed for remote platform in [llvm#112657](llvm#112657). Previously, when putting files onto the remote platform, I used `platform file write -d <data>`, which actually required a `platform file open <path>` first in order to obtain a file descriptor, e.g. in file [TestGDBRemotePlatformFile.py](https://github.com/llvm/llvm-project/blob/94e7d9c0bfe517507ea08b00fb00c32fb2837a82/lldb/test/API/functionalities/gdb_remote_client/TestGDBRemotePlatformFile.py#L24-L32). To fix this, use the `platform put-file` method, which is already used in the `redirect_stdin` from this test.

…late" (llvm#114304) Clang importer doesn't seem to work well with this change, see discussion in the original PR. Reverts llvm#114258

UAVs and SRVs have already been converted to use LLVM target types, so we can disable generation of the `!hlsl.uavs` and `!hlsl.srvs` annotations. This will enable adding tests for structured buffers with user-defined types, which the old resource annotation code does not handle (it crashes). Part 1 of llvm#114126

Currently we cost an interleaved memory op as if it were a load/store of the widened vector type, but this undercosts in all cases when compared to the measured performance of today's hardware. On the x280 at NF=2 and spacemit-x60 at NF=2, 3 and 4, a segmented load is carried out as a wide load and NF LMUL shuffle ops: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput All other NFs go through a slow path. On the spacemit-x60 this is proportional to VLMAX * NF, and on the x280 proportional to the number of segments. This patch increases the cost by implementing a wide load + NF LMUL shuffle op cost for the lowest common denominator NF=2, and then a slower cost proportional to VL for the other NFs. In a follow-up patch we can add a tuning flag to use the faster cost model for NF=3 and 4 on the spacemit-x60. Note that the FIXME about illegal vectors seems to have been fixed in llvm#100436

These pseudos used to be handled by CustomInserter to insert the rounding mode change for vector ceil, floor, etc. At some point they were changed to use the InsertReadWriteCSR pass instead of the custom inserter. I believe that makes them redundant with the pseudos used by the RVV intrinsics with rounding mode operand.

The previous statvfs tests had several issues; this patch updates them to meet current standards.

`strrchr("foo", '\0')` is defined to point to the end of `foo`, rather than returning NULL. This wasn't caught by tests, since llvm-libc's `ASSERT_STREQ(nullptr, "");` is not an assertion error. While I'm here, refactor the test slightly to check for NULL more specifically. I considered adding fancier `ASSERT`s (and changing the semantics of `ASSERT_STREQ`), but opted for a more local fix by fair dice roll.
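The two behaviours discussed above can be checked with a small sketch against the standard library's `strrchr`. The helper names are hypothetical and are not llvm-libc's test macros; `strict_streq` only illustrates a comparison that, unlike the `ASSERT_STREQ` behaviour described above, does not treat a null pointer and an empty string as equal.

```cpp
#include <cassert>
#include <cstring>

// Returns true when strrchr(s, '\0') points at the terminating NUL of s,
// which is what the C standard requires (the terminator is part of the string).
bool strrchr_finds_terminator(const char *s) {
  const char *hit = std::strrchr(s, '\0');
  return hit != nullptr && hit == s + std::strlen(s);
}

// Hypothetical strict comparison: a null pointer only equals another null
// pointer, never the empty string.
bool strict_streq(const char *a, const char *b) {
  if (a == nullptr || b == nullptr)
    return a == b;
  return std::strcmp(a, b) == 0;
}
```

A test written with a strict comparison of this kind would have failed when `strrchr` returned NULL instead of a pointer to the terminator.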

Reverts llvm#113724

The change improves the code in general and, as a side effect, avoids crashing on impossible address space casts guarded by `__isGlobal/__isShared`, which partially fixes llvm#112760. It's still possible to trigger the issue by using explicit AS casts w/o AS checks, but LLVM should no longer crash on valid code.

…lvm#114192) The `ValueDecomposer` in `DecomposeCallGraphTypes` was a workaround for missing 1:N support in the dialect conversion. Since llvm#113032, the dialect conversion infrastructure supports 1:N type conversions and 1:N target materializations. The `ValueDecomposer` class is no longer needed. (However, target materializations must still be inserted manually, until we fully merge the 1:1 and 1:N drivers.) Note for LLVM integration: Register 1:N target materializations on the type converter instead of "decompose value conversions" on the `ValueDecomposer`.

Reverts llvm#112964 Crashes MLIR: https://lab.llvm.org/buildbot/#/builders/138/builds/5665

…ct.py (llvm#111776) In the code being parsed, the comma separates the following traits from the category args. If there are no category args, it is still present.

llvm#111778) SPIR-V grammar was updated in upstream to have an "aliases" field instead of duplicating symbols with same values. See KhronosGroup/SPIRV-Headers#447 for details.

This re-applies llvm#96164 after revert in llvm#102434.

Support the following relocations and assembly operators:

- `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`)
- `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`)
- `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`)

A `LOADgotAUTH` pseudo-instruction is introduced, which is later expanded to an actual instruction sequence like the following.

```
adrp x16, :got_auth:sym
add x16, x16, :got_auth_lo12:sym
ldr x0, [x16]
autia x0, x16
```

If a resign is requested, like below, the `LOADgotPAC` pseudo is used, and the GOT load is lowered similarly to `LOADgotAUTH`.

```
@var = global i32 0

define ptr @resign_globalvar() {
  ret ptr ptrauth (ptr @var, i32 3, i64 43)
}
```

If the FPAC bit is not set and an auth instruction is emitted, a check+trap sequence similar to the one used for the `AUT` pseudo is emitted to ensure auth success.

Both SelectionDAG and GlobalISel are supported. For FastISel, we fall back to SelectionDAG.

Tests starting with 'ptrauth-' have corresponding variants w/o this prefix.

See also specification https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got

…vm#114433) When doing a call from CMSE secure state to non-secure state for v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and restore the FP registers around the call. These instructions both check the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents of the FP registers are not secret) they execute as no-ops.

This causes a problem when CONTROL_S.SFPA==0 before the call, which happens if there are no floating-point instructions executed between entry to secure state and the call. If this is the case, then the VLSTM instruction will do nothing, leaving the save area in the stack uninitialised. If the called function returns a value in floating-point registers, the call sequence includes an instruction to copy the return value from a floating-point register to a GPR, which must be before the VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM will fully execute, and load the uninitialised stack memory into the FP registers.

This causes two problems:

* The FP register file is clobbered, including all of the callee-saved registers, which might contain live values.
* The stack region might contain secret values, which will be leaked to non-secure state through the floating-point registers if/when we return to non-secure state.

The fix is to insert a `vmov s0, s0` instruction before the VLSTM instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and VLSTM instructions.

CVE: https://www.cve.org/cverecord?id=CVE-2024-7883
Security bulletin: https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability

This ensures the VPIRBasicBlocks are deleted when the VPlan is destroyed. Fixes a buildbot failure with ASAN, including https://lab.llvm.org/buildbot/#/builders/52/builds/3368

For GFX12 hasTFE is always true because it does not have the buffer load to LDS instructions.

…3380) Align the validation pass valid element datatypes check more closely to the specification by removing i64 as a supported datatype. The spec does not currently support it. Signed-off-by: Luke Hutton <[email protected]>

This work is in preparation for PRs llvm#112138 and llvm#88385, where the middle block is not guaranteed to be the immediate successor to the region block. I've simply added new getMiddleBlock() interfaces to VPlan that for now just return cast<VPBasicBlock>(VectorRegion->getSingleSuccessor()). Once PR llvm#112138 lands we'll need to do more work to discover the middle block.

…vm#114488) This fixes the current sanitizer CI [failures](https://lab.llvm.org/buildbot/#/builders/169/builds/4839/steps/13/logs/stdio). I manually confirmed the fix with a MemorySanitizer build. Signed-off-by: Sarnie, Nick <[email protected]>

…ber initializer. (llvm#114213) This patch extends the filtering heuristic to apply for the Lifetimebound code path. This will suppress a common false positive:

```
namespace std {
template<typename T>
struct unique_ptr {
  T &operator*();
  T *get() const [[clang::lifetimebound]];
};
} // namespace std

struct X {
  X(std::unique_ptr<int> up) :
    pointer(up.get()), owner(std::move(up)) {}
  int *pointer;
  std::unique_ptr<int> owner;
};
```

See llvm#114201.

…ould consider the number of elements of ScalarTy. (llvm#114526)

They are called in a few different forms that we don't support.

…rious files (llvm#114524) This pull request corrects multiple occurrences of the typo "avaliable" to "available" across the LLVM and Clang codebase. These changes improve the clarity and accuracy of comments and documentation. Specific modifications are in the following files:

1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp: Updated comments in readability checks for cognitive complexity.
2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected documentation for JITDylib responsibilities.
3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for FusionPredicate variables.
4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree availability.
5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage descriptions for vector units.
6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant description in shift-detect idiom logic.
7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register availability notes.
8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward reference descriptions for bytecode reader operations.

These changes have no impact on code functionality, focusing solely on documentation clarity. Co-authored-by: wangqiang <[email protected]>

Until now, these have been hardcoded as downstream patches in lld. Add them to the driver so that the private patches can be removed. PS5 only. On PS4, the equivalent hardcoded configuration will remain in the proprietary linker. SIE tracker: TOOLCHAIN-16704

As a follow-on to 113686, this breaks the recursion between phi nodes that have p1 = phi(x, p2) and p2 = phi(y, p1). The knownFPClass can be calculated from the classes of p1 and p2.

This is helpful to debug CMake configuration issues such as the ones that can happen when we build an external project (like GoogleBenchmark).

…14485)

…~X + Y) --> uadd.sat(~X, Y)` (llvm#114345) Alive2: https://alive2.llvm.org/ce/z/mTGCo- We cannot reuse `~X` if `m_AllOnes` matches a vector constant with some poison elts. An alternative solution is to create a new not instead of reusing `~X`. But it isn't worth the effort because we would need to add a one-use check. Fixes llvm#113869.

… fold to InstCombine (llvm#114280) Previously we folded `div/rem X, C` into `poison` if any element of the constant divisor `C` is zero or undef. However, this is incorrect when threading udiv over a vector select: https://alive2.llvm.org/ce/z/3Ninx5

```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```

In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32 -1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into `zeroinitializer` and `poison`, respectively. One solution is to introduce a new flag indicating that we are threading over a vector select, but that requires modifying both `InstSimplify` and `ConstantFold`. However, this optimization doesn't provide benefits to real-world programs:

https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908
https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107

This patch moves the fold into InstCombine to avoid breaking numerous existing tests. Fixes llvm#114191 and llvm#113866 (only poison-safety issue).
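To see why folding a whole arm to `poison` loses information, here is a minimal lane-wise model of the example above. It is only a sketch: `std::nullopt` stands in for a poison lane, and the type and function names are hypothetical, not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// One vector lane; std::nullopt models a poison lane (modelling assumption).
using Lane = std::optional<std::uint32_t>;
using Vec2 = std::array<Lane, 2>;

// Unsigned division of a single lane: only a zero divisor yields poison.
Lane udiv_lane(std::uint32_t n, std::uint32_t d) {
  if (d == 0)
    return std::nullopt;
  return n / d;
}

// Evaluates `udiv num, (select cond, tv, fv)` lane by lane, the way the
// unsimplified IR behaves for a concrete condition vector.
Vec2 udiv_of_select(std::array<bool, 2> cond,
                    std::array<std::uint32_t, 2> num,
                    std::array<std::uint32_t, 2> tv,
                    std::array<std::uint32_t, 2> fv) {
  Vec2 out;
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = udiv_lane(num[i], cond[i] ? tv[i] : fv[i]);
  return out;
}
```

For `%x = <true, false>` the selected divisor is `<-1, 1>`, so no lane divides by zero and lane 1 evaluates to `0xFFFFFFF9` (that is, `-7` as an unsigned 32-bit value). Collapsing the false arm to `poison` as a whole, and then simplifying the result to the true arm's `zeroinitializer`, would change that lane.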

With SEW=64, the vnsrl trick we primarily rely on does not work. This is handled correctly today, but we have fairly minimal testing of the resulting shuffles, which makes it hard to demonstrate the value of an upcoming change.

) `v2i8` is an unsupported type, so we hit the default legalization rules, which perform the bitcast in stack memory and are very inefficient on GPU. This adds a custom lowering where we pack `v2i8` into `i16` and from there use another bitcast node to reach the final desired type. It also adds the inverse, unpacking `i16` into `v2i8`.
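The pack/unpack idea can be sketched with plain integer arithmetic. This is an illustration only: the helper names are hypothetical, and placing element 0 in the low byte is an assumption (little-endian lane order), not something stated in the commit message.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Two 8-bit lanes, standing in for a v2i8 value.
using V2I8 = std::array<std::uint8_t, 2>;

// Packs the two lanes into one 16-bit integer using shifts and ORs,
// so no round trip through memory is needed.
// Assumption: element 0 goes into the low byte.
inline std::uint16_t pack_v2i8(V2I8 v) {
  return static_cast<std::uint16_t>(
      v[0] | (static_cast<std::uint16_t>(v[1]) << 8));
}

// Inverse operation: splits a 16-bit integer back into two 8-bit lanes.
inline V2I8 unpack_v2i8(std::uint16_t x) {
  return {static_cast<std::uint8_t>(x & 0xFF),
          static_cast<std::uint8_t>(x >> 8)};
}
```

Packing and then unpacking returns the original lanes, which is the property a bitcast lowering of this shape needs to preserve.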

[AutoBump] Merge with b74e588 (Nov 01) (7)

sdkrystian and others added 30 commits October 30, 2024 14:50

[AArch64] Add assembly/disassembly for FMOP4{A,S} (widening, 2-way, …

47d9db7

…FP16 to FP32) instructions (llvm#113346) The new instructions are described in https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions

[RISCV] Add hasPostISelHook to sf.vfnrclip pseudo instructions. (llvm…

408c84f

…#114274) Add Uses = [FRM] to the underlying MC instructions. Tweak a couple test cases so the MachineVerifier would have caught this.

[GlobalISel] Import samesign flag (llvm#114267)

b3bb6f1

Credits: llvm#111419 Fixes icmp-flags.mir First attempt: llvm#113090 Revert: llvm#114256

[libc][i386] define MINSIGSTKSZ & SIGSTKSZ (llvm#114249)

dc1ff88

Link: llvm#93709

Fix documentation build

e4dfb51

This fixes the build after the removal of the clang-format status page.

[MLIR] [AMX] Fix strides used by AMX lowering for tile loads and stor…

d210964

…es. (llvm#113476)

[libc][i386] setjmp/longjmp (llvm#112437)

b1320d3

Link: llvm#93709

[NFC][NVPTX] Cleanup getPreferredVectorAction() (llvm#114115)

e89f821

`v2*16` is a legal type in NVPTX. Thus, this is dead code.

[RISCV] Use unsigned instead of int64_t for two small positive shift …

0167a92

…amounts. NFC

Revert "[Clang][Sema] Always use latest redeclaration of primary temp…

4afa978

…late" (llvm#114304) Clang importer doesn't seem to work well with this change, see discussion in the original PR. Reverts llvm#114258

[libc] Refactor statvfs tests (llvm#114147)

5d35747

The previous statvfs tests had several issues; this patch updates them to meet current standards.

Revert "[TLI] Add support for hypot libcall." (llvm#114312)

36d5692

Reverts llvm#113724

Revert "[NVPTX] instcombine known pointer AS checks." (llvm#114319)

04e876e

Reverts llvm#112964 Crashes MLIR: https://lab.llvm.org/buildbot/#/builders/138/builds/5665

[mlir][spirv] Ignore extra comma for category_args in gen_spirv_diale…

67c4857

…ct.py (llvm#111776) In the code being parsed, the comma separates the following traits from the category args. If there are no category args, it is still present.

[mlir][spirv] Remove code for de-duplicating symbols in SPIR-V grammar (

6e75eec

llvm#111778) SPIR-V grammar was updated in upstream to have an "aliases" field instead of duplicating symbols with same values. See KhronosGroup/SPIRV-Headers#447 for details.

kovdan01 and others added 25 commits November 1, 2024 12:21

[VPlan] Connect scalar header to VPlan CFG in unit tests.

659c369

This ensures the VPIRBasicBlocks are deleted when the VPlan is destroyed. Fixes a buildbot failure with ASAN, including https://lab.llvm.org/buildbot/#/builders/52/builds/3368

[AMDGPU] Simplify GFX12 VBUFFER definitions. NFC. (llvm#114403)

550501f

For GFX12 hasTFE is always true because it does not have the buffer load to LDS instructions.

AssumeBundleBuilder: switch placeholder from undef to poison [NFC]

344d972

[SLP][REVEC] When ScalarTy is FixedVectorType, the insertion index sh…

e4aeeba

…ould consider the number of elements of ScalarTy. (llvm#114526)

[clang][bytecode] Add more checks to _ai32_* builtins (llvm#114412)

8951b51

They are called in a few different forms that we don't support.

[ValueTracking] Handle recursive phis in knownFPClass (llvm#114008)

0f91944

As a follow-on to 113686, this breaks the recursion between phi nodes that have p1 = phi(x, p2) and p2 = phi(y, p1). The knownFPClass can be calculated from the classes of p1 and p2.

[libc++] Add a few missing includes

88f8993

[libc++] Upload CMakeConfigureLog artifacts (llvm#114445)

f2019fc

This is helpful to debug CMake configuration issues such as the ones that can happen when we build an external project (like GoogleBenchmark).

[libc++] Fix dumb typo

23e2a04

[VPlan] Don't leak ScalarHeader BasicBlock in unit tests.

edd6b1f

[clang][bytecode] Implement bitcasts to floating-point values (llvm#1…

c752efb

…14485)

[RISCV] Add tests for deinterleave shuffles w/o vnsrl.vv

58f525a

With SEW=64, the vnsrl trick we primarily rely on does not work. This is handled correctly today, but we have fairly minimal testing of the resulting shuffles, which makes it hard to demonstrate the value of an upcoming change.

[AutoBump] Merge with fixes of 95c2d79 (Oct 30)

295c8a6

[AutoBump] Merge with b74e588 (Nov 01)

250ecd5

mgehre-amd approved these changes Feb 10, 2025

Merge pull request #464 from Xilinx/bump_to_b74e588e

03336be

[AutoBump] Merge with b74e588 (Nov 01) (7)

Base automatically changed from bump_to_4ba623f2 to feature/fused-ops February 12, 2025 13:17

jorickert merged commit 54b4bfb into feature/fused-ops Feb 13, 2025
11 checks passed

jorickert deleted the bump_to_95c2d798 branch February 13, 2025 08:26
