forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AutoBump] Merge with fixes of 95c2d798 (Oct 30) (6) #463
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…vm#114258) This patch fixes a couple of regressions introduced in llvm#111852. Consider: ``` template<typename T> struct A { template<bool U> static constexpr bool f() requires U { return true; } }; template<> template<bool U> constexpr bool A<short>::f() requires U { return A<long>::f<U>(); } template<> template<bool U> constexpr bool A<long>::f() requires U { return true; } static_assert(A<short>::f<true>()); // crash here ``` This crashes because when collecting template arguments from the _first_ declaration of `A<long>::f<true>` for constraint checking, we don't add the template arguments from the enclosing class template specialization because there exists another redeclaration that is a member specialization. This also fixes the following example, which happens for a similar reason: ``` // input.cppm export module input; export template<int N> constexpr int f(); template<int N> struct A { template<int J> friend constexpr int f(); }; template struct A<0>; template<int N> constexpr int f() { return N; } ``` ``` // input.cpp import input; static_assert(f<1>() == 1); // error: static assertion failed ```
…FP16 to FP32) instructions (llvm#113346) The new instructions are described in https://developer.arm.com/documentation/ddi0602/2024-09/SME-Instructions
…#114274) Add Uses = [FRM] to the underlying MC instructions. Tweak a couple test cases so the MachineVerifier would have caught this.
Credits: llvm#111419 Fixes icmp-flags.mir First attempt: llvm#113090 Revert: llvm#114256
… `isInTemplateInstantiation` matchers (llvm#110666) Fix `isInstantiated` and `isInTemplateInstantiation` matchers, so they return true for instantiations of variable templates, and any declaration in statements contained in such instantiations.
This commit adds the ability to get a particular resource from an array of resources using the handle_fromBinding intrinsic. The main changes are: 1. Create an array when generating the type. 2. Add capabilities from [SPV_EXT_descriptor_indexing](https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_descriptor_indexing.html). We are still missing the ability to declare a runtime array. That will be done in a follow up PR.
When running Bionic's testsuite over llvm-libc, tests broke because e.g., ``` const char *str = "abc"; char buf[7]{"111111"}; strlcpy(buf, str, 7); ASSERT_EQ(buf, {'1', '1', '1', '\0', '\0', '\0', '\0'}); ``` On my machine (Debian w/ glibc and clang-16), a `printf` loop over `buf` gets unrolled into a series of const `printf` at compile-time: ``` printf("%d\n", '1'); printf("%d\n", '1'); printf("%d\n", '1'); printf("%d\n", 0); printf("%d\n", '1'); printf("%d\n", '1'); printf("%d\n", 0); ``` Seems best to match existing precedent here.
This fixes the build after the removal of the clang-format status page.
…lvm#114216) This PR implements instruction selection for G_BITCAST on an earlier stage to avoid MachineVerifier complains on subtle semantics difference between G_BITCAST and OpBitcast. We do instruction selections for OpBitcast after IR Translation instead of calling MIB.buildBitcast() generating the general op code G_BITCAST, because when MachineVerifier validates G_BITCAST we see a check of a kind: 'if Source Type is equal to Destination Type then report error "bitcast must change the type"'. This doesn't take into account the notion of a typed pointer that is important for SPIR-V where a user may and should use bitcast between pointers with different pointee types (https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpBitcast). It's important for correct lowering in SPIR-V, because interpretation of the data type is not left to instructions that utilize the pointer, but encoded by the pointer declaration, and the SPIRV target can and must handle the declaration and use of pointers that specify the type of data they point to. It's not feasible to improve validation of G_BITCAST using just information provided by low level types of source and destination. Therefore we don't produce G_BITCAST as the general op code with semantics different from OpBitcast, but rather lower to OpBitcast immediately. See discussion in llvm#110270 for even more context.
`v2*16` is a legal type in NVPTX. Thus, this is dead code.
…112948) A helper (2 overloads) that consolidates corocloner creation and the actual cloning. The helpers create a TimeTraceScope to make it easier to see how long the cloning takes. Extracted from llvm#109032 (commit 1)
…to (llvm#112976) This patch is a part of step-by-step refactoring of CloneFunctionInto. The goal is to extract reusable pieces out of it that will be later used to optimize function cloning e.g. in coroutine processing. Extracted from llvm#109032 (commit 2)
Passing a descriptor as a `const Descriptor &` or a `const Descriptor *` generates a FIR signature where the box is passed by value. This is an issue, as it requires a load of the box to be passed. But since, ultimately, all boxes are passed by reference a temporary is generated in LLVM and the reference to the temporary is passed. The boxes addresses are registered with the CUDA runtime but the temporaries are not, thus preventing the runtime to properly map a host side address to its device side counterpart. To address this issue, this PR changes the signatures to the transfer functions to pass a descriptor as a `Descriptor *`, which will in turn generate a FIR signature with that takes a box reference as an argument.
API test failed for remote platform in [llvm#112657](llvm#112657) Previously when putting files onto remote platform, I used `platform file write -d <data>` which actually required a `platform file open <path>` first in order to obtain a file descriptor. eg. in file [TestGDBRemotePlatformFile.py](https://github.com/llvm/llvm-project/blob/94e7d9c0bfe517507ea08b00fb00c32fb2837a82/lldb/test/API/functionalities/gdb_remote_client/TestGDBRemotePlatformFile.py#L24-L32) To fix this, use the `platform put-file` method, which is used in the `redirect_stdin` from this test already.
…late" (llvm#114304) Clang importer doesn't seem to work well with this change, see discussion in the original PR. Reverts llvm#114258
UAVs and SRVs have already been converted to use LLVM target types and we can disable generating of the !hlsl.uavs and !hlsl.srvs! annotations. This will enable adding tests for structured buffers with user defined types that this old resource annotations code does not handle (it crashes). Part 1 of llvm#114126
Currently we cost an interleaved memory op as if it were a load/store of the widened vector type, but this was undercosting in all cases when compared to the measured performance of todays hardware. On the x280 at NF=2 and spacemit-x60 at NF=2,3 and 4, a segmented load is carried out as a wide load and NF LMUL shuffle ops: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput All other NFs go through a slow path. On the spacemit-x60 this is proportional to VLMAX * NF, and on the x280 proportional to the number of segments. This patch increases the cost by implementing a wide load + NF LMUL shuffle op cost for the lowest common denominator NF=2, and then a slower cost proportional to VL for the other NFs. In a follow up patch we can add a tuning flag to use the faster cost model for NF=3 and 4 on the spacemit-x60. Note that the FIXME about illegal vectors seems to have been fixed in llvm#100436
These pseudos used to be handled by CustomInserter to insert the rounding mode change for vector ceil, floor, etc. At some point they were changed to use the InsertReadWriteCSR pass instead of the custom inserter. I believe that makes them redundant with the pseudos used by the RVV intrinsics with rounding mode operand.
The previous statvfs tests had several issues, this patch updates them to meet current standards.
`strrchr("foo", '\0')` is defined to point to the end of `foo`, rather than returning NULL. This wasn't caught by tests, since llvm-libc's `ASSERT_STREQ(nullptr, "");` is not an assertion error. While I'm here, refactor the test slightly to check for NULL more specifically. I considered adding fancier `ASSERT`s (and changing the semantics of `ASSERT_STREQ`), but opted for a more local fix by fair dice roll.
The change improves the code in general and, as a side effect, avoids crashing on an impossible address space casts guarded by `__isGlobal/__isShared`, which partially fixes llvm#112760 It's still possible to trigger the issue by using explicit AS casts w/o AS checks, but LLVM should no longer crash on valid code.
…lvm#114192) The `ValueDecomposer` in `DecomposeCallGraphTypes` was a workaround around missing 1:N support in the dialect conversion. Since llvm#113032, the dialect conversion infrastructure supports 1:N type conversions and 1:N target materializations. The `ValueDecomposer` class is no longer needed. (However, target materializations must still be inserted manually, until we fully merge the 1:1 and 1:N drivers.) Note for LLVM integration: Register 1:N target materializations on the type converter instead of "decompose value conversions" on the `ValueDecomposer`.
…ct.py (llvm#111776) In the code being parsed, the comma separates following traits from the category args. If there's no category args, it is still present.
llvm#111778) SPIR-V grammar was updated in upstream to have an "aliases" field instead of duplicating symbols with same values. See KhronosGroup/SPIRV-Headers#447 for details.
This re-applies llvm#96164 after revert in llvm#102434. Support the following relocations and assembly operators: - `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`) - `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`) - `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`) `LOADgotAUTH` pseudo-instruction is introduced which is later expanded to actual instruction sequence like the following. ``` adrp x16, :got_auth:sym add x16, x16, :got_auth_lo12:sym ldr x0, [x16] autia x0, x16 ``` If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT load is lowered similarly to `LOADgotAUTH`. ``` @var = global i32 0 define ptr @resign_globalvar() { ret ptr ptrauth (ptr @var, i32 3, i64 43) } ``` If FPAC bit is not set and auth instruction is emitted, a check+trap sequence similar to one used for `AUT` pseudo is emitted to ensure auth success. Both SelectionDAG and GlobalISel are suppported. For FastISel, we fall back to SelectionDAG. Tests starting with 'ptrauth-' have corresponding variants w/o this prefix. See also specification https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got
…vm#114433) When doing a call from CMSE secure state to non-secure state for v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and restore the FP registers around the call. These instructions both check the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents of the FP registers are not secret) they execute as no-ops. This causes a problem when CONTROL_S.SFPA==0 before the call, which happens if there are no floating-point instructions executed between entry to secure state and the call. If this is the case, then the VLSTM instruction will do nothing, leaving the save area in the stack uninitialised. If the called function returns a value in floating-point registers, the call sequence includes an instruction to copy the return value from a floating-point register to a GPR, which must be before the VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM will fully execute, and load the uninitialised stack memory into the FP registers. This causes two problems: * The FP register file is clobbered, including all of the callee-saved registers, which might contain live values. * The stack region might contain secret values, which will be leaked to non-secure state through the floating-point registers if/when we return to non-secure state. The fix is to insert a `vmov s0, s0` instruction before the VLSTM instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and VLSTM instruction. CVE: https://www.cve.org/cverecord?id=CVE-2024-7883 Security bulletin: https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability
This ensures the VPIRBasicBlocks are deleted when the VPlan is destroyed. Fixes a buildbot failure with ASAN, including https://lab.llvm.org/buildbot/#/builders/52/builds/3368
For GFX12 hasTFE is always true because it does not have the buffer load to LDS instructions.
…3380) Align the validation pass valid element datatypes check more closely to the specification by removing i64 as a supported datatype. The spec does not currently support it. Signed-off-by: Luke Hutton <[email protected]>
This work is in preparation for PRs llvm#112138 and llvm#88385 where the middle block is not guaranteed to be the immediate successor to the region block. I've simply add new getMiddleBlock() interfaces to VPlan that for now just return cast<VPBasicBlock>(VectorRegion->getSingleSuccessor()) Once PR llvm#112138 lands we'll need to do more work to discover the middle block.
…vm#114488) This fixes the current sanitizer CI [failures](https://lab.llvm.org/buildbot/#/builders/169/builds/4839/steps/13/logs/stdio). I manually confirmed the fix with a MemorySanitizer build. Signed-off-by: Sarnie, Nick <[email protected]>
…ber initializer. (llvm#114213) This patch extends the filtering heuristic to apply for the Lifetimebound code path. This will suppress a common false positive: ``` namespace std { template<typename T> struct unique_ptr { T &operator*(); T *get() const [[clang::lifetimebound]]; }; } // namespace std struct X { X(std::unique_ptr<int> up) : pointer(up.get()), owner(std::move(up)) {} int *pointer; std::unique_ptr<int> owner; }; ``` See llvm#114201.
…ould consider the number of elements of ScalarTy. (llvm#114526)
They are called in a few different forms that we don't support.
…rious files (llvm#114524) This pull request corrects multiple occurrences of the typo "avaliable" to "available" across the LLVM and Clang codebase. These changes improve the clarity and accuracy of comments and documentation. Specific modifications are in the following files: 1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp: Updated comments in readability checks for cognitive complexity. 2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected documentation for JITDylib responsibilities. 3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for FusionPredicate variables. 4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree availability. 5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage descriptions for vector units. 6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant description in shift-detect idiom logic. 7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register availability notes. 8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward reference descriptions for bytecode reader operations. These changes have no impact on code functionality, focusing solely on documentation clarity. Co-authored-by: wangqiang <[email protected]>
Until now, these have been hardcoded as a downstream patches in lld. Add them to the driver so that the private patches can be removed. PS5 only. On PS4, the equivalent hardcoded configuration will remain in the proprietary linker. SIE tracker: TOOLCHAIN-16704
As a follow-on to 113686, this breaks the recursion between phi nodes that have p1 = phi(x, p2) and p2 = phi(y, p1). The knownFPClass can be calculated from the classes of p1 and p2.
This is helpful to debug CMake configuration issues such as the ones that can happen when we build an external project (like GoogleBenchmark).
…~X + Y) --> uadd.sat(~X, Y)` (llvm#114345) Alive2: https://alive2.llvm.org/ce/z/mTGCo- We cannot reuse `~X` if `m_AllOnes` matches a vector constant with some poison elts. An alternative solution is to create a new not instead of reusing `~X`. But it doesn't worth the effort because we need to add a one-use check. Fixes llvm#113869.
… fold to InstCombine (llvm#114280) Previously we fold `div/rem X, C` into `poison` if any element of the constant divisor `C` is zero or undef. However, it is incorrect when threading udiv over an vector select: https://alive2.llvm.org/ce/z/3Ninx5 ``` define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) { %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1> %div = udiv <2 x i32> <i32 42, i32 -7>, %sel ret <2 x i32> %div } ``` In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32 -1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into `zeroinitializer` and `poison`, respectively. One solution is to introduce a new flag indicating that we are threading over a vector select. But it requires to modify both `InstSimplify` and `ConstantFold`. However, this optimization doesn't provide benefits to real-world programs: https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908 https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107 This patch moves the fold into InstCombine to avoid breaking numerous existing tests. Fixes llvm#114191 and llvm#113866 (only poison-safety issue).
With SEW=64, the vnsrl trick we primary rely on does not work. This is handled correctly today, but we have fairly minimal testing of the resulting shuffles which makes it hard to demonstrate value of an upcoming change.
) `v2i8` is an unsupported type, so we hit the default legalization rules which perform the bitcast in stack memory and is very inefficient on GPU. This adds a custom lowering where we pack `v2i8` into `i16` and from there use another bitcast node to reach the final desired type. And also the inverse unpacking `i16` into `v2i8`.
mgehre-amd
approved these changes
Feb 10, 2025
[AutoBump] Merge with b74e588 (Nov 01) (7)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.