[pull] main from llvm:main #5546

pull · 2025-01-16T01:14:23Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

If Polly is built with LLVM_POLLY_LINK_INTO_TOOLS=ON (the default for monorepo builds), then Polly will become a dependency of the LLVMExtensions component, which is part of LLVMExports. As such, all the Polly libraries also have to be part of LLVMExports. However, if Polly is built with LLVM_POLLY_LINK_INTO_TOOLS=OFF, we also end up adding Polly libraries to LLVMExports. This is undesirable, as it adds a hard dependency from llvm on polly. Fix this by only exporting polly libraries from LLVMExports if LLVM_POLLY_LINK_INTO_TOOLS is enabled.

…ectorInstrCost. NFCI This should not affect the result, unless the getArithmeticInstrCost and getVectorInstrCost routines learn to produce different costs (with CostKind = CodeSize, for example). The -1 lanes prevent 0 lanes from (incorrectly) being marked as free.

When `RecordType` is converted to the corresponding `DIType`, we cache the information to avoid doing the conversion again. Our conversion of `RecordType` looks like this:

`ConvertRecordType(RecordType Ty)`

1. If type `Ty` is already in the cache, then return the corresponding item.
2. Create a placeholder `DICompositeTypeAttr` (called `ty_self` below) for `Ty`.
3. Put `Ty->ty_self` in the cache.
4. Convert members of `Ty`. This may cause `ConvertRecordType` to be called again with other types.
5. Create the final `DICompositeTypeAttr`.
6. Replace the `ty_self` in the cache with the one created in step 5.

`end`

The purpose of creating `ty_self` is to handle cases where a member may have a reference to the parent type. Now consider the code below:

```
type t1
  type(t2), pointer :: p1
end type
type t2
  type(t1), pointer :: p2
end type
```

While processing `t1`, we could have a structure like the one below.

`t1 -> t2 -> t1_self`

The `t2` created during handling of `t1` can't be cached on its own as it contains a placeholder reference. It will fail an assert in MLIR if it is processed standalone. To avoid this problem, we have a check in step 6 above to not cache such types. But this check was not tight enough. It only checked whether a type had a direct placeholder reference to another type. It missed the following case, where the placeholder reference can be in a type further down the line.

```
type t1
  type(t2), pointer :: p1
end type
type t2
  type(t3), pointer :: p2
end type
type t3
  type(t1), pointer :: p3
end type
```

So while processing `t1`, we have to stop caching of not only `t3` but also of `t2`. This PR improves the check and moves the logic inside `convertRecordType`.

Please note that a type can't have a placeholder reference because of how such references are resolved in MLIR. Please see the discussion at the end of this [PR](#106571).

I have to change `getDerivedType` so that it will also get the derived type for things like `type(t2), pointer :: p1` which are wrapped in `BoxType`. Happy to move it to a new function or a local helper in case this change is problematic.

Fixes #122024.
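The six steps and the tightened caching check can be sketched in a few lines of Python. This is only an illustration of the idea, not the Flang implementation: `DebugType`, `RecordConverter` and the `open_refs` bookkeeping are hypothetical names, and records are reduced to a name plus a list of member record names.

```python
from dataclasses import dataclass, field


@dataclass
class DebugType:
    """Toy stand-in for a DICompositeTypeAttr (hypothetical)."""
    name: str
    members: list = field(default_factory=list)
    is_placeholder: bool = False


class RecordConverter:
    def __init__(self, records):
        self.records = records  # record name -> list of member record names
        self.cache = {}         # record name -> DebugType

    def convert(self, name):
        """Return (debug_type, open_refs).

        open_refs holds the names of records that are still being converted
        and that the result refers to through a placeholder, at any depth.
        """
        if name in self.cache:                                   # step 1
            cached = self.cache[name]
            return cached, ({name} if cached.is_placeholder else set())

        self.cache[name] = DebugType(name, is_placeholder=True)  # steps 2-3

        members, open_refs = [], set()
        for member in self.records[name]:                        # step 4
            member_type, refs = self.convert(member)
            members.append(member_type)
            open_refs |= refs

        final = DebugType(name, members)                         # step 5
        open_refs.discard(name)  # references to itself are resolved by `final`

        if open_refs:
            # Still depends on the placeholder of an enclosing record:
            # do not leave this type in the cache.
            del self.cache[name]
        else:
            self.cache[name] = final                             # step 6
        return final, open_refs


records = {"t1": ["t2"], "t2": ["t3"], "t3": ["t1"]}
converter = RecordConverter(records)
converter.convert("t1")
print(sorted(converter.cache))
```

With the three-type chain from the description, only `t1` ends up cached; `t2` and `t3` are both dropped because each still reaches the `t1` placeholder.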

This PR allows lowering **unsigned** `tosa.max_pool2d` to linalg.

```
// CHECK-LABEL: @max_pool_ui8
func.func @max_pool_ui8(%arg0: tensor<1x6x34x62xui8>) -> tensor<1x4x32x62xui8> {
  // CHECK: builtin.unrealized_conversion_cast {{.*}} : tensor<1x6x34x62xui8> to tensor<1x6x34x62xi8>
  // CHECK: arith.constant 0
  // CHECK: linalg.pooling_nhwc_max_unsigned {{.*}} : (tensor<1x4x32x62xi8>) -> tensor<1x4x32x62xi8>
  // CHECK: builtin.unrealized_conversion_cast {{.*}} : tensor<1x4x32x62xi8> to tensor<1x4x32x62xui8>
  %0 = tosa.max_pool2d %arg0 {pad = array<i64: 0, 0, 0, 0>, kernel = array<i64: 3, 3>, stride = array<i64: 1, 1>} : (tensor<1x6x34x62xui8>) -> tensor<1x4x32x62xui8>
  return %0 : tensor<1x4x32x62xui8>
}
```

It does this by

- converting the MaxPool2dConverter from OpRewritePattern to OpConversionPattern
- adjusting the padding value to the minimum unsigned value when the max_pool is unsigned
- lowering to `linalg.pooling_nhwc_max_unsigned` (which uses `arith.maxui`) when the max_pool is unsigned
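To see why both the comparison and the padding value have to change for unsigned inputs, here is a toy 1-D max pool in Python. It is not the linalg lowering; `max_pool_1d`, `as_signed8` and the sample values are made up for illustration. The same 8-bit patterns are pooled once with an unsigned comparison and once with a signed one.

```python
def as_signed8(bits):
    """Interpret an 8-bit pattern (0..255) as a two's-complement value."""
    return bits - 256 if bits >= 128 else bits


def max_pool_1d(values, kernel, key, init):
    """Sliding-window max over 8-bit patterns, compared through `key`."""
    out = []
    for start in range(len(values) - kernel + 1):
        best = init
        for v in values[start:start + kernel]:
            if key(v) > key(best):
                best = v
        out.append(best)
    return out


bits = [1, 200, 3, 130]

# Unsigned max: the identity (padding) value is the minimum unsigned value, 0.
unsigned = max_pool_1d(bits, kernel=3, key=lambda b: b, init=0)

# Signed max over the same bits: 200 and 130 are read as negative numbers,
# and the identity value is the bit pattern of -128.
signed = max_pool_1d(bits, kernel=3, key=as_signed8, init=0x80)

print(unsigned, signed)
```

The two results differ because 200 and 130 are negative when read as signed bytes, which is the situation the unsigned pooling op is meant to avoid.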

) Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder. The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.
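As a rough model of that split, the Python sketch below separates "find the index of the last active lane" from "extract the element and handle passthru". The function names are hypothetical, and returning index 0 for an all-false mask is a modelling choice made here, not something the description specifies.

```python
def find_last_active(mask):
    """Index of the last true lane in `mask`.

    For an all-false mask this sketch returns 0; that is an assumption of
    the model, and callers are expected to check the mask separately.
    """
    last = 0
    for index, active in enumerate(mask):
        if active:
            last = index
    return last


def extract_last_active(data, mask, passthru):
    """Element of `data` at the last active lane, or `passthru` if none."""
    index = find_last_active(mask)
    return data[index] if any(mask) else passthru


print(extract_last_active([10, 20, 30, 40], [True, False, True, False], -1))
```

Keeping the index search separate means the extraction and the passthru select can be expressed with ordinary operations, which mirrors the division of labour described above.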

We are not handling 'S' scalar dependencies correctly and have at least the following miscompiles related to that:

- [LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" #54176
- [LoopInterchange] Interchange breaks program correctness #46867
- [LoopInterchange] Loops should not interchanged due to dependencies #47259
- [LoopInterchange] Loops should not interchanged due to control flow #47401

This patch no longer inserts the "S" dependency/direction into the dependency matrix, so a dependency is never "S". We seem to have forgotten the exact meaning of this dependency type, and don't see why it should be treated differently. We prefer correctness over incorrect and more aggressive results. That is, this prevents the miscompiles at the expense of handling fewer cases, making interchange more pessimistic. However, some of the cases that are now rejected for dependence analysis reasons were rejected before too, but for other reasons (e.g. profitability). So at least for the LLVM regression tests, the number of regressions is very reasonable.

This should be a stopgap. We would like to get interchange enabled by default and thus prefer correctness over unsafe transforms, and later see if we can solve the regressions.
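For readers unfamiliar with the dependency matrix, the following Python sketch shows the textbook style of legality test on direction vectors: swap the two loop columns in every row and require that the first non-`=` entry is still `<`. This is not the LoopInterchange code; the function names are hypothetical, and treating `*` as always unsafe is a conservative assumption made for the sketch.

```python
def is_lexicographically_nonnegative(direction):
    """True if the first entry that is not '=' is '<' (or all are '=')."""
    for d in direction:
        if d == "<":
            return True
        if d in (">", "*"):  # '*' (unknown) is treated as unsafe here
            return False
    return True


def interchange_is_legal(dep_matrix, outer, inner):
    """Check every dependence row after swapping the two loop columns."""
    for row in dep_matrix:
        swapped = list(row)
        swapped[outer], swapped[inner] = swapped[inner], swapped[outer]
        if not is_lexicographically_nonnegative(swapped):
            return False
    return True


# A ('<', '>') dependence becomes ('>', '<') after the swap, so it blocks
# the interchange; ('<', '=') becomes ('=', '<') and does not.
print(interchange_is_legal([["<", ">"]], 0, 1))
print(interchange_is_legal([["<", "="]], 0, 1))
```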

…#120474) This patch adds several instructions seen when trying to run an executable built with ASan with llvm-mingw (x86 and x86_64, using the git tip in llvm-project). It also includes instructions collected by Roman Pišl and Eric Pouech in the Wine bug reports below.

```
Related: #96270

Co-authored-by: Roman Pišl <[email protected]>
https://bugs.winehq.org/show_bug.cgi?id=50993
https://bugs.winehq.org/attachment.cgi?id=70233

Co-authored-by: Eric Pouech <[email protected]>
https://bugs.winehq.org/show_bug.cgi?id=52386
https://bugs.winehq.org/attachment.cgi?id=71626
```

In https://reviews.llvm.org/D136765 / https://reviews.llvm.org/D144155, the asan annotations for `std::vector` were modified to unpoison freed backing memory on destruction, instead of leaving it poisoned. However, calling `__clear()` instead of `clear()` skips informing the asan runtime of this decrease in the accessible container size, which breaks the invariant that the value of `old_mid` should match the value of `new_mid` from the previous call to `__sanitizer_annotate_contiguous_container`, which can trip the sanity checks for the partial poison between [d1, d2) and the container redzone between [d2, c), if enabled. To fix this, ensure that `clear()` is called instead, as is already done by `__vdeallocate()`. Also remove `__clear()`, since it is no longer called.
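The invariant can be modelled with a small Python toy. `ToyAsanRuntime` and `ToyVector` are hypothetical stand-ins, offsets replace pointers, and the runtime check is reduced to a single comparison of `old_mid` against the previously recorded `new_mid`; the real sanity checks are more involved.

```python
class AnnotationMismatch(Exception):
    """Raised when old_mid disagrees with the previously recorded new_mid."""


class ToyAsanRuntime:
    def __init__(self, capacity):
        self.capacity = capacity
        self.mid = capacity  # a fresh allocation is fully accessible

    def annotate(self, old_mid, new_mid):
        if old_mid != self.mid:
            raise AnnotationMismatch(
                f"expected old_mid={self.mid}, got {old_mid}")
        self.mid = new_mid


class ToyVector:
    def __init__(self, capacity):
        self.capacity = capacity
        self.size = 0
        self.runtime = ToyAsanRuntime(capacity)
        self.runtime.annotate(capacity, 0)  # poison the unused capacity

    def resize(self, n):
        self.runtime.annotate(self.size, n)
        self.size = n

    def clear(self, annotate=True):
        # annotate=False models the old __clear() path, which shrank the
        # container without telling the runtime.
        if annotate:
            self.runtime.annotate(self.size, 0)
        self.size = 0

    def destroy(self):
        # Unpoison the whole buffer before it is freed.
        self.runtime.annotate(self.size, self.capacity)


def survives_destruction(annotate_on_clear):
    v = ToyVector(capacity=8)
    v.resize(5)
    v.clear(annotate=annotate_on_clear)
    try:
        v.destroy()
    except AnnotationMismatch:
        return False
    return True


print(survives_destruction(True), survives_destruction(False))
```

In the unannotated case the runtime still believes five elements are accessible when destruction reports zero, which is the mismatch between `old_mid` and the previous `new_mid` described above.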

SBSaveCoreOptions has been around for a while now, so I decided to draft up some docstrings describing the functionality better. Some of my wording sounded a bit clunky due to the optionality of each method call, so I would greatly appreciate feedback. Includes the new method in #122541, so I will merge this as a follow-up.

#123679) `Generalization.cpp:53`

```cpp
FailureOr<GenericOp> mlir::linalg::generalizeNamedOp(RewriterBase &rewriter,
                                                     LinalgOp linalgOp) {
  if (failed(generalizeNamedOpPrecondition(linalgOp)))
    return rewriter.notifyMatchFailure(linalgOp, "preconditions not met");

  SmallVector<Value> inputs = linalgOp.getDpsInputs();
  ValueRange outputs = linalgOp.getDpsInits();
  SmallVector<AffineMap> indexingMaps = linalgOp.getIndexingMapsArray();
  SmallVector<utils::IteratorType> iterators =
      linalgOp.getIteratorTypesArray();
  SmallVector<Type> resultTypes = linalgOp.hasPureTensorSemantics()
                                      ? TypeRange(ValueRange(outputs))
                                      : TypeRange{};
  ...
```

`generalizeNamedOp` in `Generalization.cpp` has a different arg name than `generalizeNamedOp` in `Transforms.h`. Sync to use `linalgOp`.

Today, emitLinkerDirectives is private to TLOFCOFF-- it isolates parsing and processing of the linker options. Similar processing is also done by other TLOFs inline within emitModuleMetadata. This patch promotes emitLinkerDirectives to a virtual (public) method so that this handling is similarly isolated in the other TLOFs. This also enables downstream targets to override just this handling instead of the whole of emitModuleMetadata.

#123235) .gfids$y contains a list of indirect calls for Control Flow Guard. This wasn't working properly for ARM64EC: direct calls were being treated as indirect calls. Make sure we correctly filter out direct calls. This improves the protection from Control Flow Guard, and also fixes a link error when using certain functions from oldnames.lib.

@JDevlieghere

…123826) Recently I added SBProgress (#119052), and during that original commit I tested if the progress event was sent over LLDB-DAP, and it was. However upon the suggestion of @JDevlieghere and @labath we added an external category (#120171), which I did not test. This small patch wires up DAP to listen for external events by default, and adds the external category to the SBDebugger enumeration.

…123008) This cleanup action only needs to be performed once when the entire optimization is converged. Doing it in every iteration has a very high time-complexity, as it queries every dbg value in a dense map.

Compare before and after for one internal source file with many basic blocks:

![image](https://github.com/user-attachments/assets/1dac76a9-a974-4068-9aa1-4041f963fa8e)

![image](https://github.com/user-attachments/assets/73ea2ef1-d1f4-4064-8826-8c13fb539b8d)

>90% reduction in this extreme case.

This patch adds support for replacing calls to `__hipstdpar_hidden_malloc` with calls to `__libc_malloc`, similarly to how we handle hidden `free`. A future paired change in the forwarding header will leverage this capability in order to handle certain special cases where it is not possible / desirable to allocate via the HIP runtime.

- Redefines `DXILAttribute` to denote a function attribute, compatible with how it was defined in DXC/LLVM 3.7
- Fix how `DXILAttribute` is emitted to be a struct of set attributes instead of an "or" of the enums
- Implement the lowering of `DXILAttribute` to LLVM function attributes in `DXILOpBuilder.cpp`. A custom mapping is defined.
- Audit all current ops to specify the correct attributes consistent with DXC. This is done here to allow for testing.
- Update testcases in `llvm/test/CodeGen/DirectX` of all ops with attributes to check that the attributes are set
- Update testcases of ops that had previously incorrectly set attributes to check that no attributes are set
- Defines `DXILProperty` to denote the other type of attributes from DXC used to query properties.
- Emit `DXILProperty` as a struct of set attributes.
- Updates `DXIL.td` to specify applicable `DXILProperty`s on ops

Note: `DXILProperty` was referred to as 'queryable attributes' in design discussion. Changed to property to allow for better expression in `DXIL.td`.

Resolves #114461
Resolves #115912

This patch makes the build container job save the agent container image to a separate tar file rather than bundling it in with the existing tar file. For some reason, running podman save with two container images and then loading that single tar file gets rid of the agent image and we end up with two copies of the original image. This means that premerge jobs will fail with the agent image because they cannot find the run.sh script.

Debugging the dropped variable statistics for large LLVM IR files can be made easier if the functions called before an optimization pass is run include the PassID of the pass that will be run after statistics metrics are collected. This patch adds that support.

This patch fixes:

  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2792:14: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]

  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:2797:14: error: comparison of integers of different signs: 'unsigned int' and 'int' [-Werror,-Wsign-compare]

…#123858) See https://cmake.org/cmake/help/latest/policy/CMP0175.html

> The `OUTPUT` form does not accept `PRE_BUILD`, `PRE_LINK`, or `POST_BUILD` keywords.

When using CMake version 3.31+, this results in ~2000 lines of warning spam in my downstream project:

```
CMake Warning (dev) at build-gcc/lib/cmake/mlir/AddMLIRPython.cmake:606 (add_custom_command):
  The following keywords are not supported when using
  add_custom_command(OUTPUT): PRE_BUILD.

  Policy CMP0175 is not set: add_custom_command() rejects invalid arguments.
  Run "cmake --help-policy CMP0175" for policy details. Use the cmake_policy
  command to set the policy and suppress this warning.
Call Stack (most recent call first):
  build-gcc/lib/cmake/mlir/AddMLIRPython.cmake:222 (add_mlir_python_sources_target)
  build-gcc/lib/cmake/mlir/AddMLIRPython.cmake:256 (_process_target)
  compiler/bindings/python/CMakeLists.txt:239 (add_mlir_python_modules)
This warning is for project developers. Use -Wno-dev to suppress it.
```

General docs: https://cmake.org/cmake/help/latest/command/add_custom_command.html. Note that `PRE_BUILD` only appears in the _second_ signature for the function (which takes `TARGET`), not the first (which takes `OUTPUT`).

pull bot added the ⤵️ pull label Jan 16, 2025

necto and others added 29 commits January 20, 2025 10:02

[clang] Fix false warning on reinterpret_casting unknown template type (

d049db8

#109430) After 1595988 diag::warn_undefined_reinterpret_cast started raising on non-instantiated template functions without sufficient knowledge whether the reinterpret_cast is indeed UB.

[mlir] Add [[lifetimebound]] to Range classes. (#123091)

1297c11

This prevents creating range class instances from temporaries.

[Offload] Enable shared-libs; compiler-rt as default RTLIB (#123568)

4b3c178

This is the next step to move the CMake cache file builder closer to the build configuration we care about downstream.

[clang][bytecode] Fix diagnostic mismatch with current interpreter (#…

1be64c2

…123571) Don't report dead pointers if we're checking for a potential constant expression.

[LLParser] Avoid PointerType::get() with type argument (NFC)

5139c90

Use the methods accepting LLVMContext instead.

[clang][NFC] Migrate bit-fields of OverloadCandidate to LLVM_PREFERRE…

5a7a324

…D_TYPE

[LLD][COFF] Add support for alternate entry point in CHPE metadata on…

a16adaf

… ARM64X (#123346) Includes handling for ARM64X relocations relative to a symbol.

[bazel] Fix dependencies for 69d3ba3

0f8297a

[AMDGPU][NewPM] Port SIOptimizeExecMasking to NPM (#123572)

96c4f97

[SPIR-V] Improve portability of the code (#123584)

fe7cb15

Adding SPIRV to LLVM_ALL_TARGETS (#119653) revealed a series of minor compilation problems and sanitizer complaints. This PR is to address the problem.

[libclc] Rename include guards. NFC.

8b7bfb4

[CodeGen] MachineFunctionSplitter: Add missing initializer (#123564)

3ace18d

This registers the pass with PassRegistry so we can use -start-before and other options for machine-function-splitter.

[Flang] Modify module test to run in a sub-directory (#123364)

4d21096

This is to avoid race conditions with other tests.

[Clang] Use more liberal pointer attribute wildcard in ms-intrinsics …

bd96295

…tests (NFC) Allow arbitrary attributes, including those with arguments.

[Clang] Don't match irrelevant attributes in mips return tests (NFC)

a4d9a8d

The only thing these tests care about from an ABI perspective is sret, don't also test all the optimization attributes.

[Clang] Regenerate test checks (NFC)

a79ae86

To reduce diffs in an upcoming change.

[flang][debug] Remove an unused function to fix build. (#123602)

0ec153b

[clang][bytecode] Fix reporting failed local constexpr initializers (#…

d70f54f

…123588) We need to emit the 'initializer of X is not a constant expression' note for local constexpr variables as well.

[NVPTX][DEBUGINFO][NFC]Reduce test file to ease maintenance

1c5b122

[mlir][emitc] Set default dialect to emitc in ops with block (#123036)

977d744

This is a follow up to 68a3908 (func: Set default dialect to 'emitc'), but for other instructions with blocks to make it look consistent.

[AArch64][NFC] Move getPartialReductionCost into cpp file (#123370)

a733c1f

The function getPartialReductionCost is already quite large and is likely to grow in size as we add support for more cases in future. Therefore, I think it's best to move this into the cpp file.

inbelic and others added 30 commits January 21, 2025 15:12

Revert "[HLSL] Implement the reflect HLSL function" (#123846)

4c91263

Reverts #122992 Due to an included failing test-case the commit causes build failures.

Remove references to MipsAndroid as it is unsupported (#123810)

b821aae

Remove check for Android in Mips.cpp (#123793)

635e154

[bazel] Add SPIRV

68fdc09

[bazel] Reorder targets

ec0958c

[ORC] Simplify error return. NFC.

8a5f1ef

[clang] Implement #pragma clang section on COFF targets (#112714)

6ab9daf

This patch implements the directive #pragma clang section on COFF targets with the exact same features available on ELF and Mach-O.

[mlir][Vector][NFC] Add vector-transform-options flag to ConvertVec…

d25a1f8

…torToLLVMPass (#123491) This flag enables the configuration of some transformation such as the lowering of contractions and transposes. The default configuration preserves the existing behavior.

[flang][cuda] Allocate descriptor in managed memory when memref is a …

c26e1a2

…block argument (#123829)

[flang][cuda] Fix link issue after c26e1a2

4280316

[RISCV] Remove some unnecessary UnsupportedSched* that are implied by…

9883aa9

… others. NFC Tablegen is missing a check that should have caught that these were duplicated. I'm working to restore that check.

[SandboxVec][DAG][NFC] Remove early return in notifyMoveInstr()

4089314

It used to early return when destination is same as origin. But it's redundant because in that case the callback won't get called in the first place.

[AMDGPU] Fix unreachable reg bit width (#122107)

e8811ad

Add register class bit width for SReg_256_XNULL and SReg_128_XNULL

Revert "[CI][Github] Disable premerge workflow on main"

ee5d600

This reverts commit 106f105. Everything should be in working order now that the container job has been updated to work properly. This has been tested on an individual job.

Revert "[LLDB] Add draft docstrings for SBSaveCoreOptions" (#123873)

18ee7e1

Reverts #123132

[LLD] [ELF] Add support for linker script unary plus operator (#121508)

8c2030b

This commit adds support for linker script unary plus ('+') operator. It is helpful for improving compatibility between LLD and GNU LD. Closes #118047
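As an illustration of what accepting a unary plus means for an expression grammar, here is a tiny recursive-descent evaluator in Python. It is unrelated to LLD's actual script parser: the `evaluate` function is hypothetical, it handles only decimal integers, `+`, `-`, `*` and parentheses, and it does no error reporting.

```python
import re


def evaluate(text):
    """Evaluate a small integer expression with unary '+' and '-'."""
    tokens = re.findall(r"\d+|[-+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = take()
        if tok == "(":
            value = expr()
            take()  # consume the closing ")"
            return value
        if tok == "+":  # unary plus: the operand is returned unchanged
            return primary()
        if tok == "-":  # unary minus
            return -primary()
        return int(tok)

    def term():
        value = primary()
        while peek() == "*":
            take()
            value *= primary()
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    return expr()


print(evaluate("2 - +3"))
```

The only change needed for unary plus in this sketch is the extra branch in `primary`, next to the one for unary minus.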

[AMDGPU] Add commute for some VOP3 inst (#121326)

7c58d63

add commute for some VOP3 inst, allow commute for both inline constant operand, adjust tests Fixes #111205

[C++20] [Modules] Make sure vtable are generated for explicit templat…

05861b3

…e instantiation definition (#123871) Close #123719. The reason is that we incorrectly treated the external explicit template instantiation declaration as the external definition.
