[pull] main from llvm:main #5556
Open
pull wants to merge 7,090 commits into Ericsson:main from llvm:main
Conversation
Same purpose as #141407, committing this directly to get the bot green sooner. Co-authored-by: Ely Ronnen <[email protected]>
## Why
In https://github.com/llvm/llvm-project/pull/113612/files#diff-ada12e18f3e902b41b6989b46455c4e32656276e59907026e2464cf57d10d583, the parameter `qual_name` was introduced. However, the tests were not adapted accordingly and hence cannot be executed.
## What
Fix the execution of the tests by providing the missing argument.
This patch is part of a stack that teaches Clang to generate Key Instructions metadata for C and C++. RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668 The feature is only functional in LLVM if LLVM is built with the CMake flag LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
It's already called in llvm_add_library.
This patch updates `CombineContractBroadcastMask` to inherit from `MaskableOpRewritePattern`, enabling it to handle masked `vector.contract` operations. The pattern rewrites:
```mlir
%a_bc = vector.broadcast %a
%res = vector.contract %a_bc, %b, ...
```
into:
```mlir
// Move the broadcast into vector.contract (by updating the indexing maps).
%res = vector.contract %a, %b, ...
```
The main challenge is supporting cases where the pattern drops a leading unit dimension. For example:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
    %arg0 : vector<8x4xi32>,
    %arg1 : vector<8x4xi32>,
    %arg2 : vector<8x8xi32>,
    %mask : vector<1x8x8x4xi1>) -> vector<8x8xi32> {
  %0 = vector.broadcast %arg0 : vector<8x4xi32> to vector<1x8x4xi32>
  %1 = vector.broadcast %arg1 : vector<8x4xi32> to vector<1x8x4xi32>
  %result = vector.mask %mask {
    vector.contract {indexing_maps = [#map0, #map1, #map2],
                     iterator_types = ["reduction", "parallel", "parallel", "reduction"],
                     kind = #vector.kind<add>}
      %0, %1, %arg2 : vector<1x8x4xi32>, vector<1x8x4xi32> into vector<8x8xi32>
  } : vector<1x8x8x4xi1> -> vector<8x8xi32>
  return %result : vector<8x8xi32>
}
```
Here, the leading unit dimension is dropped. To handle this, the mask is cast to the correct shape using a `vector.shape_cast`:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
    %arg0: vector<8x4xi32>,
    %arg1: vector<8x4xi32>,
    %arg2: vector<8x8xi32>,
    %arg3: vector<1x8x8x4xi1>) -> vector<8x8xi32> {
  %mask_sc = vector.shape_cast %arg3 : vector<1x8x8x4xi1> to vector<8x8x4xi1>
  %res = vector.mask %mask_sc {
    vector.contract {indexing_maps = [#map, #map1, #map2],
                     iterator_types = ["parallel", "parallel", "reduction"],
                     kind = #vector.kind<add>}
      %arg0, %arg1, %arg2 : vector<8x4xi32>, vector<8x4xi32> into vector<8x8xi32>
  } : vector<8x8x4xi1> -> vector<8x8xi32>
  return %res : vector<8x8xi32>
}
```
While this isn't ideal - since it introduces a `vector.shape_cast` that must be cleaned up later - it reflects the best we can do once the input reaches `CombineContractBroadcastMask`. A more robust solution may involve simplifying the input earlier; I am leaving that as a TODO for myself to explore further. Posting this now to unblock downstream work.
LIMITATIONS
Currently, this pattern assumes:
* Only leading dimensions are dropped in the mask.
* All dropped dimensions must be unit-sized.
This patch is part of a stack that teaches Clang to generate Key Instructions metadata for C and C++. RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668 The feature is only functional in LLVM if LLVM is built with the CMake flag LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
This patch is part of a stack that teaches Clang to generate Key Instructions metadata for C and C++. RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668 The feature is only functional in LLVM if LLVM is built with the CMake flag LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
…138540) When determining whether an escape source may alias with a noalias argument, only take provenance captures into account. If only the address of the argument was captured, an access through the escape source is not legal.
Fixes errors about duplicate PHI edges when the input had duplicates with constexprs in them. The constexpr translation makes new basic blocks, causing the verifier to complain about duplicate entries in PHI nodes.
…test (#141503) This is an NFC change. Added "-mattr=-real-true16" to a few gfx12 tests, in preparation for the upcoming GFX12 true16 code change. Set these tests to use the fake16 flow, since true16 mode is not fully functional for GlobalISel.
- No need to prefix `PointerType` with `llvm::`. - Avoid namespace block to define `PrintPipelinePasses`.
#140132) Update initial VPlan construction to connect the Plan's entry to the scalar preheader. This moves a small part of the skeleton creation out of ILV and will also enable replacing VPInstruction::ResumePhi with regular VPPhi recipes. Resume phis need two incoming values to start with, the second being the bypass value from the scalar preheader (also used to replicate the incoming value for other bypass blocks). Adding the extra edge ensures that the incoming values for resume phis match the incoming blocks. PR: #140132
asin, acos, atan, and atan2 were being lowered to libm calls instead of LLVM intrinsics. Add the conversion patterns to handle these intrinsics and update the tests to expect this.
They're just aliases.
… (NFC) (#141595) The `GCNScheduleDAGMILive`'s `RescheduleRegions` bitvector is only used by the rematerialization stage (`PreRARematStage`). Its presence in the scheduler's state forces us to maintain its value throughout scheduling even though it is of no use to the iterative scheduling process itself, which instead relies on each stage's `initGCNRegion` hook to determine whether the current region should be rescheduled. This moves the bitvector to the `PreRARematStage`, which uses it to store the set of regions that must be rescheduled between stage initialization and region initialization. This NFC also swaps a call to `GCNRegPressure::getArchVGPRNum(false)` for a call to `GCNRegPressure::getArchVGPRNum()` (equivalent, but simpler in this context), and makes `GCNSchedStage::finalizeGCNRegion` use its own API to advance to the next region.
In `applyAdrpAddLdr()` we make a transformation that is identical to the one in `applyAdrpAdd()`, so let's reuse that code. Also refactor `forEachHint()` to use more `ArrayRef` and move around some lines for consistency.
This addresses a TODO where previously scalarizeBinopOrCmp conservatively bailed if one of the operands was a load. getVectorInstrCost was updated to take in values in https://reviews.llvm.org/D140498 so we can pass in the scalar value to be inserted, which should return an accurate cost for a gather. To prevent regressions on x86 this tries to constant fold NewVecC up front so we can pass it into TTI and get a more accurate cost. We want to remove this restriction on RISC-V since this is always profitable whether or not the scalar is a load.
We were emitting a non-error diagnostic but claiming we emitted an error, which caused some obvious follow-on problems. Fixes #141549
) Not explicitly defining the default case for ShallowCopy* functions does not meet the requirements for gcc to actually instantiate the templates, leading to build errors that show up with gcc but not with clang. Signed-off-by: Kajetan Puchalski <[email protected]>
This patch moves scalable vectorization tests into an existing generic vectorization test file:
* vectorization-scalable.mlir --> merged into vectorization.mlir

Rationale:
* Most tests in vectorization-scalable.mlir are variants of existing tests in vectorization.mlir. Keeping them together improves maintainability.
* Consolidating tests makes it easier to spot gaps in coverage for regular vectorization.
* In the Vector dialect, we don't separate tests for scalable vectors; this change aligns Linalg with that convention.

Notable changes beyond moving tests:
* Updated one of the two matrix-vector multiplication tests to use `linalg.matvec` instead of `linalg.generic`. CHECK lines remain unchanged.
* Simplified the lone `linalg.index` test by removing an unnecessary `tensor.extract`. Also removed canonicalization patterns from the TD sequence for consistency with other tests.

This patch contributes to the implementation of #141025; please refer to that ticket for full context.
This implements the design proposed by [Representing SpirvType in Clang's Type System](llvm/wg-hlsl#181). It creates `HLSLInlineSpirvType` as a new `Type` subclass, and `__hlsl_spirv_type` as a new builtin type template to create such a type. This new type is lowered to the `spirv.Type` target extension type, as described in [Target Extension Types for Inline SPIR-V and Decorated Types](https://github.com/llvm/wg-hlsl/blob/main/proposals/0017-inline-spirv-and-decorated-types.md).
#139858) New contributors can just indicate that they are working on the issue without requesting assignment. That should reduce the burden of assigned issues that are not actually being worked on, and of new contributors waiting for a maintainer to assign them the issue. --------- Co-authored-by: Danny Mösch <[email protected]>
This patch fixes: clang/utils/TableGen/ClangBuiltinTemplatesEmitter.cpp:25:8: error: 'llvm::StringSet' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
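For context, a minimal sketch of the pattern this warning fires on (assumed shape of the fix, not the patch itself):
```cpp
#include "llvm/ADT/StringSet.h"

// -Wctad-maybe-unsupported fires when class template argument deduction is
// used on a template that doesn't opt in to CTAD:
// llvm::StringSet Fns;  // deduced as llvm::StringSet<> -- warns under -Werror
llvm::StringSet<> Fns;   // explicit (defaulted) argument list -- no warning
```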
A dummy argument with an explicit INTEGER type of non-default kind can be forward-referenced from a specification expression in many Fortran compilers. Handle this by adding type declaration statements to the initial pass over a specification part's declaration constructs. Emit an optional warning under -pedantic. Fixes #140941.
) When processing free form source line continuation, the prescanner treats empty keyword macros as if they were spaces or tabs. After skipping over them, however, there's code that only works if the skipped characters ended with an actual space or tab. If the last skipped item was an empty keyword macro's name, the last character of that name would end up being the first character of the continuation line. Fix.
This change uses resource name during DXIL resource binding analysis to detect when two (or more) resources have identical overlapping binding. The DXIL resource analysis just detects that there is a problem with the binding and sets the `hasOverlappingBinding` flag. Full error reporting will happen later in DXILPostOptimizationValidation pass (#110723).
Implemented wcschr and its tests. --------- Co-authored-by: Sriya Pratipati <[email protected]>
Ensure everything is defined inside the namespace, reduce number of ifdefs.
…41719) Currently, to avoid generating too many BTF types, for a struct type like
```
struct foo {
  int val;
  struct bar *ptr;
};
```
if BTF generation reaches 'struct foo', it will not generate the actual type for 'struct bar'; instead, a forward decl is generated. The actual type for 'struct bar' is not generated in BTF unless it is reached through a non-struct-pointer member. Such a limitation forces BPF developers to hack and work around this problem. See [1] and [2]. For example, in [1] we have
```
struct map_value {
  struct prog_test_ref_kfunc *not_kptr;
  struct prog_test_ref_kfunc __kptr *val;
  struct node_data __kptr *node;
};
```
The BTF type for 'struct node_data' is not generated. Note that we have a '__kptr' annotation. Similar problem for [2] with a '__uptr' annotation. Note that the issue in [1] has been resolved later, but the hack in [2] is still needed. This patch relaxes BTF generation for a struct type with struct pointer members when the struct pointer has a btf_type_tag annotation.

[1] https://lore.kernel.org/r/[email protected]
[2] https://lore.kernel.org/r/[email protected]
Tests using ObjC do not readily run on Linux.
Use the if-statement-with-initializer pattern, which is very common in LLVM, throughout SBTarget. Every time someone adds a new method to SBTarget, I want to encourage using this pattern, but I don't because it would be inconsistent with the rest of the file. This solves that problem by switching over the whole file.
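A minimal sketch of the pattern in question (the names here are hypothetical stand-ins, not taken from the patch):
```cpp
#include <memory>

struct Target {
  void DoWork() {}
};
using TargetSP = std::shared_ptr<Target>;

// Stand-in for SBTarget::GetSP(), which may return an empty pointer.
TargetSP GetSP() { return std::make_shared<Target>(); }

void Example() {
  // Declare and test in one condition: target_sp is scoped to the branch
  // that can actually use it, so no valid-but-unused variable leaks out.
  if (TargetSP target_sp = GetSP())
    target_sp->DoWork();
}
```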
…perands (#141845) As noted in #141821 (comment), while we currently constant fold intrinsics of fixed-length vectors via their scalar counterpart, we don't do the same for scalable vectors. This handles the scalable vector case when the operands are splats. One weird snag in ConstantVector::getSplat was that it produced undef if passed poison, so this also contains a fix: check for PoisonValue before UndefValue.
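A minimal sketch of why the check order matters (assumed shape; the actual fix lives inside `ConstantVector::getSplat`): `PoisonValue` is a subclass of `UndefValue` in LLVM, so an `isa<UndefValue>` test also matches poison and must come second.
```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

static Constant *splatScalar(ElementCount EC, Constant *Scalar) {
  Type *VecTy = VectorType::get(Scalar->getType(), EC);
  // Check poison first: isa<UndefValue> would also match a PoisonValue,
  // silently widening poison into the weaker undef.
  if (isa<PoisonValue>(Scalar))
    return PoisonValue::get(VecTy);
  if (isa<UndefValue>(Scalar))
    return UndefValue::get(VecTy);
  return ConstantVector::getSplat(EC, Scalar);
}
```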
Fixes the final reduction steps, which were taken from an implementation of scan, not reduction, causing lanes earlier in the wave to have incorrect results due to masking. Now aligns more closely with the triton implementation: triton-lang/triton#5019

# Hypothetical example

To explain the issue with the current implementation, take the simple example of performing a sum over 64 lanes, where the first lane has value 1 and all other lanes have value 0:
```
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
When performing a sum reduction over these 64 lanes, the current implementation performs 6 dpp instructions which, in sequential order, do the following:
1) sum over clusters of 2 contiguous lanes
2) sum over clusters of 4 contiguous lanes
3) sum over clusters of 8 contiguous lanes
4) sum over an entire row
5) broadcast the result of the last lane in each row to the next row, and each lane sums its current value with the incoming value
6) broadcast the result of the 32nd lane to the last two rows, and each lane sums its current value with the incoming value

After step 4) the result for the example above looks like this:
```
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
After step 5) the result looks like this:
```
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
After step 6) the result looks like this:
```
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
Note that the correct value here is always 1, yet after the `dpp.broadcast` ops some lanes have incorrect values. The reason is that for these incorrect lanes, like lanes 0-15 in step 5, the `dpp.broadcast` op doesn't provide them incoming values from other lanes. Instead these lanes are provided either their own values, or 0 (depending on whether `bound_ctrl` is true or false), as values to sum over; either way these values are stale, and these lanes shouldn't be used in general.

So what this means:
- For a subgroup reduce over 32 lanes (as in step 5), the correct result is stored in lanes 16 to 31.
- For a subgroup reduce over 64 lanes (as in step 6), the correct result is stored in lanes 32 to 63.

However, the current implementation does not specifically read the value from one of the correct lanes when returning a final value. In some workloads it seems that, without this, the stale value from the first lane is returned instead.

# Actual failing test

For a specific example of how the current implementation causes issues, take a look at the IR below, which represents an additive reduction over a dynamic dimension.
```
!matA = tensor<1x?xf16>
!matB = tensor<1xf16>
#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1) -> (d0)>
func.func @only_producer_fusion_multiple_result(%arg0: !matA) -> !matB {
  %cst_1 = arith.constant 0.000000e+00 : f16
  %c2_i64 = arith.constant 2 : i64
  %0 = tensor.empty() : !matB
  %2 = linalg.fill ins(%cst_1 : f16) outs(%0 : !matB) -> !matB
  %4 = linalg.generic {indexing_maps = [#map, #map1],
                       iterator_types = ["parallel", "reduction"]}
      ins(%arg0 : !matA) outs(%2 : !matB) {
  ^bb0(%in: f16, %out: f16):
    %7 = arith.addf %in, %out : f16
    linalg.yield %7 : f16
  } -> !matB
  return %4 : !matB
}
```
When provided an input of type `tensor<1x2xf16>` and values `{0, 1}` to perform the reduction over, the value returned is consistently 4. By the same analysis done above, this shows that the returned value is coming from one of these stale lanes and needs to be read instead from one of the lanes storing the correct result.

Signed-off-by: Muzammiluddin Syed <[email protected]>
…ool (#140829) This consolidates some of the error handling around regex arguments to the tool, and sets up the APIs such that errors must be handled before their usage.
…141859) The context disambiguation code already emits remarks when hinting allocations (by adding hotness attributes) during cloning. However, we did not yet emit hints when applying the hotness attributes during building of the metadata (during matching and again after inlining). Add remarks when we apply the hint attributes for these non-context-sensitive allocations.
This change updates a few tests for global variable handling to also check classic codegen output so we can easily verify consistency between the two and will be alerted if the classic codegen changes. This was useful in developing forthcoming changes to global linkage handling.
While, in general, combining and->cmp->sel into and->mul may result in worse code on some targets, this particular combine should be uniformly beneficial. Proof: https://alive2.llvm.org/ce/z/MibAcN --------- Co-authored-by: Matt Arsenault <[email protected]> Co-authored-by: Yingwei Zheng <[email protected]>
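A scalar C++ analogue of the transformation (illustrative only, not the patch's code): because the masked value is always 0 or 1, the select collapses to a multiply.
```cpp
#include <cassert>
#include <cstdint>

// Before: and -> cmp -> sel.
uint32_t before(uint32_t x, uint32_t y) { return (x & 1) != 0 ? y : 0; }

// After: and -> mul. (x & 1) is 0 or 1, so the product equals the select.
uint32_t after(uint32_t x, uint32_t y) { return (x & 1) * y; }

int main() {
  for (uint32_t x = 0; x < 8; ++x)
    for (uint32_t y = 0; y < 8; ++y)
      assert(before(x, y) == after(x, y));
}
```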
This helps to disambiguate accesses in the caller and the callee after LLVM inlining in some apps. I did not see any performance changes, but this is one step towards enabling other optimizations in the apps that I am looking at.

The definition of llvm.noalias says:
```
... indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function.
```
I believe this exactly matches Fortran rules for the dummy arguments that are modified during their subprogram execution. I also set llvm.noalias and llvm.nocapture on the !fir.box<> arguments, because the corresponding descriptors cannot be captured and cannot alias anything (not based on them) during the execution of the subprogram.
llvm-diff shows no change to amdgcn--amdhsa.bc
When we're creating a Zilsd store from a split i64 value, check if both inputs are 0 and use X0_Pair instead of a REG_SEQUENCE.
- Attach the mappable interface to ClassType
- Remove the TODO since derived type in descriptor can be handled as other descriptors
The current unwinding implementation on Haiku is messy and broken.
1. It searches weird paths for private headers, which breaks builds in consuming projects such as dotnet/runtime.
2. It does not even work, due to relying on incorrect private offsets.

This commit strips all references to private headers and ports a working signal frame implementation. It has been tested against `tests/signal_unwind.pass.cpp` and can unwind past the signal frame.
Summary: When there are overloaded C++ operators in the global namespace, the AST node for these is not a `BinaryExpr` but a `CXXOperatorCallExpr`. Modify the uses to handle this case, basically just treating it as a binary expression with two arguments. Fixes #141085
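An illustrative input (not taken from the patch) showing the AST difference:
```cpp
struct S {};
bool operator==(S, S) { return true; } // overload in the global namespace

// AST: a plain binary expression (BinaryOperator in Clang's AST).
bool builtin(int a, int b) { return a == b; }

// AST: a CXXOperatorCallExpr with two arguments, not a binary expression.
bool overloaded(S a, S b) { return a == b; }
```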
The XAndesPerf extension includes the unsigned bitfield extraction instruction `NDS.BFOZ`, which can extract the bits from LSB to MSB, place them starting at bit 0, and zero-extend the result. The testcase includes the three patterns that can be selected as unsigned bitfield extracts: `and`, `and+lshr`, and `lshr+and`.
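For illustration, scalar C++ forms of those three patterns (field positions and widths are made up, not taken from the testcase); all three zero-extend the extracted field, matching the instruction's semantics:
```cpp
#include <cstdint>

// and: field already at bit 0, just mask it.
uint32_t bfo_and(uint32_t x) { return x & 0xFFu; }

// and + lshr: mask the field in place, then shift it down to bit 0.
uint32_t bfo_and_lshr(uint32_t x) { return (x & 0xFF00u) >> 8; }

// lshr + and: shift the field down to bit 0, then mask it.
uint32_t bfo_lshr_and(uint32_t x) { return (x >> 8) & 0xFFu; }
```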
…eduction and VPMulAccumulateReduction. (#113903) This patch implements the VPlan-based cost model for VPReduction, VPExtendedReduction and VPMulAccumulateReduction. With this patch, we can calculate the reduction cost with the VPlan-based cost model, so the reduction costs in `precomputeCost()` are removed. Ref: original instruction-based implementation: https://reviews.llvm.org/D93476
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )