forked from llvm/llvm-project
feature/fused-ops #55
Draft
mgehre-amd wants to merge 10,000 commits into main from feature/fused-ops
…mission of the OpGroupBroadcast instruction (llvm#103050)
This PR addresses a TODO in lib/Target/SPIRV/SPIRVInstructionSelector.cpp by adding an implementation of the non-const G_BUILD_VECTOR, and fixes emission of the OpGroupBroadcast instruction for the case when the `..._group_broadcast` builtin has more than one `local_id` argument. In that case `OpGroupBroadcast` requires a newly constructed vector with 2 or 3 components instead of the originally passed series of `local_id` arguments. This PR may resolve llvm#97310 if the reported failure is caused by an incorrectly generated OpGroupBroadcast instruction, which was definitely occurring. The existing test is hardened and a new test is added to cover this special case of OpGroupBroadcast emission.
llvm#101449) …ImplID
This patch:
1. Removes the vendorId from `__riscv_vendor_feature_bits`.
2. Defines a new structure for vendorID, ArchID and ImplID.
3. Updates the related init code.
This reverts commit 3cab7c5. The modified test fails on ppc64le buildbots.
Split out from llvm#98608.
…lvm#101827) Adds a pass that is expected to run after the deallocation pipeline and moves buffer deallocations right after their last user or dependency, thus optimizing allocation liveness.
This also adds a default constructor and a few uses of it.
Currently the formatter only runs for the main branch, which prevents the formatter from running for stacked PRs, which have to target user branches instead of main.
[AutoBump] Merge with d99bb01 (3)
[AutoBump] Merge with 9997e03 (5)
Bump with conflict resolution (6)
Bump with conflict resolution to 2e271ce (1)
Bump to 2e271ce (needs onnx-mlir update) (2)
Bump to fe2119a with conflict resolution (1)
Bump (needs onnx-mlir update) (2)
Merge with fixes of 647d75d (4)
[AutoBump] Merge with fixes of 9811971 (Aug 14) (3)
[FXML-5083] SCFToEmitC: use already lowered operands
There was code to suppress printing semicolons after emitc.verbatim in the function that emits a function body, but then it would still print `#pragma;` when the `emitc.verbatim "#pragma"` was within a loop body. I moved that code into the general printOperation() function, so it applies to emitc.verbatim independent of what the parent op is.
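The effect of moving the semicolon decision into one general printing function can be illustrated with a toy model. This is a sketch in Python, not the actual C++ emitter: the tuple-based op representation and the `print_operation` helper below are hypothetical and only mirror the idea that verbatim text is never followed by `;`, whatever its parent is.

```python
# Toy model of a C emitter. Each op is a (kind, text, body) tuple.
# Hypothetical representation; this is not the MLIR EmitC translator.

def print_operation(op):
    """Return the emitted lines for one op.

    The trailing-semicolon decision lives here, so it applies to every
    op no matter which parent op contains it.
    """
    kind, text, body = op
    if kind == "verbatim":
        # Verbatim text is emitted as-is and never gets a ';'.
        return [text]
    if kind == "for":
        lines = [f"for ({text}) {{"]
        for child in body:
            lines += ["  " + line for line in print_operation(child)]
        lines.append("}")
        return lines
    # Every other op is treated as a plain statement.
    return [text + ";"]


func_body = [
    ("verbatim", "#pragma top_level", None),
    ("for", "int i = 0; i < 4; ++i", [
        ("verbatim", "#pragma in_loop", None),
        ("stmt", "acc += i", None),
    ]),
]

emitted = [line for op in func_body for line in print_operation(op)]
```

Because the loop body is printed through the same function as the function body, the nested verbatim line comes out without a semicolon while the ordinary statement next to it keeps one.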
emitc: Add fmtArgs to verbatim
Don't print semicolon after emitc.verbatim within emitc.for
With the previous parsing, it would interpret
```
emitc.verbatim "#endif // PL_USE_XRT"
%4 = "emitc.constant"() <{value = 1 : i32}> : () -> i32
```
as if
```
emitc.verbatim "#endif // PL_USE_XRT" %4
= "emitc.constant"() <{value = 1 : i32}> : () -> i32
```
and then complain that it expected a `:` after the `%4`. Fix this by introducing an `args` keyword to distinguish the case where the verbatim has args from the case where the next operation starts.
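A simplified sketch of why a keyword removes the ambiguity, written in Python over a pre-tokenized input. The token layout and the `parse_verbatim` helper are assumptions made for illustration; the real parser is part of the EmitC dialect's C++ code and its exact syntax may differ.

```python
# Toy parser for: emitc.verbatim <string> [args <operand>, ... : <type>, ...]
# Hypothetical helper operating on a flat token list; illustration only.

def parse_verbatim(tokens, pos=0):
    """Return (text, operands, next_pos) for a verbatim op at tokens[pos]."""
    if tokens[pos] != "emitc.verbatim":
        raise ValueError("expected emitc.verbatim")
    text = tokens[pos + 1]
    pos += 2
    operands = []
    # Operands are only parsed when the `args` keyword is present, so a
    # following `%4 = ...` is never mistaken for an operand of this op.
    if pos < len(tokens) and tokens[pos] == "args":
        pos += 1
        operands.append(tokens[pos])
        pos += 1
        while tokens[pos] == ",":
            operands.append(tokens[pos + 1])
            pos += 2
        if tokens[pos] != ":":
            raise ValueError("expected ':' after operands")
        pos += 1
        # Skip one type per operand, separated by commas.
        pos += 2 * len(operands) - 1
    return text, operands, pos


# Case 1: no args; the next op's result `%4` must be left untouched.
no_args = ["emitc.verbatim", '"#endif // PL_USE_XRT"', "%4", "=", '"emitc.constant"']
text1, ops1, next1 = parse_verbatim(no_args)

# Case 2: explicit args, followed by the start of the next op.
with_args = ["emitc.verbatim", '"{}"', "args", "%0", ",", "%1", ":",
             "i32", ",", "i32", "%4", "="]
text2, ops2, next2 = parse_verbatim(with_args)
```

In both cases the parser stops exactly at `%4`, so the following operation is parsed from its own result name.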
Fix verbatim parsing to be unambiguous
feat: implement constant folding for tosa.slice
When deciding whether to emit a map like `#map = affine_map<(d0, d1, d2, d3) -> (0, d1, d2, d3)>` or `#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>` for an operand of a linalg.generic when lowering element-wise TOSA ops, prefer the latter unless broadcasting of the operand is really needed. This helps later transformations, which often require the affine map to be a projected permutation, which only the latter is.
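The distinction can be sketched with a small Python model. It assumes static shapes and represents an affine map's results as a list in which `"d<i>"` is a dimension expression and the integer `0` is a constant; both helpers are hypothetical stand-ins for illustration, not the MLIR API.

```python
# Toy model of affine map results: "d<i>" for a dim, int 0 for a constant.
# Hypothetical helpers; static shapes only.

def is_projected_permutation(num_dims, results):
    """True if every result is a distinct dim expression (no constants)."""
    if len(results) > num_dims:
        return False
    seen = set()
    for expr in results:
        if not (isinstance(expr, str) and expr.startswith("d")):
            return False  # a constant such as 0 disqualifies the map
        if expr in seen:
            return False  # each dim may appear at most once
        seen.add(expr)
    return True


def operand_map(operand_shape, result_shape):
    """Use constant 0 only where the operand really has to be broadcast."""
    return [
        0 if (op_dim == 1 and res_dim != 1) else f"d{i}"
        for i, (op_dim, res_dim) in enumerate(zip(operand_shape, result_shape))
    ]


# Size-1 dim that matches the result: no broadcast needed, keep d0.
no_broadcast = operand_map([1, 8, 8, 3], [1, 8, 8, 3])
# Size-1 dim against a larger result dim: broadcast is really needed.
broadcast = operand_map([1, 8, 8, 3], [4, 8, 8, 3])
```

Under this model, the map for the non-broadcast operand stays a projected permutation, while the map with the constant `0` does not.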
* Fix for aliasing the region args
* Add test case
* Add empty line
…ered version (Emitc::ForOp) (#390)
OpaqueType: Use format string
Refactored @Max191's PR llvm#94637 to move it to `Tensor`.

From the original PR:
> This PR adds fusion by expansion patterns to push a tensor.expand_shape up through a tensor.collapse_shape with non-intersecting reassociations. Sometimes parallel collapse_shape ops like this can block propagation of expand_shape ops, so this allows them to pass through each other. I'm not sure if I put the code/tests in the right places, so let me know where those go if they aren't.

cc @MaheshRavishankar @hanhanW

---------

Co-authored-by: Max Dawkins <[email protected]>
Add missing `getIterationDomainTileFromOperandTile` and `getTiledImplementationFromOperandTile` to `tensor.pack` and enable fusing it as a consumer. Note that it currently only supports the perfect-tiling scenario, without padding semantics.
…#96184) In order to support arbitrary size input data of conv2d, implement TilingInterface for winograd operations. Before converting winograd operations into nested loops with matrix multiply, tile the input of conv2d into the supported size first. Add a transform operation structured.decompose_winograd_op to decompose winograd operations. Before applying the transform op, use tile_using_for to tile the input data into supported size. The test case shows how to tile and decompose winograd operations.
…to continue tile + fuse. (llvm#107882)
The current implementation of `scf::tileConsumerAndFuseProducerUsingSCF` looks at operands of tiled/tiled+fused operations to see if they are produced by `extract_slice` operations to populate the worklist used to continue fusion. This implicit assumption does not always hold. Instead, make the implementations of `getTiledImplementation` return the slices to use to continue fusion.

This is a breaking change:
- To continue to get the same behavior of `scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree implementations of `TilingInterface::getTiledImplementation` to return the slices to continue fusion on. All in-tree implementations have been adapted to this.
- This change touches parts that required a simplification to the `ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a `std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that should be `std::nullopt` if fusion is not to be performed.

Signed-off-by: MaheshRavishankar <[email protected]>
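The shape of the new control callback can be modeled in Python, with `None` standing in for `std::nullopt`. Everything here is a hypothetical sketch: the `ControlFnResult` field, the `fuse_producers` driver, and the string-named candidates are illustrative assumptions and do not reproduce the C++ API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ControlFnResult:
    # Hypothetical field: whether the fused producer's value should also
    # be yielded from the loop as a replacement.
    yield_producer_replacement: bool = False


def fuse_producers(
    candidates: List[str],
    control_fn: Callable[[str], Optional[ControlFnResult]],
) -> List[Tuple[str, bool]]:
    """Fuse each candidate for which control_fn returns a result."""
    fused = []
    for candidate in candidates:
        decision = control_fn(candidate)
        if decision is None:
            # None plays the role of std::nullopt: do not fuse this one.
            continue
        fused.append((candidate, decision.yield_producer_replacement))
    return fused


def only_elementwise(candidate: str) -> Optional[ControlFnResult]:
    """Example policy: fuse only producers whose name marks them elementwise."""
    return ControlFnResult() if candidate.startswith("elementwise") else None


result = fuse_producers(
    ["elementwise_add", "matmul", "elementwise_mul"], only_elementwise
)
```

Returning an optional result folds the "fuse or not" decision and the per-fusion options into a single return value.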
…m#109554) The SCF helper for tiling an operation implementing the TilingInterface and greedily fusing consumers requires an uninterrupted chain of operations implementing the tiling interface to succeed. There can be cases with intermediate ops that don't implement the interface but have producers that could be fused if various canonicalization/simplification patterns could run in between fusion steps. This adds an option to SCFTileAndFuseOptions for a pattern set to run between fusion steps to the ops that result from fusion/tiling. Removed and newly inserted slices are tracked for continued fusion applications. See this RFC for more discussion: https://discourse.llvm.org/t/rfc-split-fusion-portions-of-the-tilinginterface-into-a-new-interface/81155
Add emitc.tu
The auto-generated builder created an emitc.tu that had an empty region. This is a bit cumbersome to work with, as you would always need to manually create a block in it. Do what ModuleOp::build does and always create that block. Also accept StringRef as argument for id instead of requiring a StringAttr.
`#include` makes sense everywhere, and in particular we need to allow it inside an `emitc.tu`. But sometimes we might even want to have an `#include` in a function body.
emitc.include: don't require the parent to be a ModuleOp
emitc.tu: Automatically create block for body
…ape_fold fix: fuse locations of double reshapes when folding.
Backport various improvements to fusion from upstream
Not for merging; just convenience to look at our changes.