Commit
Enable ukernels on remaining aarch64 targets (#19901)
As newer aarch64 targets increasingly support SVE and SME, this clause was preventing ukernels from being used in cases where they do speed things up.

The logic was out of place here because what it controls is the enablement of ukernels, which are a detail of lowering an already tiled workload. If we wanted to use SVE with a variable vector length, or with a fixed vector length different from NEON's 128-bit, that decision needed to be made earlier; conversely, if the workload at this point already has the right shape to be matched to a NEON ukernel, then SVE is not relevant to it anymore.

FYI @ziereis, this results in substantially faster code in your test case from #19873.

Signed-off-by: Benoit Jacob <[email protected]>
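For readers without the diff at hand, the effect of removing the clause described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual compiler code: `TargetInfo`, `hasFeature`, `ukernelsAllowedBefore`, and `ukernelsAllowedAfter` are hypothetical names, and the `+sve` / `+sme` strings are assumed LLVM-style feature spellings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified description of an aarch64 target.
struct TargetInfo {
  std::vector<std::string> features;  // assumed spelling, e.g. "+sve", "+sme"
};

// Returns true if the target lists the given feature string.
static bool hasFeature(const TargetInfo &target, const std::string &feature) {
  assert(!feature.empty() && "feature name must be non-empty");
  return std::find(target.features.begin(), target.features.end(), feature) !=
         target.features.end();
}

// Assumed shape of the old behavior: any SVE- or SME-capable target
// opted out of ukernels entirely.
static bool ukernelsAllowedBefore(const TargetInfo &target) {
  if (hasFeature(target, "+sve") || hasFeature(target, "+sme")) {
    return false;
  }
  return true;
}

// Assumed shape of the new behavior: the workload is already tiled by the
// time this is queried, so SVE/SME support no longer affects the answer.
static bool ukernelsAllowedAfter(const TargetInfo &target) {
  (void)target;  // the feature check is gone
  return true;
}
```

Under these assumptions, a NEON-only target gets the same answer before and after, while a target that also advertises SVE or SME goes from being excluded to being eligible for the NEON ukernels.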