Do not approximate erf on rocm. #19969
Stamping, but please address comments
@MaheshRavishankar, the PkgCI failure on SDXL is minimized to the following testcase:

```mlir
func.func @testcase(
    %10 : tensor<64x64xf32>, %11 : tensor<64x64xf32>
) -> tensor<64x64xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %12 = tensor.empty() : tensor<64x64xf32>
  %16 = linalg.fill ins(%cst : f32) outs(%12 : tensor<64x64xf32>) -> tensor<64x64xf32>
  %17 = linalg.matmul ins(%10, %11 : tensor<64x64xf32>, tensor<64x64xf32>) outs(%16 : tensor<64x64xf32>) -> tensor<64x64xf32>
  %18 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%17 : tensor<64x64xf32>) outs(%12 : tensor<64x64xf32>) {
  ^bb0(%in: f32, %out: f32):
    %23 = math.erf %in : f32
    linalg.yield %23 : f32
  } -> tensor<64x64xf32>
  return %18 : tensor<64x64xf32>
}
```

Compiling this (with this PR applied, so that ROCm implies preserving `math.erf` rather than approximating it) fails after ConvertToLLVM, with IR like:

```mlir
%436 = "math.erf"(%83) <{fastmath = #arith.fastmath<none>}> : (vector<4x1x1x1x4x1xf32>) -> vector<4x1x1x1x4x1xf32>
%437 = "builtin.unrealized_conversion_cast"(%436) : (vector<4x1x1x1x4x1xf32>) -> !llvm.array<4 x array<1 x array<1 x array<1 x array<4 x vector<1xf32>>>>>>
```

If I drop the `linalg.matmul`, it compiles fine. This sounds like it is complaining that `math.erf` on this multi-dimensional vector type cannot be converted to LLVM.
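For context on this failure mode: LLVM IR only has 1-D vectors, so MLIR's LLVM type converter maps an n-D vector type to nested `!llvm.array` levels around a 1-D vector. When no pattern converts the multi-dimensional `math.erf` itself, the materialized casts above are left behind. A minimal sketch (assuming upstream MLIR APIs; not code from this PR) showing that type mapping:

```cpp
// Sketch using upstream MLIR APIs (not code from this PR): the LLVM type
// converter turns an n-D vector into nested !llvm.array levels around a
// 1-D vector, because LLVM IR has no multi-dimensional vectors.
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/MLIRContext.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  mlir::MLIRContext ctx;
  ctx.loadDialect<mlir::LLVM::LLVMDialect>();
  mlir::LLVMTypeConverter converter(&ctx);
  mlir::Type f32 = mlir::Float32Type::get(&ctx);
  // A rank-2 example; the rank-6 type above just nests deeper.
  mlir::Type vec2d = mlir::VectorType::get({4, 4}, f32);
  // Prints: !llvm.array<4 x vector<4xf32>>
  converter.convertType(vec2d).print(llvm::outs());
  llvm::outs() << "\n";
  return 0;
}
```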
Interesting. I think it just worked by chance earlier. Haven't looked deeper, but it might be that the lowering to LLVM for `math.erf` supports vector types. Earlier, the `math.erf` operations were approximated into `arith` ops before conversion, so they never reached this lowering.
(2) is being fixed by #19987, so chances are you would have hit this issue after that PR anyway.
@MaheshRavishankar, unit dims strike again. Without the `linalg.matmul`, the `math.erf` is applied to a plain 1-D vector, and ConvertToROCDL (which includes ConvertToLLVM) handles it. With the `linalg.matmul`, the vector type picks up unit dims (`vector<4x1x1x1x4x1xf32>`), and ConvertToROCDL says "unit dims!!! run for your lives!!"
The problem is in the conversion: here it sees `math.erf` on the multi-dimensional vector type `vector<4x1x1x1x4x1xf32>`. By contrast, when it works (without the `linalg.matmul`), it sees a plain 1-D vector type. So it seems that the unit dims are not the immediate problem here (or they might be, indirectly); rather, the problem is that the vector type is multi-dimensional. Might this be some misuse of TypeConverter?
@bjacob I don't think that's a misuse of the type converter; it's just code that assumes you'll be using 1-D vectors specifically and wasn't written to support multi-dimensional vectors, probably because no one has complained about that support being missing.
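To make that concrete, this is the kind of guard such lowering code tends to contain (hypothetical, not IREE's actual code); anything failing the check is simply left unconverted:

```cpp
// Hypothetical guard (not IREE's actual code): a lowering written with only
// scalars and 1-D fixed-size vectors in mind. A type like
// vector<4x1x1x1x4x1xf32> fails the check, so the op stays unconverted and
// unrealized_conversion_casts remain in the IR.
#include "mlir/IR/BuiltinTypes.h"

static bool isSupportedType(mlir::Type type) {
  if (auto vecType = mlir::dyn_cast<mlir::VectorType>(type))
    return vecType.getRank() == 1 && !vecType.isScalable();
  return mlir::isa<mlir::FloatType>(type);
}
```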
@qedawkins @MaheshRavishankar, the new commit 56aa457 fixes it in ConvertToROCDL, like we had discussed: minimally, by just calling the unit-dim-dropping vector patterns there.
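The exact call in that commit is not reproduced here; as a sketch of the approach, assuming upstream MLIR's vector rewrite patterns, folding away unit dims before conversion looks like this:

```cpp
// Sketch, assuming upstream MLIR's vector rewrite patterns (the pattern set
// used by commit 56aa457 may differ): fold away leading unit vector dims
// before the LLVM conversion runs, so fewer multi-D vector ops reach it.
#include "mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

static mlir::LogicalResult dropLeadingUnitDims(mlir::Operation *root) {
  mlir::RewritePatternSet patterns(root->getContext());
  mlir::vector::populateCastAwayVectorLeadingOneDimPatterns(patterns);
  return mlir::applyPatternsAndFoldGreedily(root, std::move(patterns));
}
```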
Looking at the regression tests, you're getting a lot of failures, which doesn't really surprise me. We probably need a different solution.
I would lean towards keeping ConvertToROCDL self-contained and running it as a separate pass, but given the current state of ConvertToROCDL I wouldn't block on that.
If this works better (e.g. in benchmarks/tests) then yes, I'd run the whole thing.
No need to worry about NVVM.
I had a branch that tried to add the unit-dim-dropping patterns to TileAndFuse, but I couldn't get it to work well and dropped it: https://github.com/qedawkins/iree/tree/tile_and_fuse_improvements
Some of the CI failures were an outage; retriggered after the CI hosts were rebooted and narrowed it down to real failures on SDXL. Reproduced locally and got this IR:

```mlir
%11630 = "builtin.unrealized_conversion_cast"(%11629) : (!llvm.array<4 x vector<4xf32>>) -> vector<4x4xf32>
%11631 = "math.erf"(%11630) <{fastmath = #arith.fastmath<none>}> : (vector<4x4xf32>) -> vector<4x4xf32>
%11632 = "builtin.unrealized_conversion_cast"(%11631) : (vector<4x4xf32>) -> !llvm.array<4 x vector<4xf32>>
```

So now this isn't unit dims anymore. We need a flattening pattern for element-wise ops that flattens all dims, not just unit dims. @qedawkins @MaheshRavishankar
That is a significantly broader change than initially thought, but it's something we've known about for a long time (handing LLVM big arrays is probably not ideal) and have been kicking down the road. This may be the tipping point, but I'd advocate finding a temporary way to handle the multi-dimensional cases if we want to land this soon.
How about adding a `vector.shape_cast` rewrite in MathTransforms, where we already do math-function-specific rewrites? It won't be as general as it could be (in principle it should apply to other elementwise ops), but by construction it will be exactly as specific as we need in order to land this in the near term. For example, we will be able to default to doing this only for `math.erf`; see the sketch below.
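A minimal sketch of what that rewrite could look like (names are illustrative, not the PR's actual code): flatten the n-D vector with `vector.shape_cast`, apply `math.erf` in 1-D, and cast back.

```cpp
// Illustrative sketch of the proposed rewrite (not the PR's actual code):
// rewrite math.erf on an n-D vector into shape_cast -> 1-D erf -> shape_cast,
// so that a 1-D-only LLVM lowering can handle it.
#include "mlir/Dialect/Math/IR/Math.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/PatternMatch.h"

namespace {
struct FlattenErfVectorType : mlir::OpRewritePattern<mlir::math::ErfOp> {
  using OpRewritePattern::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(mlir::math::ErfOp op,
                  mlir::PatternRewriter &rewriter) const override {
    auto vecType = mlir::dyn_cast<mlir::VectorType>(op.getType());
    // Only rewrite static multi-dimensional vectors.
    if (!vecType || vecType.getRank() <= 1 || vecType.isScalable())
      return mlir::failure();
    auto flatType = mlir::VectorType::get({vecType.getNumElements()},
                                          vecType.getElementType());
    mlir::Location loc = op.getLoc();
    mlir::Value flat = rewriter.create<mlir::vector::ShapeCastOp>(
        loc, flatType, op.getOperand());
    mlir::Value erf = rewriter.create<mlir::math::ErfOp>(loc, flat);
    // Cast back to the original multi-dimensional type.
    rewriter.replaceOpWithNewOp<mlir::vector::ShapeCastOp>(op, vecType, erf);
    return mlir::success();
  }
};
} // namespace
```

Because the pattern matches only `math.erf`, it stays exactly as narrow as the use case that motivated it.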
That could work. Could you post a bit more of the IR from the error? I don't think we should even have had a 4x4 vector at this level.
On ROCm, we want to use the device library functions, which we link as bitcode and inline. In this PR, we start with `math.erf` because that's the immediate use case, but this will likely be generalized to other functions in a subsequent PR.
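As a rough illustration of the gating this implies (names are hypothetical, not IREE's actual API), the decision reduces to a target check:

```cpp
// Hypothetical sketch, not IREE's actual API: only apply the polynomial
// approximation to math.erf when the target has no device-library
// implementation to link in. On ROCm, the op is kept intact so the bitcode
// linker can resolve it against the device library instead.
#include <string_view>

inline bool shouldApproximateErf(std::string_view targetArch) {
  // Assumption: ROCm targets are identified by a "gfx"-prefixed arch string.
  bool isROCm = targetArch.substr(0, 3) == "gfx";
  return !isROCm;
}
```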