Rewrite transpose/reshape/broadcast between pointwise/reduce operators #3978

Open
pfultz2 wants to merge 61 commits into develop

Conversation

@pfultz2 (Collaborator) commented Apr 25, 2025

This rewrites transpose/reshape/broadcast operators, using the shape_transform_descriptor, so that they no longer sit between pointwise and reduction operators, as illustrated in the sketch below. This is similar to the rewrite_reshapes pass used for pointwise/reduce fusion, but here it is done before fusion, in simplify_reshapes. It also supports more cases, such as reduce->squeeze->pointwise.

This could be expanded in the future to support other operators like argmin/argmax/concat etc.
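
As a rough illustration, here is a minimal before/after sketch of the kind of graph this pass targets, written in the style of the simplify_reshapes tests. The shapes, axes, and the add/transpose/reduce_sum op choices below are assumptions made for this example, not code taken from this PR.

// Hypothetical before/after pair, in the style of test/simplify_reshapes_test.cpp.
#include <migraphx/make_op.hpp>
#include <migraphx/program.hpp>
#include <migraphx/shape.hpp>

migraphx::program before()
{
    migraphx::program p;
    auto* mm = p.get_main_module();
    migraphx::shape s{migraphx::shape::float_type, {2, 3, 4}};
    auto x   = mm->add_parameter("x", s);
    auto y   = mm->add_parameter("y", s);
    auto sum = mm->add_instruction(migraphx::make_op("add"), x, y);
    // The transpose sits between the pointwise op and the reduction.
    auto t = mm->add_instruction(
        migraphx::make_op("transpose", {{"permutation", {0, 2, 1}}}), sum);
    mm->add_instruction(migraphx::make_op("reduce_sum", {{"axes", {2}}}), t);
    return p;
}

migraphx::program after()
{
    // After the rewrite the reduction reads the pointwise output directly; its
    // axes are remapped through the permutation and the transpose is applied
    // to the already-reduced (smaller) tensor instead.
    migraphx::program p;
    auto* mm = p.get_main_module();
    migraphx::shape s{migraphx::shape::float_type, {2, 3, 4}};
    auto x   = mm->add_parameter("x", s);
    auto y   = mm->add_parameter("y", s);
    auto sum = mm->add_instruction(migraphx::make_op("add"), x, y);
    auto r   = mm->add_instruction(
        migraphx::make_op("reduce_sum", {{"axes", {1}}}), sum);
    mm->add_instruction(
        migraphx::make_op("transpose", {{"permutation", {0, 2, 1}}}), r);
    return p;
}

Both programs produce a {2, 4, 1} output; the goal of the rewrite is to go from the first form to something like the second, so the pointwise op and the reduction become adjacent and can later be fused.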

@pfultz2 pfultz2 requested a review from causten as a code owner April 25, 2025 21:34
@pfultz2 pfultz2 requested review from Copilot and removed request for causten April 25, 2025 21:34

@Copilot Copilot AI left a comment

Pull Request Overview

This PR rewrites the handling of transpose, reshape, and broadcast operations so that they no longer sit between pointwise and reduction operators. Key changes include:

  • New and expanded test cases covering various combinations of pointwise, reduce, squeeze, transpose, broadcast, and reshape operators.
  • Introduction of a new matcher (find_op_shape_transform_op) in simplify_reshapes.cpp to better handle shape transformations (a rough structural sketch follows this list).
  • Updates to the shape_transform_descriptor implementation to support additional cases and the new broadcast flag.
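
For orientation, here is a rough structural sketch of how matchers in simplify_reshapes.cpp are usually written; the struct name, matched op names, bindings, and the stubbed apply body are placeholders, not the actual find_op_shape_transform_op from this PR.

// Hypothetical outline of a simplify_reshapes-style matcher (illustrative only).
#include <migraphx/matcher.hpp>
#include <migraphx/module.hpp>

namespace migraphx {

struct find_op_shape_transform_op_sketch
{
    auto matcher() const
    {
        // A reduction whose input is a shape-transform op (transpose/reshape/
        // broadcast), which is itself fed by some other instruction.
        return match::name("reduce_sum", "reduce_mean")(match::arg(0)(
            match::name("transpose", "reshape", "multibroadcast")(
                match::arg(0)(match::any().bind("input")))
                .bind("transform")));
    }

    void apply(module& m, const match::matcher_result& mr) const
    {
        auto reduce    = mr.result;
        auto transform = mr.instructions["transform"];
        auto input     = mr.instructions["input"];
        // The real pass would build a shape_transform_descriptor from the
        // transform's input/output shapes, re-emit the reduction directly on
        // `input`, and re-apply the rebased shape transform afterwards.
        (void)m;
        (void)reduce;
        (void)transform;
        (void)input;
    }
};

} // namespace migraphx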

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Summary per file:
  • test/simplify_reshapes_test.cpp: added multiple test cases to exercise the new shape transforms
  • src/simplify_reshapes.cpp: introduced the new matcher to catch additional shape transformations
  • src/shape_transform_descriptor.cpp: updated rebase logic and subdimension handling with a broadcast flag
  • src/include/migraphx/shape_transform_descriptor.hpp: updated method signatures to support the broadcast option
  • src/include/migraphx/rewrite_reshapes.hpp: minor adjustments, with commented-out debug prints removed
  • src/include/migraphx/matcher.hpp: updated matcher creation functions to use new names
Comments suppressed due to low confidence (1)

src/shape_transform_descriptor.cpp:178

  • Review the handling of subdimensions when 'broadcast' is true: currently, when the input dimension equals final_dim, subdimensions are only exposed if 'broadcast' is false. Confirm that skipping exposure in the broadcast case is intended and consider adding a clarifying comment to document this behavior.
if(dim == final_dim)
{

@pfultz2 pfultz2 self-assigned this May 28, 2025
@migraphx-bot
Collaborator

Test  Batch  Rate new (472753)  Rate old (05bf15)  Diff  Compare
torchvision-resnet50 64 3,238.83 3,257.26 -0.57%
torchvision-resnet50_fp16 64 6,910.28 6,911.36 -0.02%
torchvision-densenet121 32 2,445.94 2,450.74 -0.20%
torchvision-densenet121_fp16 32 4,216.35 4,205.47 0.26%
torchvision-inceptionv3 32 1,618.75 1,628.71 -0.61%
torchvision-inceptionv3_fp16 32 2,710.90 2,724.75 -0.51%
cadene-inceptionv4 16 757.34 760.36 -0.40%
cadene-resnext64x4 16 815.37 815.27 0.01%
slim-mobilenet 64 7,434.34 7,478.40 -0.59%
slim-nasnetalarge 64 216.67 209.72 3.31% 🔆
slim-resnet50v2 64 3,441.80 3,346.20 2.86%
bert-mrpc-onnx 8 1,144.51 1,150.57 -0.53%
bert-mrpc-tf 1 459.26 460.43 -0.25%
pytorch-examples-wlang-gru 1 499.46 333.24 49.88% 🔆
pytorch-examples-wlang-lstm 1 454.87 471.70 -3.57% 🔴
torchvision-resnet50_1 1 820.08 798.42 2.71%
cadene-dpn92_1 1 428.38 413.68 3.55% 🔆
cadene-resnext101_1 1 391.63 393.90 -0.57%
onnx-taau-downsample 1 396.04 396.27 -0.06%
dlrm-criteoterabyte 1 32.20 32.34 -0.44%
dlrm-criteoterabyte_fp16 1 51.24 51.41 -0.34%
agentmodel 1 10,374.86 10,451.11 -0.73%
unet_fp16 2 59.65 59.51 0.23%
resnet50v1_fp16 1 1,089.17 1,037.24 5.01% 🔆
resnet50v1_int8 1 1,072.65 1,057.15 1.47%
bert_base_cased_fp16 64 1,163.59 1,175.90 -1.05%
bert_large_uncased_fp16 32 356.20 358.18 -0.55%
bert_large_fp16 1 199.80 200.60 -0.40%
distilgpt2_fp16 16 2,242.36 2,240.05 0.10%
yolov5s 1 545.33 536.12 1.72%
tinyllama 1 43.64 43.85 -0.49%
vicuna-fastchat 1 43.93 45.11 -2.61%
whisper-tiny-encoder 1 419.43 419.52 -0.02%
whisper-tiny-decoder 1 414.72 412.06 0.64%
llama2_7b 1 18.96 19.13 -0.93%
qwen1.5-7b 1 23.38 23.54 -0.68%
phi3-3.8b 1 26.21 26.63 -1.57%
mask-rcnn 1 1.99 12.80 -84.42% 🔴
llama3-8b 1 21.59 21.77 -0.80%
whisper-large-encoder 1 10.17 10.22 -0.48%
whisper-large-decoder 1 98.04 100.87 -2.81%
mistral-7b 1 23.59 23.76 -0.73%
FLUX.1-schnell 1 nan 765.45 nan%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

❌bert-mrpc-tf: ERROR - check error output
2025-05-28 19:32:07.612282: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1748478733.274427 181387 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62973 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:32:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1748478734.182600 181387 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-05-28 19:32:24.130260: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-28 19:32:24.130304: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-28 19:32:24.130345: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-28 19:32:24.130383: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-28 19:32:24.130412: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-28 19:32:24.130452: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-28 19:32:24.130492: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-28 19:32:24.130717: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-05-28 19:32:24.131774: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-05-28 19:32:24.132911: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-05-28 19:32:24.132931: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-05-28 19:32:24.132942: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-05-28 19:32:24.132958: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
return fn(*args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 324, in main
y_out = sess.run(y, feed_dict=tf_dict)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':


     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

🔴distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


❌llama2_7b: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 227, in main
model.compile(
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/check_shapes.hpp:220: same_dims: less: Dimensions do not match


     ✅ qwen1.5-7b: PASSED: MIGraphX meets tolerance

     ✅ phi3-3.8b: PASSED: MIGraphX meets tolerance

🔴mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output


❌llama3-8b: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 227, in main
model.compile(
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/op/multibroadcast.hpp:77: compute_shape: MULTIBROADCAST: input dimensions should <= output size


❌whisper-large-decoder: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 227, in main
model.compile(
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/check_shapes.hpp:220: same_dims: less: Dimensions do not match


❌mistral-7b: ERROR - check error output
Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 227, in main
model.compile(
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/check_shapes.hpp:220: same_dims: less: Dimensions do not match


❌FLUX.1-schnell: ERROR - check error output
