creating a new env var section #3994

spolifroni-amd · 2025-05-06T21:31:24Z

No description provided.

codecov · 2025-05-06T22:48:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3994      +/-   ##
===========================================
+ Coverage    92.03%   92.10%   +0.07%     
===========================================
  Files          525      528       +3     
  Lines        24145    24381     +236     
===========================================
+ Hits         22220    22455     +235     
- Misses        1925     1926       +1

see 59 files with indirect coverage changes
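
For readers checking the report by hand, the percentages in the diff appear to follow from the hit and line counts. The sketch below assumes coverage is simply hits divided by lines, rounded to two decimals; the function name `coverage_percent` is illustrative and is not part of Codecov.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines, rounded to two decimals."""
    return round(100.0 * hits / lines, 2)


# Counts copied from the coverage diff above (develop vs. #3994).
base = coverage_percent(22220, 24145)
head = coverage_percent(22455, 24381)
delta = round(head - base, 2)

# Hits plus misses should add up to the total line count in each column.
assert 22220 + 1925 == 24145
assert 22455 + 1926 == 24381

print(base, head, delta)
```

Under that assumption the computed values agree with the 92.03%, 92.10%, and +0.07% shown in the report.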


causten · 2025-05-15T18:29:26Z

What is the next step to move this out of Draft?

spolifroni-amd · 2025-05-15T20:16:37Z

> What is the next step to move this out of Draft?

It's missing a lot of information and context. I've asked for a call with the team to go over the env vars so I can fill in the content. Once that's done, it's just a matter of writing it up, and then it'll be out of draft.

migraphx-bot · 2025-05-22T03:43:52Z

Test	Batch	Rate new f6dc63	Rate old 826c10	Diff	Compare
torchvision-resnet50	64	3,256.53	3,256.72	-0.01%	✅
torchvision-resnet50_fp16	64	6,934.77	6,911.83	0.33%	✅
torchvision-densenet121	32	2,454.19	2,452.43	0.07%	✅
torchvision-densenet121_fp16	32	4,231.76	4,202.13	0.71%	✅
torchvision-inceptionv3	32	1,627.41	1,627.37	0.00%	✅
torchvision-inceptionv3_fp16	32	2,717.88	2,717.14	0.03%	✅
cadene-inceptionv4	16	760.52	760.75	-0.03%	✅
cadene-resnext64x4	16	819.54	818.98	0.07%	✅
slim-mobilenet	64	7,471.63	7,476.45	-0.06%	✅
slim-nasnetalarge	64	217.85	209.73	3.87%	🔆
slim-resnet50v2	64	3,457.12	3,347.89	3.26%	🔆
bert-mrpc-onnx	8	1,145.64	1,152.22	-0.57%	✅
bert-mrpc-tf	1	455.22	459.66	-0.97%	✅
pytorch-examples-wlang-gru	1	482.79	342.96	40.77%	🔆
pytorch-examples-wlang-lstm	1	442.34	473.79	-6.64%	🔴
torchvision-resnet50_1	1	812.67	804.52	1.01%	✅
cadene-dpn92_1	1	424.57	413.05	2.79%	✅
cadene-resnext101_1	1	393.08	389.32	0.97%	✅
onnx-taau-downsample	1	396.35	397.46	-0.28%	✅
dlrm-criteoterabyte	1	32.32	32.33	-0.01%	✅
dlrm-criteoterabyte_fp16	1	51.34	51.31	0.05%	✅
agentmodel	1	9,454.12	10,295.20	-8.17%	🔴
unet_fp16	2	58.62	59.48	-1.45%	✅
resnet50v1_fp16	1	1,072.43	1,041.41	2.98%	✅
resnet50v1_int8	1	1,065.74	1,058.98	0.64%	✅
bert_base_cased_fp16	64	1,170.79	1,175.80	-0.43%	✅
bert_large_uncased_fp16	32	356.24	358.28	-0.57%	✅
bert_large_fp16	1	193.73	199.89	-3.08%	🔴
distilgpt2_fp16	16	2,232.77	2,243.36	-0.47%	✅
yolov5s	1	542.45	534.75	1.44%	✅
tinyllama	1	43.82	43.89	-0.15%	✅
vicuna-fastchat	1	44.18	45.00	-1.83%	✅
whisper-tiny-encoder	1	421.87	419.30	0.61%	✅
whisper-tiny-decoder	1	411.51	403.28	2.04%	✅
llama2_7b	1	nan	nan	nan%	❌
qwen1.5-7b	1	23.51	23.55	-0.17%	✅
phi3-3.8b	1	26.59	26.60	-0.04%	✅
mask-rcnn	1	2.03	12.80	-84.13%	🔴
llama3-8b	1	21.75	21.75	-0.01%	✅
whisper-large-encoder	1	10.21	10.22	-0.06%	✅
whisper-large-decoder	1	98.06	102.86	-4.66%	🔴
mistral-7b	1	23.76	23.75	0.05%	✅
FLUX.1-schnell	1	894.23	767.44	16.52%	🔆

This build is not recommended to merge 🔴
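
The Diff column looks like a relative change of the new rate against the old one. A minimal sketch, assuming the formula is `(new - old) / old * 100`; the helper name `percent_diff` is hypothetical, and the thresholds the bot uses for its ✅/🔆/🔴 markers are not stated in this thread, so they are not modeled here.

```python
def percent_diff(new_rate: float, old_rate: float) -> float:
    """Relative change of new_rate versus old_rate, in percent."""
    return (new_rate - old_rate) / old_rate * 100.0


# Rates copied from three rows of the table above.
rows = {
    "pytorch-examples-wlang-gru": (482.79, 342.96),
    "pytorch-examples-wlang-lstm": (442.34, 473.79),
    "FLUX.1-schnell": (894.23, 767.44),
}

diffs = {name: round(percent_diff(new, old), 2) for name, (new, old) in rows.items()}
for name, diff in diffs.items():
    print(f"{name}: {diff:+.2f}%")
```

Recomputing from the displayed rates can differ in the last digit from the bot's figure (mask-rcnn comes out at -84.14% rather than -84.13%), presumably because the table shows rounded rates.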

migraphx-bot · 2025-05-22T03:43:54Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

❌bert-mrpc-tf: ERROR - check error output

2025-05-21 21:53:18.573148: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1747882403.948325 160091 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62973 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:b3:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1747882404.791000 160091 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-05-21 21:53:33.360258: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-21 21:53:33.360310: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-21 21:53:33.360363: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-21 21:53:33.360413: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-21 21:53:33.360443: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-21 21:53:33.360495: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-21 21:53:33.360552: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-05-21 21:53:33.360603: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-05-21 21:53:33.361633: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-05-21 21:53:33.362737: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-05-21 21:53:33.362757: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-05-21 21:53:33.362768: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-05-21 21:53:33.362785: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
return fn(*args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 324, in main
y_out = sess.run(y, feed_dict=tf_dict)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
result = self._run(None, fetches, feed_dict, options_ptr,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
results = self._do_run(handle, final_targets, final_fetches,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
return self._do_call(_run_fn, feeds, fetches, targets, options,
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

❌llama2_7b: ERROR - check error output

Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/llama2_7b/decoder_model.onnx

❌qwen1.5-7b: ERROR - check error output

usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256

❌phi3-3.8b: ERROR - check error output

usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256

❌mask-rcnn: ERROR - check error output

usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: 3 800 800

✅ llama3-8b: PASSED: MIGraphX meets tolerance

❌#whisper-large-encoder: ERROR - check error output

Traceback (most recent call last):
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
main()
File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/op/convolution.hpp:100: normalize_compute_shape: CONVOLUTION: mismatched channel numbers

✅ whisper-large-decoder: PASSED: MIGraphX meets tolerance

✅ mistral-7b: PASSED: MIGraphX meets tolerance

✅ FLUX.1-schnell: PASSED: MIGraphX meets tolerance
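
The `unrecognized arguments` failures for qwen1.5-7b, phi3-3.8b, and mask-rcnn are what `argparse` reports when tokens on the command line are not consumed by any declared option. The sketch below reproduces that behavior with a stand-in parser. It is an assumption for illustration only: the real `accuracy_checker.py` may declare `--input-dim` differently, and the file name `model.onnx` and input name `@image` are made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser, NOT the actual accuracy_checker.py definition.
    parser = argparse.ArgumentParser(prog="accuracy_checker_sketch.py")
    parser.add_argument("--onnx", type=str)
    # With no nargs given, each --input-dim occurrence consumes exactly one token.
    parser.add_argument("--input-dim", type=str, action="append")
    return parser


parser = build_parser()
argv = ["--onnx", "model.onnx", "--input-dim", "@image", "3", "800", "800"]

# parse_known_args returns the leftover tokens instead of exiting, which makes
# it easy to see what parse_args would reject as "unrecognized arguments".
known, leftover = parser.parse_known_args(argv)
print(known.input_dim)
print(leftover)
```

If the test harness passes dimensions as separate tokens while the script expects them in a single value (or the reverse), the extra tokens fall through in this way. Whether that is the actual cause here would need to be confirmed against the script and the harness invocation.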

creating a new env var section

78875af

spolifroni-amd added the documentation label May 6, 2025

aarushjain29 self-requested a review May 7, 2025 14:31

some format changes

f6dc635
