[infra] Run parameterized ONNX model tests across CPU, Vulkan, and HIP. #19524

Draft
wants to merge 8 commits into
base: main
19 changes: 15 additions & 4 deletions .github/workflows/pkgci_test_onnx.yml
@@ -130,11 +130,22 @@ jobs:
include:
# CPU
- name: cpu_llvm_task
config-file: onnx_models_cpu_llvm_task.json
runs-on: ubuntu-24.04

# TODO(scotttodd): test other backends (parameterize the test suite)
# AMD GPU
- name: amdgpu_rocm_rdna3
config-file: onnx_models_gpu_rocm_rdna3.json
runs-on: nodai-amdgpu-w7900-x86-64
- name: amdgpu_vulkan
config-file: onnx_models_gpu_vulkan.json
runs-on: nodai-amdgpu-w7900-x86-64

# NVIDIA GPU
# TODO(#18238): migrate to new runner cluster
env:
VENV_DIR: ${{ github.workspace }}/venv
CONFIG_FILE_PATH: tests/external/iree-test-suites/onnx_models/${{ matrix.config-file }}
steps:
- name: Checking out IREE repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
@@ -158,7 +169,7 @@ jobs:
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
repository: iree-org/iree-test-suites
ref: 8e6af9e75d874ef8c9f8ff55f12cb38157dd55eb
ref: d7db851fe0c332b80f9380861cd60ac7c642b584
path: iree-test-suites
- name: Install ONNX models test suite requirements
run: |
@@ -170,6 +181,6 @@ jobs:
pytest iree-test-suites/onnx_models/ \
-rA \
--log-cli-level=info \
--override-ini=xfail_strict=false \
--timeout=120 \
--durations=0
--durations=0 \
--test-config-file=${CONFIG_FILE_PATH}
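For local reproduction, the matrix entries and the final `pytest` step above can be sketched in Python. This is a minimal sketch and not part of the PR: `MATRIX`, `config_file_path`, and `build_pytest_command` are hypothetical names. The flag list assumes the post-change command. The hunk header's line counts suggest that `--override-ini=xfail_strict=false` is among the removed lines, but the flattened diff does not show that directly, so treat its omission here as an assumption.

```python
from pathlib import PurePosixPath

# Matrix entries copied from the workflow diff above (job name -> config file).
MATRIX = {
    "cpu_llvm_task": "onnx_models_cpu_llvm_task.json",
    "amdgpu_rocm_rdna3": "onnx_models_gpu_rocm_rdna3.json",
    "amdgpu_vulkan": "onnx_models_gpu_vulkan.json",
}

# Directory used by CONFIG_FILE_PATH in the workflow's env block.
CONFIG_DIR = PurePosixPath("tests/external/iree-test-suites/onnx_models")


def config_file_path(matrix_name: str) -> str:
    """Mirror CONFIG_FILE_PATH for one matrix entry (hypothetical helper)."""
    return str(CONFIG_DIR / MATRIX[matrix_name])


def build_pytest_command(matrix_name: str) -> list[str]:
    """Assemble the pytest invocation as an argument list (hypothetical helper).

    Assumption: --override-ini=xfail_strict=false is not part of the
    post-change command.
    """
    return [
        "pytest",
        "iree-test-suites/onnx_models/",
        "-rA",
        "--log-cli-level=info",
        "--timeout=120",
        "--durations=0",
        f"--test-config-file={config_file_path(matrix_name)}",
    ]


if __name__ == "__main__":
    # Print one command line per matrix entry.
    for name in MATRIX:
        print(" ".join(build_pytest_command(name)))
```

Returning an argument list rather than a single string keeps the sketch usable with `subprocess.run` without shell quoting concerns.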
9 changes: 9 additions & 0 deletions docs/website/docs/assets/stylesheets/iree.css
@@ -17,6 +17,10 @@
top: 5px;
left: 0;
display: inline;

/* Override other style settings based on header level */
font-size: .75rem;
text-transform: none;
}

pre.highlight code {
@@ -72,3 +76,8 @@ pre.highlight code {
-webkit-mask-image: var(--md-admonition-icon--danger);
mask-image: var(--md-admonition-icon--danger);
}

/* Don't convert h5 text to uppercase */
.md-typeset h5 {
text-transform: none;
}
69 changes: 57 additions & 12 deletions docs/website/docs/developers/general/testing-guide.md
@@ -396,6 +396,8 @@ not supported by Bazel rules at this point.

## External test suites
Member Author

This page is published at https://iree.dev/developers/general/testing-guide/#external-test-suites. I'm generally trying to put enough information there so that:

  • developers working in just this iree-org/iree repository can understand what the different tests are and how to handle newly failing or passing tests
  • developers are aware of out of tree test suites
  • each test suite is put in context

Along these lines, I would like to promote more of the test suite work going on (in both iree-test-suites and SHARK-TestSuite) up to the level of overall IREE ecosystem dashboards and release notes. For example, each stable release could highlight the test result delta and average performance delta since the previous release.


### iree-test-suites

Multiple test suites are under development in the
[iree-org/iree-test-suites](https://github.com/iree-org/iree-test-suites)
repository.
@@ -407,12 +409,7 @@ repository.
* Keeping tests out of tree forces them to use public project APIs and allows
the core project to keep its infrastructure simpler.

The [nod-ai/SHARK-TestSuite](https://github.com/nod-ai/SHARK-TestSuite)
repository also contains tests for many machine learning models. Some of these
tests are planned to be migrated into
[iree-org/iree-test-suites](https://github.com/iree-org/iree-test-suites).

### linalg operator tests
#### linalg operator tests

Tests for operators in the MLIR linalg dialect like `matmul` and `convolution`
are being migrated from folders like
@@ -424,7 +421,7 @@ in the
[iree-org/iree-test-suites](https://github.com/iree-org/iree-test-suites)
repository.

### ONNX operator tests
#### :simple-onnx: ONNX operator tests

Tests for individual ONNX operators are included at
[`onnx_ops/`](https://github.com/iree-org/iree-test-suites/tree/main/onnx_ops)
@@ -438,7 +435,7 @@ Testing ONNX programs follows several stages:

```mermaid
graph LR
Import -. "<br>(offline)" .-> Compile
Import -. "(offline)" .-> Compile
Compile --> Run
```

@@ -469,15 +466,15 @@ To run slices of the test suite, a [pytest](https://docs.pytest.org/en/stable/)
runner is included that can be configured using JSON files. The JSON files
tested in the IREE repo itself are stored in
[`tests/external/iree-test-suites/onnx_ops/`](https://github.com/iree-org/iree/tree/main/tests/external/iree-test-suites/onnx_ops).

For example, here is part of a config file for running ONNX tests on CPU:
For example, here is part of a config file for running ONNX operator tests on
CPU:

<!-- markdownlint-disable-next-line -->
```json title="tests/external/iree-test-suites/onnx_ops/onnx_ops_cpu_llvm_sync.json" linenums="1"
--8<-- "tests/external/iree-test-suites/onnx_ops/onnx_ops_cpu_llvm_sync.json::20"
```

#### Updating config files
##### Updating config files

If the ONNX operator tests fail on a GitHub Actions workflow, check the logs
for the nature of the failure. Often, a test is *newly passing*, with logs
@@ -496,11 +493,59 @@ committed:

![image](https://github.com/user-attachments/assets/b5dbdcb4-4c0a-4ff2-adc6-9021614179b2)

### ONNX model tests
#### :simple-onnx: ONNX model tests

Tests for ONNX models are included at
[`onnx_models/`](https://github.com/iree-org/iree-test-suites/tree/main/onnx_models)
in the
[iree-org/iree-test-suites](https://github.com/iree-org/iree-test-suites)
repository. These tests use models from the upstream
[onnx/models](https://github.com/onnx/models) repository.

Like the ONNX operator tests, the ONNX model tests use configuration files to
control which flags are used and which tests are run. The config files tested
in the IREE repo itself are stored in
[`tests/external/iree-test-suites/onnx_models/`](https://github.com/iree-org/iree/tree/main/tests/external/iree-test-suites/onnx_models).
For example, here is part of a config file for running ONNX model tests on CPU:

<!-- markdownlint-disable-next-line -->
```json title="tests/external/iree-test-suites/onnx_models/onnx_models_cpu_llvm_task.json" linenums="1"
--8<-- "tests/external/iree-test-suites/onnx_models/onnx_models_cpu_llvm_task.json::14"
```

Unlike the ONNX operator tests, we do not run the full set of tests on every
commit to [iree-org/iree](https://github.com/iree-org/iree). Instead, in
[iree-org/iree](https://github.com/iree-org/iree) we run a curated list of
small tests that are expected to pass, and the full set of tests runs nightly
in
[iree-org/iree-test-suites](https://github.com/iree-org/iree-test-suites).

#### sharktank tests

Tests for small scale versions of Large Language Models (LLMs)
and other Generative AI (GenAI) programs exported using the
[sharktank package](https://github.com/nod-ai/shark-ai/tree/main/sharktank)
built as part of the [shark-ai project](https://github.com/nod-ai/shark-ai) are
included at
[`sharktank_models/`](https://github.com/iree-org/iree-test-suites/tree/main/sharktank_models)
in the
[iree-org/iree-test-suites](https://github.com/iree-org/iree-test-suites)
repository.

<!-- TODO(scotttodd): document how to coordinate changes across these projects -->

### SHARK-TestSuite

The [nod-ai/SHARK-TestSuite](https://github.com/nod-ai/SHARK-TestSuite)
repository also contains tests using IREE,
[llvm/torch-mlir](https://github.com/llvm/torch-mlir), and
[nod-ai/shark-ai](https://github.com/nod-ai/shark-ai).

Some test coverage may overlap between SHARK-TestSuite and iree-test-suites.
Some tests are planned to be migrated into
[iree-org/iree-test-suites](https://github.com/iree-org/iree-test-suites) once
they mature and have demonstrated general utility to the upstream developer
community.

Test reports for nightly runs in SHARK-TestSuite are uploaded to
[nod-ai/e2eshark-reports](https://github.com/nod-ai/e2eshark-reports).
@@ -0,0 +1,32 @@
{
"config_name": "cpu_llvm_task",
"iree_compile_flags": [
"--iree-hal-target-backends=llvm-cpu",
"--iree-llvmcpu-target-cpu=host"
],
"iree_run_module_flags": [
"--device=local-task"
],
"tests_and_expected_outcomes": {
"default": "skip",
"tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]": "pass",
"tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/gender_googlenet.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[squeezenet/model/squeezenet1.0-9.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[vgg/model/vgg19-7.onnx]": "pass",
"tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]": "pass",
"tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]": "pass",
"tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]": "pass"
}
}
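To illustrate how a config like the one above could be read, here is a minimal sketch. It assumes that a test ID missing from `tests_and_expected_outcomes` falls back to the `"default"` entry. That is a reading of the key name, not confirmed behavior of the iree-test-suites pytest runner. `expected_outcome` and `curated_tests` are hypothetical helpers, and the embedded config is trimmed to two of the listed tests.

```python
import json

# A trimmed copy of the CPU config above (two of the listed tests only).
CONFIG_TEXT = """
{
  "config_name": "cpu_llvm_task",
  "iree_compile_flags": [
    "--iree-hal-target-backends=llvm-cpu",
    "--iree-llvmcpu-target-cpu=host"
  ],
  "iree_run_module_flags": ["--device=local-task"],
  "tests_and_expected_outcomes": {
    "default": "skip",
    "tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]": "pass",
    "tests/model_zoo/validated/vision/classification_models_test.py::test_models[vgg/model/vgg19-7.onnx]": "pass"
  }
}
"""


def expected_outcome(config: dict, test_id: str) -> str:
    """Look up a test's expected outcome, falling back to "default" (assumed)."""
    outcomes = config["tests_and_expected_outcomes"]
    return outcomes.get(test_id, outcomes["default"])


def curated_tests(config: dict) -> list[str]:
    """Return the test IDs explicitly marked "pass", in sorted order."""
    outcomes = config["tests_and_expected_outcomes"]
    return sorted(
        test_id
        for test_id, outcome in outcomes.items()
        if test_id != "default" and outcome == "pass"
    )


config = json.loads(CONFIG_TEXT)

if __name__ == "__main__":
    for test_id in curated_tests(config):
        print(test_id)
```

Under that assumption, adding a model to the per-commit list would be a one-line change to the JSON file: a new test ID mapped to `"pass"`.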
@@ -0,0 +1,30 @@
{
"config_name": "gpu_rocm_rdna3",
"iree_compile_flags": [
"--iree-hal-target-backends=rocm",
"--iree-hip-target=gfx1100"
],
"iree_run_module_flags": [
"--device=hip"
],
"tests_and_expected_outcomes": {
"default": "skip",
"tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/googlenet/model/googlenet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]": "pass",
"tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]": "pass",
"tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]": "pass",
"tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]": "pass"
}
}
@@ -0,0 +1,21 @@
{
"config_name": "gpu_vulkan",
"iree_compile_flags": [
"--iree-hal-target-backends=vulkan-spirv"
],
"iree_run_module_flags": [
"--device=vulkan"
],
"tests_and_expected_outcomes": {
"default": "skip",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]": "pass",
"tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]": "pass",
"tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]": "pass",
"tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]": "pass"
}
}
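The three configs above enable different subsets of models. The sketch below compares them with plain set operations. The set contents are transcribed from the `"pass"` entries in the configs, using each model's file stem as shorthand for the full pytest ID (a naming choice made for this sketch, not something the test suite uses). It makes no claim about why a model is absent from a given backend's list.

```python
# Model file stems marked "pass" in each config above.
CPU = {
    "age_googlenet", "gender_googlenet", "bvlcalexnet-12", "caffenet-12",
    "densenet-12", "efficientnet-lite4-11", "inception-v2-9", "mnist-12",
    "mobilenetv2-12", "rcnn-ilsvrc13-9", "resnet50-v1-12", "resnet50-v2-7",
    "shufflenet-9", "shufflenet-v2-12", "squeezenet1.0-9", "vgg19-7",
    "tinyyolov2-8", "yolov2-coco-9", "super-resolution-10",
}
ROCM = {
    "age_googlenet", "bvlcalexnet-12", "caffenet-12", "densenet-12",
    "efficientnet-lite4-11", "googlenet-12", "inception-v2-9", "mnist-12",
    "mobilenetv2-12", "rcnn-ilsvrc13-9", "resnet50-v1-12", "resnet50-v2-7",
    "shufflenet-9", "shufflenet-v2-12", "tinyyolov2-8", "yolov2-coco-9",
    "super-resolution-10",
}
VULKAN = {
    "bvlcalexnet-12", "caffenet-12", "mnist-12", "mobilenetv2-12",
    "rcnn-ilsvrc13-9", "resnet50-v1-12", "resnet50-v2-7", "tinyyolov2-8",
    "yolov2-coco-9",
}

# Models enabled on every backend.
common = CPU & ROCM & VULKAN
# Models enabled on one of CPU/ROCm but not the other.
cpu_not_rocm = CPU - ROCM
rocm_not_cpu = ROCM - CPU

if __name__ == "__main__":
    print("enabled everywhere:", sorted(common))
    print("CPU but not ROCm:", sorted(cpu_not_rocm))
    print("ROCm but not CPU:", sorted(rocm_not_cpu))
```

A comparison like this could help spot per-backend gaps when reviewing config updates.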