Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI][E2E] Split gen12 pre-commit testing into build and run stages #16321

Merged
merged 32 commits into from
Dec 20, 2024

Conversation

ayylol
Copy link
Contributor

@ayylol ayylol commented Dec 10, 2024

This patch enables E2E build and run splitting on pre-commit CI. Currently one of the main bottlenecks in pre-commit CI is the Linux Gen12 testing which takes between 30 ~ 40 minutes to complete. By instead building these tests on our fast build systems, and then transferring the binaries to the Gen12 machine we can cut down on this time by roughly half.

Sadly, the Windows Gen12 testing will still be the main bottleneck so total time for pre-commit won't improve with this PR, but we will at least get faster feedback for gen12 on Linux.

@ayylol ayylol changed the title [do not merge] Testing pre-commit ci splitting [CI][E2E] Split gen12 pre-commit testing into build and run stages Dec 11, 2024
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes here are to work-around a hang that occurs when one of our tests try to run a binary that does not exist when inside one of the containers that we use for CI.

This hang is not a recent regression but rather an issue that has existed for a while but is very infrequently encountered with the current full testing mode. With the split testing if a test fails to build a binary in the build-only stage it will fail the run-only stage due to the binary not existing, making this hang a lot more common.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we at least make a GH issue for the hang with repro steps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, here it is: #16351

.github/workflows/sycl-linux-precommit.yml Outdated Show resolved Hide resolved
.github/workflows/sycl-linux-precommit.yml Outdated Show resolved Hide resolved
.github/workflows/sycl-linux-precommit.yml Show resolved Hide resolved
.github/workflows/sycl-linux-precommit.yml Outdated Show resolved Hide resolved
.github/workflows/sycl-linux-run-tests.yml Outdated Show resolved Hide resolved
.github/workflows/sycl-linux-run-tests.yml Outdated Show resolved Hide resolved
.github/workflows/sycl-linux-run-tests.yml Outdated Show resolved Hide resolved
.github/workflows/sycl-linux-run-tests.yml Outdated Show resolved Hide resolved
sycl/test-e2e/format.py Show resolved Hide resolved
@ayylol
Copy link
Contributor Author

ayylol commented Dec 20, 2024

@sarnex FYI, I had to change the container for the build-only stage from the ubuntu 22 alldeps container to the ubuntu 24 latest intel drivers container because some tests started to fail on it. I was using that container originally because it has the dependencies from the build container so that we can build the cuda/hip tests as well, but since those arent enabled yet for this pr its not really needed yet.

@sarnex
Copy link
Contributor

sarnex commented Dec 20, 2024

Ok yeah looks like the problem is some tests link against libze_loader which is part of the GPU driver, and they recently dropped support for 22.04 (or at least don't provide packages for it) so I switched us to using 24.04, so libze_loader is using a newer glibc so it won't work on 22.04. Eventually we should make a 24.04 alldeps image but for now this is fine.

@ayylol
Copy link
Contributor Author

ayylol commented Dec 20, 2024

@intel/llvm-gatekeepers This is ready to merge.
Failures in post commit gen12 for tests OnlineCompiler/online_compiler_OpenCL.cpp and USM/fill_any_size.cpp, are unrelated to this pr and appearing currently in post-commit on sycl branch (https://github.com/intel/llvm/actions/runs/12437990459/job/34729407798)

@sarnex sarnex merged commit cc4dee4 into sycl Dec 20, 2024
23 of 24 checks passed
@ayylol ayylol deleted the e2e-split-ci-testing branch December 20, 2024 21:28
@sarnex
Copy link
Contributor

sarnex commented Dec 20, 2024

@ayylol Can we/are we planning to do the same for postcommit?

@sarnex
Copy link
Contributor

sarnex commented Dec 20, 2024

@ayylol
Also, seeing an XPASS during building the E2E tests, I don't know what an XPASS even means there but yeah

Unexpectedly Passed Tests (1):
  SYCL :: Matrix/SG32/get_coordinate_ops.cpp

https://github.com/intel/llvm/actions/runs/12438967580/job/34732118174

@ayylol
Copy link
Contributor Author

ayylol commented Dec 23, 2024

@ayylol Also, seeing an XPASS during building the E2E tests, I don't know what an XPASS even means there but yeah

Unexpectedly Passed Tests (1):
  SYCL :: Matrix/SG32/get_coordinate_ops.cpp

https://github.com/intel/llvm/actions/runs/12438967580/job/34732118174

XPASS on the build-only stage usually means that its a test that is able to compile, but fails when running but the way it is marked the XFAIL is incorrectly triggered on the build-only. Opened #16455 to fix this.

@ayylol Can we/are we planning to do the same for postcommit?

Yep, adding this to the post-commit should be quite similar, likely wont require all the extra test markup I had to do in this pr. I'm thinking we should hold on it and see if the pre-commit changes aren't causing too much hassle before proceeding with that.

@sarnex
Copy link
Contributor

sarnex commented Dec 23, 2024

Yep, adding this to the post-commit should be quite similar, likely wont require all the extra test markup I had to do in this pr. I'm thinking we should hold on it and see if the pre-commit changes aren't causing too much hassle before proceeding with that.

Sure, sounds good, thx!

@aelovikov-intel
Copy link
Contributor

IMO, post-commit should happen after

  1. We moved all pre-commit (except PVC e2e maybe)
  2. We got back and cleaned up the logic for unsupported/xfail/etc so that it works nicely with the feature and doesn't require too much explicit markup.

To clarify: "REQURIES: pvc, build-and-run-mode" is bad, "REQUIRES-RUN: pvc" or "XFAIL-RUN: pvc" is much better. I'm not pushing for these specific spellings, but the whole "build-and-run-mode" was a temporary solution to make fast progress and rip the immediate benefits in the pre-commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants