
GitLab pipelines might mark builds that failed because of system failures as "broken specs" #324

Open
alalazo opened this issue Sep 27, 2022 · 3 comments

alalazo (Member) commented Sep 27, 2022

Inspecting the list of pipelines run on develop, there is a system failure when building mpich, occurring here:

The pipeline running after the one above fails, reporting the two mpich builds as broken specs:

spack/spack#32839 swaps two lines of code to trigger a rebuild of mpich, which results in the following jobs for the aws-isc pipeline:

where mpich builds fine. That seems to suggest that a system failure was recorded as a broken spec.

scottwittenburg (Collaborator) commented

Just pointing out that if the rebuild job can never get scheduled onto a pod, as in the jobs you linked above, then we definitely don't report anything to the broken specs list, since none of our rebuild script logic ever gets to run. In the case of the broken mpich/oxqu5m7, it appears to have been recorded on the broken specs list here: https://gitlab.spack.io/spack/spack/-/jobs/3333864. By then the other retries had either already been used up or had previously failed, and this was the third attempt.
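
To illustrate the distinction described above, here is a minimal, hypothetical sketch (not Spack's actual CI code) of rebuild-job logic in which a broken-specs entry is only ever written by the script itself, after the build has run and failed. A job that never gets scheduled onto a pod never reaches this code, so it cannot add anything to the list:

```python
import subprocess
import sys

def rebuild_and_report(spec_hash: str, broken_specs_list: set) -> None:
    """Hypothetical rebuild step: only a build that actually ran and failed
    is recorded as broken. `broken_specs_list` stands in for whatever shared
    store the real pipeline uses."""
    # Install the spec by hash; `/hash` is Spack's install-by-hash syntax.
    result = subprocess.run(["spack", "install", f"/{spec_hash}"])
    if result.returncode != 0:
        # The build ran and failed: report the breakage.
        broken_specs_list.add(spec_hash)
        sys.exit(result.returncode)

# A job that is never scheduled never calls rebuild_and_report() at all,
# which is why a scheduling/system failure alone cannot mark a spec broken.
```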

alalazo (Member, Author) commented Sep 27, 2022

Feel free to change the title of the issue to make it more precise. What I did to track this down was:

  1. Get a list of the pipelines on develop
  2. Look one by one at the failed pipelines to see when the failure was happening
  3. Check "failed jobs" as in the screenshot

[Screenshot from 2022-09-27 19-19-54: the "Failed jobs" tab of the pipeline view]
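
For reference, the same three steps can also be scripted against the standard GitLab REST API. This is a hypothetical sketch (the instance URL, project path, and token handling are assumptions), not something the pipeline itself provides:

```python
import requests

GITLAB = "https://gitlab.spack.io/api/v4"
PROJECT = "spack%2Fspack"               # URL-encoded project path (assumption)
HEADERS = {"PRIVATE-TOKEN": "<token>"}  # personal access token, if required

# 1. Get a list of the pipelines on develop.
pipelines = requests.get(
    f"{GITLAB}/projects/{PROJECT}/pipelines",
    params={"ref": "develop", "per_page": 50},
    headers=HEADERS,
).json()

# 2. Look one by one at the failed pipelines ...
for pipeline in (p for p in pipelines if p["status"] == "failed"):
    # 3. ... and check their failed jobs.
    jobs = requests.get(
        f"{GITLAB}/projects/{PROJECT}/pipelines/{pipeline['id']}/jobs",
        params={"scope[]": "failed"},
        headers=HEADERS,
    ).json()
    for job in jobs:
        print(pipeline["id"], job["name"], job["web_url"])
```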

Do I understand correctly, then, that the job shown under "Failed jobs" might not be the one that caused the spec to be marked as "broken", and that I may need to look further down in the right column, like:

[Screenshot: right column of the pipeline view, showing the other jobs for the same hash]

scottwittenburg (Collaborator) commented

Yes, by looking down that list at the other failures of the same hash, I found the one that actually got scheduled, ran, and the build failed (and said it was going to report the breakage).
