GitHub Actions error collection script only reads latest attempt #437
Comments
@robrap @timmc-edx, I would like to work on this task, and I am thinking of using the PyGithub library for the integration. Please let me know if I can take it on.
@RafayGhafoor: That sounds good and we're here to answer questions. Good luck.
@RafayGhafoor are you still working on this?
@rgraber, I had been working on this task and set out to send a PR to enable the GitHub CLI to rerun failed jobs based on annotation messages, but the related issue I created for that PR didn't get any traction. What I had in mind was to integrate the GitHub CLI (gh) with the workflow so that it automatically reruns a job if the failed job's status has an annotation message of "Lost connection....". Since the issue didn't get any follow-up, I have lost motivation to work on it, but I think a custom script with the rights to rerun failed jobs could be a possible solution, as long as it only operates on jobs that failed due to losing communication with the server. Following are the steps that I had thought of adding as a last step to the CI:
I made an attempt at fixing this in #544, which also includes some other improvements. But... it turns out all of the attempt objects for a workflow run are pointing to the same check suite! (The most recent one, naturally -- which means we lose any errors that provoke someone to re-run their tests.) This is blocked unless we can find a solution. I've posted about the issue at https://github.com/orgs/community/discussions/103026.
This is now a Product Feedback submission, since it appears that this is just a bug in the API: https://github.com/orgs/community/discussions/124000
A/C:
The Actions error collection script only collects the status of the most recent attempt on each job. Since we re-run most of our failed jobs, this script can't see most of the error information we're interested in.
See openedx/edx-platform#32671 for an example of where this information would have been useful.
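To make the "every attempt, not just the latest" idea concrete, here is a minimal sketch that builds one request URL per attempt using the GitHub REST endpoint for listing the jobs of a specific workflow run attempt. The function name `attempt_job_urls` and the example arguments are hypothetical and are not taken from the existing script. It only enumerates URLs; as a comment in this thread reports, the attempt objects may all point at the same check suite, so this alone may not recover the earlier errors.

```python
API_ROOT = "https://api.github.com"


def attempt_job_urls(owner: str, repo: str, run_id: int, latest_attempt: int) -> list[str]:
    """Build one "list jobs for a workflow run attempt" URL per attempt.

    `latest_attempt` is assumed to be the `run_attempt` value reported on the
    workflow run object; attempts are numbered starting from 1.
    """
    if latest_attempt < 1:
        raise ValueError("latest_attempt must be at least 1")
    return [
        f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/attempts/{n}/jobs"
        for n in range(1, latest_attempt + 1)
    ]


# Hypothetical run id and attempt count, for illustration only.
for url in attempt_job_urls("openedx", "edx-platform", 42, 3):
    print(url)
```

Each URL would then be fetched with an authenticated client (PyGithub or plain HTTP), which is left out here to keep the sketch free of network access.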
While we're in there, it might also be useful to turn this into a multi-stage script with caching. Currently there's a risk of getting rate-limited partway through a run, at which point all of the in-memory collected information is lost. It might be better to split the script so that it first gets all of the commits in the desired time range, writes that to file, and then gets job and attempt information -- but only for jobs that it hasn't already cached on disk. This would speed up future runs.
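The caching idea above could look roughly like the following. This is a sketch under assumptions, not the script's actual design: the names `load_cache` and `collect_with_cache`, the single JSON file layout, and the `fetch` callable (a stand-in for whatever API call retrieves job and attempt information) are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_cache(path: Path) -> dict:
    """Return the cached mapping, or an empty dict if no cache file exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def collect_with_cache(
    keys: Iterable[str],
    fetch: Callable[[str], dict],
    cache_path: Path,
) -> dict:
    """Fetch data for each key, skipping keys already stored on disk.

    `fetch` is a hypothetical stand-in for the real API call. If it raises
    (for example on a rate limit), everything fetched so far is already saved.
    """
    cache = load_cache(cache_path)
    for key in keys:
        if key in cache:
            continue  # Already collected in an earlier run.
        cache[key] = fetch(key)
        # Persist after every fetch so an interruption loses at most one item.
        cache_path.write_text(json.dumps(cache))
    return cache
```

Rewriting the whole file after each fetch is simple but scales poorly for large caches; one file per key or an append-only log would be alternatives if that becomes a problem.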