A/C
We are able to find out how often a given action failure occurs via something automated.
If the "lost communication" error happens regularly, we have a new issue created to address it.
Implementation notes:
The "something automated" might be a script someone can run, a dashboard, or something else entirely; the choice is left to whoever picks this up. A rough sketch of one possible script follows these notes.
If this turns out to be really hard, timebox it to a day or two.
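As a starting point, here is a minimal sketch of what the "something automated" could look like: a Python script that uses the GitHub REST API to count failed workflow runs over a date window, grouped by workflow. The repository name, the date range, the GITHUB_TOKEN environment variable, and the idea of using failed-run counts as a first proxy for this failure are all assumptions, not decisions made in this ticket; classifying which failures are actually the lost-communication error is sketched further below.

```python
"""Rough sketch: count failed GitHub Actions runs for a repo over a date range.

Assumptions (not confirmed by this ticket): the repo is openedx/edx-platform,
a token is available in GITHUB_TOKEN, and counting failed runs is an
acceptable first proxy for "how often a given action failure occurs".
"""
import os
from collections import Counter

import requests

API = "https://api.github.com"
REPO = "openedx/edx-platform"  # assumption: the repo this ticket is about
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
}


def failed_runs(created="2023-01-01..2023-03-31"):
    """Yield failed workflow runs created in the given date range."""
    page = 1
    while True:
        resp = requests.get(
            f"{API}/repos/{REPO}/actions/runs",
            headers=HEADERS,
            params={"status": "failure", "created": created,
                    "per_page": 100, "page": page},
        )
        resp.raise_for_status()
        runs = resp.json()["workflow_runs"]
        if not runs:
            return
        yield from runs
        page += 1


if __name__ == "__main__":
    # Tally failures per workflow name so a spike in any one workflow stands out.
    counts = Counter(run["name"] for run in failed_runs())
    for workflow, count in counts.most_common():
        print(f"{count:4d}  {workflow}")
```

Run regularly (or wired into a dashboard), this would at least show whether failures are trending up, which is the "happening more often than it used to" question below.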
Notes from the original creation of this ticket:
GitHub Actions jobs are occasionally failing with:
The self-hosted runner: edx-platform-openedx-ci-runner-deployment-7xdl7-zf8dj lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
This will cause final_checks_before_prod to fail.
Occurrences:
One possible use of this ticket would simply be to determine whether this is indeed happening more often than it used to.
Possible avenues of exploration:
Are the GitHub runners resource-starved?
We may be able to use the GitHub API to get the status of jobs, then the command line to pull the logs (if this error shows up in the logs); a sketch of that approach follows this list.
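Building on the run-counting sketch above, here is a hedged example of that second avenue: list the jobs for a failed run via the API, then download each failed job's log and search it for the "lost communication" text. Whether the message actually appears in the downloadable logs, rather than only as a job annotation, is exactly the open question noted above, so this only works under that assumption; `classify_run`, the repo name, and the example run id are illustrative, not an agreed-upon design.

```python
"""Rough sketch: check whether a failed run's job logs contain the
"lost communication with the server" error. Assumes the message shows up in
the downloadable job logs, which this ticket has not yet confirmed.
"""
import os

import requests

API = "https://api.github.com"
REPO = "openedx/edx-platform"  # assumption, as in the sketch above
HEADERS = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
}
ERROR_TEXT = "lost communication with the server"


def classify_run(run_id):
    """Return the names of failed jobs in a run whose logs mention the error."""
    jobs = requests.get(
        f"{API}/repos/{REPO}/actions/runs/{run_id}/jobs",
        headers=HEADERS,
        params={"per_page": 100},
    )
    jobs.raise_for_status()

    hits = []
    for job in jobs.json()["jobs"]:
        if job["conclusion"] != "failure":
            continue
        # The logs endpoint redirects to a plain-text log file for the job.
        log = requests.get(
            f"{API}/repos/{REPO}/actions/jobs/{job['id']}/logs",
            headers=HEADERS,
        )
        if log.ok and ERROR_TEXT in log.text:
            hits.append(job["name"])
    return hits


if __name__ == "__main__":
    # Hypothetical run id; in practice, feed in ids from the failed-run listing above.
    print(classify_run(1234567890))
```

If the text turns out not to be in the logs, the same loop could be pointed at job annotations instead; either way, combining this with the run counter above would give the per-error frequency the A/C asks for.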
I started discovery on some dashboard tools for GitHub Actions (and other CI services) that might have some out-of-the-box functionality for classifying errors that could help here: openedx/public-engineering#168.