-
Notifications
You must be signed in to change notification settings - Fork 347
fix: Increase GHA workflow timeout to 35 days from 72h #1067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm; just pushed a minor change.
wdyt @iterative/cml? Technically this PR is a breaking change for people with hardcoded timeout-minutes: 4320
:/
btw @casperdcl updated the variable name based on a brief discussion in #1054 (comment) as it's probably a little bit of a misnomer - if you think it's clearer, happy for it to be the name in 7ba6e59 |
perhaps an update/note in the readme as well as a docs ones to go with this. |
@danieljimeneznz for transparency we have been discussing this internally as it has the potential to break several user workflows. We are still considering the best approach but I believe that we are currently thinking of having this as a configurable option or parsing the user's workflow to check for the time value when the cml runner command is invoked. |
Yeah I think that's probably best, when I was looking through the Github docs and saw the Not sure I agree that it would break people's workflows though - was going to test that theory by setting a workflow to timeout after 5 minutes via the workflow file to see whether it would shutdown after being caught idle for 5 mins. |
This timeout has nothing to do with being idle. But long runnering workflows took longer than GitHub Actions originally allowed. Some long-running jobs have been built to save intermediate data so they can be resumed by a restart. If we change the restart mechanism to 35-days but they still have the 3-day value hard coded, actions will stop the job, and the restart mechanism won't go off. |
@danieljimeneznz we want to action this so that your contribution isn't hanging in limbo for too long, do you have any thoughts on my previous comment? |
Apologies I was away on holiday and have only just been catching up on things.
I might have a bit of a simplistic view on this so wanted to run a test to confirm my theory but when GitHub reaches the timeout specified in the workflow file does it not send the runner a command to stop the currently running job? At which point the idle timer would kick in and the runner would decommission within 5 minutes. I'm not following along with the logic of the 3-day hard coded intermediate value for the restart? Unless you're meaning that the current runner ignores Github action's request to stop the job (as a workaround to support jobs running longer than Github allows)? (which seems a little dangerous). Feel free to close the PR if it's left dangling for too long - happy to help with the implementation of parsing the workflow timeout value. |
@danieljimeneznz we are going to merge but please can you warn the users that the time limit has been updated? winston.warn( 'Github Actions timeout has been updated from 72h to 35 days. Update your workflow accordingly to be able to restart it automatically.'); |
@DavidGOrtega have added the log line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 ❤️
Noticed that there aren't really any tests for this in the runner - can potentially write one, or run cml-runner off this branch if you're happy for the PR to be open 35 days. 😂
Originally mentioned in #1054 (comment) - fixes #1064